Building a Cloud Culture at Yelp

Jim Blomo (Yelp)
Cloud
Location: F150

This talk highlights the areas, both cultural and technological, where Yelp has changed to best take advantage of new cloud products. The themes I’ll cover are:

  • History: Yelp’s history as 100% hosted, with Hadoop experiments on spare machines
  • Problems with hosted model: Unreliable data processing, and extensive coordination around feature launches
  • Usage Today: Mixed usage, including 7+ TB of hosted databases, 250+ GB of compressed logs per day in S3, and dozens of EMR jobs per day
  • Company Progress: Growth from 40 to 80 million monthly visitors, with hundreds of new features across several mediums (website, native mobile apps, mobile website)
  • How did we get here? Big wins with EMR using open sourced libraries; policies around development, privacy, and testing
  • Yelp Features Supported by EMR: Search Relevance, Usage graphs, Review Highlights, Spam Filtering, Advertising Optimizations
  • Open source tools: mrjob, EMRio, Tron, s3mysqldump (https://github.com/Yelp)
  • Lessons: The hardest part was not technology adoption but integration into existing workflows and policies. A shared understanding of the available resources is critical.

Jim Blomo (@jimblomo) is passionate about putting data to work by developing robust, elegant systems. At Yelp, he manages a growing data mining team that uses Hadoop, mrjob, and oddjob to process TBs of data. Before Yelp, he built infrastructure for startups and Amazon.

Jim also lectures at UC Berkeley’s School of Information on Data Mining and Web Architecture, and has presented at conferences such as AWS re:Invent and the Wolfram Alpha Data Summit.
