HBase and Hive at StumbleUpon

Jean-Daniel Cryans (Cloudera)
Average rating: 4.00 (4 ratings)

We deployed Hive at StumbleUpon early this year as a tool for mining our HBase production datasets. It has been quite a success with both engineering and our analysts; engineers no longer have to write the analysts’ reports and the analysts don’t have to deal with cranky engineers.

In this presentation, we will first cover why someone would use Hive with HBase instead of working directly with HDFS files, and which goals this can accomplish. We will then review how the Hive-HBase integration works to better understand the state and drawbacks of the current implementation.

The second part will cover how we deployed Hive internally at StumbleUpon and how the data is fed into the system. This will include how we replicate data live from our MySQL and real-time HBase clusters into an analytical Hadoop/HBase cluster in an ETL fashion. We will also present some of our use cases and how they translate into the Hive query language.

The presentation will end with our lessons learned and how we expect to grow our Hive usage as the company grows. At the time of writing, we are signing up more than 600,000 new users per month and we just passed 15M total users.

Jean-Daniel Cryans

Cloudera

Jean-Daniel is a Database Engineer at StumbleUpon. When he’s not developing HBase or supporting its usage inside the company, he’s helping others with the Hadoop stack. Jean-Daniel has been a committer on the Apache HBase project since 2008.

Comments

Sheeri K. Cabral
09/05/2011 5:45am PDT

Video for this talk can be found online at www.youtube.com/watch?v=WpQ...