We deployed Hive at StumbleUpon early this year as a tool for mining our HBase production datasets. It has been quite a success with both engineering and our analysts; engineers no longer have to write the analysts’ reports and the analysts don’t have to deal with cranky engineers.
In this presentation, we will first cover the reasons why someone would use Hive with HBase instead of directly using HDFS files, and which goals can be accomplished. We will then review how the Hive-HBase integration works to better understand the state and drawbacks of the current implementation.
The second part will cover how we deployed Hive internally at StumbleUpon and how the data is fed into the system. This will include how we replicate data live from our MySQL and real-time HBase clusters into an analytical Hadoop/HBase cluster in an ETL fashion. We will also present some of our use cases and how they translate into the Hive query language.
The presentation will end with our lessons learned and how we expect to grow our Hive usage as the company grows. At the time of writing, we are signing up more than 600,000 new users per month, and we just passed 15M total users.
Jean-Daniel is a Database Engineer at StumbleUpon. When he’s not developing HBase or supporting its usage inside the company, he’s helping others with the Hadoop stack. Jean-Daniel has been a committer on the Apache HBase project since 2008.