An introduction to Hive and Hadoop
Getting data into Hive
Learn how to create Hive tables with the appropriate table properties and data types. We will then cover loading data into Hive from files on the local file system, files in HDFS, or data in an RDBMS.
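Hive applies a table's declared column types when it reads the underlying files, not when data is loaded (schema-on-read). The Python sketch below models that for a plain-text table. It assumes Hive's default text-format conventions (Ctrl-A `\x01` as the field delimiter, `\N` as the NULL marker). The schema, the converter map, and `read_row` are hypothetical illustrations covering only a few types; this is not Hive's SerDe code.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical table schema: (column name, declared Hive type).
SCHEMA = [("user_id", "INT"), ("name", "STRING"), ("score", "DOUBLE")]

# Illustrative subset of Hive primitive types mapped to Python converters.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "INT": int,
    "BIGINT": int,
    "DOUBLE": float,
    "STRING": str,
}

FIELD_DELIM = "\x01"  # Hive's default field delimiter for text tables (Ctrl-A)
NULL_TOKEN = "\\N"    # Hive's default text representation of NULL


def read_row(line: str) -> Dict[str, Optional[Any]]:
    """Apply the declared schema to one raw line at read time."""
    fields = line.rstrip("\n").split(FIELD_DELIM)
    row: Dict[str, Optional[Any]] = {}
    for i, (col, hive_type) in enumerate(SCHEMA):
        # Missing trailing fields read as NULL; extra fields are ignored.
        raw = fields[i] if i < len(fields) else None
        if raw is None or raw == NULL_TOKEN:
            row[col] = None
            continue
        try:
            row[col] = CONVERTERS[hive_type](raw)
        except ValueError:
            # A value that does not parse as the declared type becomes NULL.
            row[col] = None
    return row
```

For example, `read_row("x\x01bob")` yields `{'user_id': None, 'name': 'bob', 'score': None}`: the bad integer and the missing score both surface as NULL instead of failing the load.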
The Hive Query Language is an SQL-like language for querying data in Hadoop. It supports a subset of SQL-92 features, but also adds Hive/Hadoop-specific enhancements.
This section explains how Hive compiles queries into MapReduce jobs that run on the Hadoop cluster.
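To make the query-to-MapReduce translation concrete, the sketch below simulates, in memory, the three phases behind a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept`. The table, rows, and function names are hypothetical. Real Hive plans add optimizations such as map-side aggregation, so treat this as a conceptual model rather than Hive's actual execution plan.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Toy rows for a hypothetical table employees(name, dept).
ROWS = [("ann", "eng"), ("bob", "ops"), ("cat", "eng"), ("dan", "eng")]


def map_phase(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    # Each mapper scans its input split and emits (GROUP BY key, 1).
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # The framework collects all values for the same key and sorts by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    # Each reducer aggregates the values for its keys: COUNT(*) is a sum of 1s.
    return [(key, sum(values)) for key, values in groups.items()]


def run_group_by_count(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    return reduce_phase(shuffle_phase(map_phase(rows)))


print(run_group_by_count(ROWS))  # [('eng', 3), ('ops', 1)]
```

The GROUP BY column becomes the shuffle key, which is why every row for a given department reaches the same reducer.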
Partitioning and Bucketing
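A powerful feature of Hive is the ability to partition and bucket your data. Partitioning organizes a large data set into distinct subsets, each stored in its own directory keyed by the partition column's value, so queries can skip partitions they do not need. Bucketing hashes rows on a column into a fixed number of files, which makes it useful for sampling a fraction of the data.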
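The sketch below models both ideas in Python. `partition_dir` shows how partition columns map to nested `key=value` directories. `bucket_for` assigns a row to a bucket with `(hash & Integer.MAX_VALUE) % num_buckets`, assuming classic Hive behaviour for an INT column, where the hash is the value itself. Other column types, and newer Hive versions, use different hash functions. `sample_bucket` mimics `TABLESAMPLE(BUCKET x OUT OF y ON user_id)`. All names and the bucket count are hypothetical.

```python
from typing import Dict, List

NUM_BUCKETS = 4       # hypothetical: CLUSTERED BY (user_id) INTO 4 BUCKETS
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def partition_dir(table: str, partition: Dict[str, str]) -> str:
    # Each partition column becomes one nested key=value directory level.
    parts = "/".join(f"{k}={v}" for k, v in partition.items())
    return f"{table}/{parts}"


def bucket_for(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Assumes an INT key whose hash is the value itself; masking the sign
    # bit keeps the result non-negative, as in Java.
    return (user_id & INT_MAX) % num_buckets


def sample_bucket(user_ids: List[int], x: int, y: int) -> List[int]:
    # BUCKET x OUT OF y keeps rows whose bucket (out of y) equals x - 1.
    return [u for u in user_ids if bucket_for(u, y) == x - 1]
```

For instance, `partition_dir("page_views", {"dt": "2010-07-19", "country": "US"})` gives `page_views/dt=2010-07-19/country=US`, and `sample_bucket(list(range(10)), 1, 4)` returns `[0, 4, 8]`, roughly a quarter of the rows.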
This section gives specific recommendations for configuring Hive, handling data correctly, and designing for performance.
Aaron Kimball is a software engineer at Cloudera, Inc., the commercial Hadoop company. Aaron is the principal developer of Sqoop, the SQL-to-Hadoop database import/export tool. He has been working with Hadoop since early 2007 and contributes actively to its development. Through Cloudera, he also provides training to developers and system administrators working with Hadoop. Aaron holds a B.S. in Computer Science from Cornell University and an M.S. in Computer Science and Engineering from the University of Washington.