Introduction to Apache Hadoop

Tom Wheeler (Cloudera, Inc.)
Data
Location: Portland 255
Average rating: 4.06 (48 ratings)
Slides:   1-PDF 

THIS TUTORIAL HAS REQUIREMENTS AND INSTRUCTIONS LISTED BELOW

This tutorial will present a mix of lecture and instructor-led demonstrations to explain what Apache Hadoop is and why it’s becoming a standard for large-scale data storage and processing.

  • Why the World Needs Hadoop
    • What is Apache Hadoop?
    • How Did Apache Hadoop Originate?
    • The Economics of Hadoop
    • Common Use Cases
  • Fundamental Concepts
    • How Hadoop Differs from other Distributed Computing Architectures
    • High-Level Architecture
    • The Anatomy of the Cluster
  • HDFS: The Hadoop Distributed Filesystem
    • Comparison to Standard Filesystems
    • HDFS Replication and Reliability
    • Demo: Accessing HDFS Using the Command Line
  • MapReduce
    • Data Processing with MapReduce
    • Thinking in MapReduce
    • Hadoop Streaming
    • Demo: MapReduce Example in Python
    • Visual Overview of Job Execution
    • Hadoop’s Java API for MapReduce
    • Demo: MapReduce Example in Java
  • Using Apache Hadoop Effectively
    • Partitioning the Keyspace
    • Improving Performance with a Combiner
    • Tips for Running at Scale
    • When Hadoop is Not the Right Choice
  • The Hadoop Ecosystem
    • Apache Flume
    • Apache Sqoop
    • Apache Hive
    • Apache Pig
    • Apache HBase
    • Apache Mahout
    • Hadoop Versions and Distributions

This is a practical session focused on real-world applications of Apache Hadoop—at no point will I use the lame “wordcount” example that’s become cliché for explaining MapReduce to beginners.

TUTORIAL REQUIREMENTS AND INSTRUCTIONS FOR ATTENDEES

If you’d like to follow along with the instructor-led demos of HDFS and MapReduce, please follow the instructions on this page to get the virtual machine and code samples.

QUESTIONS for the speaker? Use the “Leave a Comment or Question” section at the bottom of the page to ask them.

Tom Wheeler

Cloudera, Inc.

Tom Wheeler’s career spans more than fifteen years in the communications, biotech, financial, healthcare, aerospace and defense industries. Before joining Cloudera, he developed engineering software at Boeing, helped to design a high-volume data processing system for WebMD and served as senior programmer/analyst for a brokerage firm. Mr. Wheeler is a frequent presenter at both user groups and software conferences.

Comments on this page are now closed.

Comments

Tom Wheeler
07/22/2013 12:15pm PDT

Vivek: I burned a backup copy to a DVD. If you show up to the session a few minutes early, I will let you have it.

07/22/2013 12:04pm PDT

Tom – Would you have the VM available on a USB stick? The download is 2.4 GB and will take hours at the interweb speed here at the convention center.

Tom Wheeler
06/06/2013 12:34pm PDT

I apologize for this and have updated the link to point to a valid page. I intend to post the VM and examples on that page by July 1, three weeks prior to the workshop.

06/06/2013 12:21pm PDT

The VM link is broken.
