Introduction to Hadoop (the how)

Aaron Kimball (Cloudera, Inc.)
Hadoop
Location: E141/E142
Tags: cloud, hadoop
Please note: to attend, your registration must include Tutorials.
Average rating: 3.38 out of 5 (16 ratings)

The Hadoop MapReduce API

Learn how to get started writing programs against Hadoop’s API.

Introduction to MapReduce Algorithms

Writing programs for MapReduce requires analyzing problems in a new way. This lecture shows how some
common functions can be expressed as part of a MapReduce pipeline.
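
As one illustration of expressing a common function as a MapReduce pipeline, here is a minimal word-count sketch. It is plain Python that simulates the map, shuffle, and reduce phases in a single process; it does not use Hadoop's API, and the names map_words, reduce_counts, and run_pipeline are hypothetical helpers chosen for this example, not material from the tutorial.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce phase: sum all counts that were emitted for one word."""
    return word, sum(counts)


def run_pipeline(lines, mapper, reducer):
    """Simulate map -> shuffle -> reduce in a single process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    # Each key is reduced independently, as it would be on a cluster.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_pipeline(["the quick fox", "the lazy dog"], map_words, reduce_counts))
```

The point of the sketch is the shape of the problem: the mapper sees one record at a time, the reducer sees one key at a time, and neither relies on shared state. That independence is what lets a framework distribute the work across machines.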

Debugging MapReduce programs

Debugging in a distributed environment is challenging. This lecture will expose you to best practices for program
design that mitigate debugging challenges, as well as local testing tools and techniques for debugging at scale.
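
The abstract does not say which design practices the lecture covers, so the following is only a sketch of one commonly used approach: keep the map logic in a pure function that can be unit-tested locally, and count malformed records instead of letting them crash the job. The tab-separated record format, the function names, and the "malformed_records" counter name are all assumptions made for this example; the Counter object merely stands in for the idea of job counters.

```python
from collections import Counter


def parse_record(line):
    """Parse a hypothetical 'user<TAB>bytes' record; return None if malformed."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2 or not parts[1].isdigit():
        return None
    return parts[0], int(parts[1])


def map_bytes(line, counters):
    """Emit (user, bytes) for good records; count bad ones instead of failing."""
    record = parse_record(line)
    if record is None:
        counters["malformed_records"] += 1
        return []
    return [record]


def test_map_bytes_locally():
    """Exercise the mapper on one machine, with no cluster involved."""
    counters = Counter()
    assert map_bytes("alice\t120\n", counters) == [("alice", 120)]
    assert map_bytes("garbage line", counters) == []
    assert counters["malformed_records"] == 1


if __name__ == "__main__":
    test_map_bytes_locally()
    print("local mapper tests passed")
```

Because the mapper takes its input and its counters as arguments, the same function can be run against a handful of sample lines on a laptop before it is ever submitted to a cluster.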

Optimizing MapReduce Programs

We’ll use the Cloudera Training VM to work through an example where you write a MapReduce program and
improve its performance using techniques explored earlier.

NOTE: Attendees should download the Cloudera Training VM from http://cloudera.com/hadoop-training-virtual-machine. VMware Player (Windows, Linux) or VMware Fusion (OS X) is required to use it.


Aaron Kimball

Cloudera, Inc.

Aaron Kimball is a software engineer at Cloudera, Inc., the commercial Hadoop company. Aaron is the principal developer of Sqoop, the SQL-to-Hadoop database import/export tool. He has been working with Hadoop since early 2007 and contributes actively to its development. Through Cloudera, he also provides training to developers and system administrators working with Hadoop. Aaron holds a B.S. in Computer Science from Cornell University and an M.S. in Computer Science and Engineering from the University of Washington.


Comments

Joshua Anyan
07/20/2010 10:08am PDT

Two most useful explanations: 1) MapRed algorithms 2) Debugging MapRed jobs.

Great presentation.

Ryan Boyer
07/19/2010 10:29pm PDT

I really enjoyed the explanation of the different MapReduce algorithms (e.g., using MapReduce for joins).
