Using Hadoop for Big Data Analysis

Mike Olson (Cloudera)
Apache, Cloud Computing, Databases, Programming
Location: Exhibit Hall 3
Average rating: 2.82 (17 ratings)

“Big data” problems are increasingly common. Storage and compute services are inexpensive and easy to get in the cloud. New data sources (sensor readings, video and still imagery, audio, telemetry from software systems and devices, web logs, and data from the biological and physical sciences) are exploding, and it is now possible to store them online cheaply.

Apache Hadoop is a powerful tool for analyzing these new, very diverse repositories. Hadoop can scan and analyze petabytes of data using a collection of commodity servers working in parallel.

Hadoop enforces a different programming paradigm from relational databases, large-scale data warehouses and special-purpose high-performance computing systems. Hadoop uses shared-nothing parallel execution to break large data processing tasks into small pieces that get distributed among many servers. Those tasks can include code written by the user that operates on complex data in its native format. Hadoop relies on a high-performance distributed file system, HDFS, for data storage and replication.
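To make the "small pieces distributed among many servers" idea concrete, here is a minimal sketch in plain Python of the map, shuffle, and reduce steps that Hadoop's MapReduce model is built around. It is not Hadoop's API: it runs in a single process, threads merely stand in for separate servers, and every function name (`split_into_chunks`, `map_chunk`, `shuffle`, `reduce_counts`, `word_count`) is hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_chunks):
    """Partition the input so each worker gets its own independent slice."""
    return [records[i::num_chunks] for i in range(num_chunks)]


def map_chunk(chunk):
    """User-supplied map step: emit (word, 1) pairs from raw text lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key so each key is reduced in one place."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """User-supplied reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, num_workers=3):
    """Count words by mapping chunks independently, then reducing per key."""
    chunks = split_into_chunks(records, num_workers)
    # Threads stand in for separate servers (an assumption of this sketch);
    # each map task reads only its own chunk and shares no state with others.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    logs = ["GET /index.html", "GET /about.html", "POST /login", "GET /index.html"]
    print(word_count(logs))
```

Because each map task touches only its own slice of the input, adding servers adds capacity without coordination between tasks; in a real cluster the chunks would be blocks of files stored in HDFS rather than an in-memory list.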

In this talk, I’ll explain what Hadoop and related open source projects do, how they operate, and how they are used in real-world workloads to answer questions that simply can’t be posed using other systems.

Mike Olson

Cloudera

Mike Olson is the CEO of Cloudera, which offers support and services for Hadoop. He was an early architect of the Postgres database system and has worked as an engineer, manager and executive at a number of database companies, including Britton Lee, Illustra, Informix and Oracle. He was CEO of Sleepycat Software, makers of Berkeley DB, through the company’s acquisition by Oracle.

Comments

Dave Brondsema
07/22/2009 5:06pm PDT

Talked primarily about types of problems his consultancy clients were working with. Not about Hadoop itself.
