Sponsors

  • 10gen
  • DataStax, Inc.
  • Dell
  • Google
  • Lexis Nexis
  • Oracle
  • VMware
  • Percona

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the convention, contact Sharon Cordesse at scordesse@oreilly.com

Download the OSCON Data Sponsor/Exhibitor Prospectus

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences, contact mediapartners@oreilly.com

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

OSCON Bulletin

To stay abreast of convention news and announcements, please sign up for the OSCON email bulletin (login required)

Contact Us

View a complete list of OSCON contacts

OSCON: Data 2011 Schedule

Below are the confirmed and scheduled talks (schedule subject to change).

Customize Your Own Schedule

Create your own OSCON: Data schedule using the personal scheduler function. Mark the keynotes, workshops, sessions, and events you want to attend by clicking on the calendar icon next to each listing. Then click on "personal schedule" below to generate your own customized schedule.

B118-119
10:40am NoSQL @ Netflix Siddharth Anand (LinkedIn)
1:30pm Building Web Applications with MongoDB Roger Bodamer (10gen)
2:20pm Redis: CS101 Data Structures via the Network Ezra Zygmuntowicz (VMware Inc)
3:30pm Whirr: Open Source Cloud Services Tom White (Cloudera)
C121/122
10:40am MySQL Replication Update Lars Thalmann (Oracle)
11:30am HandlerSocket: NoSQL via MySQL Ryan Lowe (Percona), Haidong Ji (Percona)
1:30pm Ephemeral Hadoop Clusters in the Cloud Greg Fodor (Etsy)
2:20pm MVCC Unmasked Bruce Momjian (EnterpriseDB)
3:30pm MySQL for the Large Scale Social Games Yoshinori Matsunobu (DeNA)
4:20pm InnoDB: Performance and Scalability Features Inaam Rana (Oracle), Calvin Sun (Twitter)
C125/126
10:40am TBC
C123
10:40am Introduction to Hadoop Tom Hanlon (Cloudera)
11:30am Architectural Anti-patterns for Data Handling Gleicon Moraes (7co.cc)
1:30pm What Every Data Programmer Needs to Know About Disks Ted Dziuba (eBay Local/Milo.com)
2:20pm Esperwhispering: get your real-time data game on Theo Schlossnagle (OmniTI/Circonus)
3:30pm Distributed Data Analysis with Hadoop and R Jonathan Seidman (Orbitz Worldwide), Ramesh Venkataramaiah (Orbitz Worldwide)
4:20pm QYZ: LaTeX, R and Redis for Beautiful Analytics Noah Pepper (Lucky Sort), Homer Strong (Lucky Sort)
C124
10:40am Playful Explorations of Public and Personal Data Andrew Turner (GeoIQ)
11:30am Developing and Deploying Hadoop Security Owen O'Malley (Hortonworks)
1:30pm OpenTSDB: A Scalable, Distributed Time Series Database Benoit Sigoure (StumbleUpon, Inc.)
2:20pm YARN - Next Generation Hadoop Map-Reduce Arun Murthy (Hortonworks Inc.)
3:30pm Real-time Streaming Analysis for Hadoop and Flume Aaron Kimball (Magnify Consulting)
4:20pm Querying Riak Just Got Easier - Introducing Secondary Indices Rusty Klophaus (Basho Technologies)
9:00am Plenary
Room: Oregon Ballroom 203/204
Welcome Sarah Novotny (NGINX), Bradford Stephens (Drawn to Scale)
9:05am Plenary
Room: Oregon Ballroom 203/204
Finding the Perfect Match Tom Quisel (OkCupid)
9:20am Plenary
Room: Oregon Ballroom 203/204
Benjamin Black Benjamin Black (Boundary)
9:40am Plenary
Room: Oregon Ballroom 203/204
What Would You Do With Your Own Google? Steve Yegge (Google)
10:00am Plenary
Room: Oregon Ballroom 203/204
Q & A
7:00pm Event
Room: Oregon Ballroom
Ignite OSCON
10:10am Morning Break
Room: Exhibit Hall C
12:10pm Lunch
Room: Exhibit Hall C
3:00pm Afternoon Break
Room: Exhibit Hall C
9:00pm Plenary
Room: See BoF Schedule for Locations
Monday Birds of a Feather Sessions
5:00pm Event
Room: Gather (Double Tree Hotel bar)
Android Happy Hour
10:40am-11:20am (40m) Data: NoSQL Databases
NoSQL @ Netflix
Siddharth Anand (LinkedIn)
Over the past few years, Netflix has migrated to the cloud. This talk details Netflix's transition away from relational databases and towards high-availability (NoSQL) storage systems. We rely on a combination of proprietary (e.g. SimpleDB and S3) and open-source (e.g. Cassandra and HBase) NoSQL technologies.
11:30am-12:10pm (40m) Data: NoSQL Databases
The Right Tool For The Right Job: Choosing The Best Data Storage Option
Patrick Lightbody (New Relic)
Between the NoSQL movement and new cloud offerings, it seems there are new storage options popping up every day. How do you select which one is the best for your project? The truth is that it's unlikely one option is best for all your needs. This session walks you through the various options considered by one startup and how it selected five separate storage engines, and has no regrets about doing so!
1:30pm-2:10pm (40m) Data: NoSQL Databases
Building Web Applications with MongoDB
Roger Bodamer (10gen)
In this workshop, one of the core MongoDB committers will present the fundamental principles of MongoDB, how to set up and interact with the database, and what to consider when building applications using a document-based data model.
2:20pm-3:00pm (40m) Data: NoSQL Databases
Redis: CS101 Data Structures via the Network
Ezra Zygmuntowicz (VMware Inc)
Redis is an entry in the new breed of NoSQL databases. But it takes a different approach that makes it much more interesting than most of the other key/value stores in the same category. Come learn what makes Redis so useful that it seems everyone is adding it to their toolbox.
3:30pm-4:10pm (40m)
Whirr: Open Source Cloud Services
Tom White (Cloudera)
Apache Whirr is a way to run distributed systems - such as Hadoop, HBase, Cassandra, and ZooKeeper - in the cloud. Whirr provides a simple API for starting and stopping clusters for evaluation, test, or production purposes. This talk explains Whirr's architecture and shows how to use it.
4:20pm-5:00pm (40m)
Gearman: From the Worker's Perspective
Brian Aker (HP)
Many people view topics like Map/Reduce and queue systems as advanced concepts that require in-depth knowledge and time-consuming software setup. Gearman is changing all that by making the barrier to entry as low as possible with an open source, distributed job queuing system.
10:40am-11:20am (40m) Data: Relational
MySQL Replication Update
Lars Thalmann (Oracle)
We describe the new replication features in MySQL 5.5 (GA) and MySQL 5.6 (Development release).
11:30am-12:10pm (40m) Data: Relational
HandlerSocket: NoSQL via MySQL
Ryan Lowe (Percona) et al
Most modern web applications require both SQL access to complex data and simple key-value look-ups. This session will cover how to use the HandlerSocket plug-in for MySQL to get exponentially faster look-ups for simple access patterns.
1:30pm-2:10pm (40m) Data: Hadoop
Ephemeral Hadoop Clusters in the Cloud
Greg Fodor (Etsy)
The data & analytics teams at Etsy build up and tear down more than a thousand independent Hadoop clusters on EC2 each month. This talk discusses the benefits of this approach, where Elastic Map Reduce serves as a "meta-cluster" in which on-demand Hadoop clusters can be created, used, and shut down quickly and easily.
2:20pm-3:00pm (40m) Data: Relational
MVCC Unmasked
Bruce Momjian (EnterpriseDB)
Multiversion Concurrency Control (MVCC) allows Postgres to offer high concurrency even during significant database read/write activity. MVCC specifically offers behavior where "readers never block writers, and writers never block readers". This talk explains how MVCC is implemented in Postgres and highlights optimizations which minimize the downsides of MVCC. This talk is for advanced users.
3:30pm-4:10pm (40m) Data: Relational
MySQL for the Large Scale Social Games
Yoshinori Matsunobu (DeNA)
We at DeNA (the largest social game provider in Japan) handle over 2 billion page views per day with MySQL. We make heavy use of SSDs and tune Linux. We run non-trivial solutions such as non-stop, automated MySQL master failover. We also use MySQL not only as a traditional RDBMS but also as an extremely high-performance NoSQL store. I'd like to introduce the MySQL solutions that make our social games scale better.
4:20pm-5:00pm (40m) Data: Relational
InnoDB: Performance and Scalability Features
Inaam Rana (Oracle) et al
There are many exciting InnoDB performance and scalability features in MySQL 5.5 and its upcoming release. But how can you best use them? What are the caveats? In this session, we will describe those performance and scalability features in depth. We will also present some benchmark results that explore the performance of those features.
10:40am-11:20am (40m)
Session
To be confirmed
11:30am-12:10pm (40m) Data: Products and Services
Hadoop - Enterprise Data Warehouse Data Flow Analysis and Optimization
Aurelian Dumitru (Dell, Inc)
In this session Dell will discuss the analysis of the data types suitable for transfer between Hadoop and EDW, the EDW/Hadoop data lifecycle, data governance between Hadoop and DBMS, and ETL performance tuning and best practices (e.g. Hadoop/DBMS connector, node and network designs).
1:30pm-2:10pm (40m) Data: Products and Services
DataStax’ Brisk – A More Powerful, Real-time, And Easier To Deploy Hadoop, Powered By Apache Cassandra
Jonathan Ellis (DataStax)
Brisk is an open-source Hadoop and Hive distro that utilizes Cassandra for its core services. Brisk provides integrated Hadoop MapReduce, Hive and job and task tracking, while providing an HDFS-compatible storage layer powered by Cassandra. By accelerating the time between data creation and analysis with DataStax’ Brisk, users experience greater reliability, simpler deployment and lower TCO.
10:40am-11:20am (40m) Data: Hadoop
Introduction to Hadoop
Tom Hanlon (Cloudera)
Hadoop gives you the ability to process massive amounts of data at scale. This presentation will show you how Hadoop makes use of commodity hardware to let you build a system that scales, deals gracefully with the failure of individual nodes, and gives you the power of Map/Reduce to process petabytes.
11:30am-12:10pm (40m) Data: Roulette
Architectural Anti-patterns for Data Handling
Gleicon Moraes (7co.cc)
Ever had to dig into a system that misused the most basic features of an RDBMS? Better yet, after the whole NoSQL storm, have you wondered why it didn't show up earlier, when you had to twist your schema to fit into something it was not designed for? Check out this collection of anti-patterns and feel better knowing that you are not alone, and learn how you can benefit from it even without big data around.
1:30pm-2:10pm (40m) Data: Roulette
What Every Data Programmer Needs to Know About Disks
Ted Dziuba (eBay Local/Milo.com)
What happens when you write data to disk? We'll explore everything between your programming language and the spinning platters - both optimizations and dangerous pitfalls.
2:20pm-3:00pm (40m) Data: Real-Time and Streaming
Esperwhispering: get your real-time data game on
Theo Schlossnagle (OmniTI/Circonus)
The art of dealing with real-time data is not new. In fact, much of the world's economy is propped up by making decisions on data in under a millisecond. The technology is there; we have the power. We'll take a whirlwind tour of the open-source Esper system and understand how to integrate it into your stack to enable rapid decision making on real-time data from anywhere in your architecture.
3:30pm-4:10pm (40m) Data: Analytics and Visualization
Distributed Data Analysis with Hadoop and R
Jonathan Seidman (Orbitz Worldwide) et al
An overview of the state of the art for bringing together the analytical power of the R language with the big data capabilities of Hadoop.
4:20pm-5:00pm (40m) Data: Analytics and Visualization
QYZ: LaTeX, R and Redis for Beautiful Analytics
Noah Pepper (Lucky Sort) et al
We produce gorgeous LaTeX reports while harnessing the power of R on the backend. The data is pulled from our PostgreSQL database, and the analysis and visualizations are fast and distributed thanks to Redis. We'll talk about weaving together open source tools to build powerful analytics reporting engines that rival the commercial alternatives.
10:40am-11:20am (40m) Data: Roulette
Playful Explorations of Public and Personal Data
Andrew Turner (GeoIQ)
We're surrounded by data: open government data, streaming media, and data we create as we track our lives and connect with our communities. Learn how to leverage easy-to-use tools to combine these sources for personal and organizational decision making without requiring complex processes or training.
11:30am-12:10pm (40m) Data: Hadoop
Developing and Deploying Hadoop Security
Owen O'Malley (Hortonworks)
Adding security to an existing product is never easy, but our team at Yahoo added strong authentication to Apache Hadoop by integrating it with Kerberos. This project was delivered on time and is currently deployed on all of Yahoo's 40,000 Hadoop computers. Come learn how we added security to Hadoop and why it matters.
1:30pm-2:10pm (40m) Data: Real-Time and Streaming
OpenTSDB: A Scalable, Distributed Time Series Database
Benoit Sigoure (StumbleUpon, Inc.)
OpenTSDB is an open-source, distributed time series database designed to monitor large clusters of commodity machines at an unprecedented level of granularity. OpenTSDB enables operations teams to keep track in real-time of all the metrics exposed by operating systems, applications and network equipment, and makes the data easily accessible.
2:20pm-3:00pm (40m) Data: Hadoop
YARN - Next Generation Hadoop Map-Reduce
Arun Murthy (Hortonworks Inc.)
YARN is the next generation of Hadoop Map-Reduce designed to scale out much further while allowing for running applications other than pure Map-Reduce in a highly fault-tolerant manner.
3:30pm-4:10pm (40m) Data: Real-Time and Streaming
Real-time Streaming Analysis for Hadoop and Flume
Aaron Kimball (Magnify Consulting)
This talk introduces an open-source SQL-based system for continuous or ad-hoc analysis of streaming data built on top of Flume-based data collection for Hadoop. Attendees will understand how to use a new tool to extend their Hadoop data collection pipeline with real-time streaming analytics.
4:20pm-5:00pm (40m) Data: NoSQL Databases
Querying Riak Just Got Easier - Introducing Secondary Indices
Rusty Klophaus (Basho Technologies)
The Basho engineering team has been working to make Riak more queryable with the addition of built-in indexing plus a SQL-style query language. In this talk, Rusty describes the usage, benefits, limitations, and evolution of this functionality, called Secondary Indices. He also covers the challenges and pitfalls of adding indexing to a distributed datastore.
9:00am-9:05am (5m)
Welcome
Sarah Novotny (NGINX) et al
Opening remarks by the OSCON Data program chairs, Sarah Novotny and Bradford Stephens.
9:05am-9:20am (15m) Keynote
Finding the Perfect Match
Tom Quisel (OkCupid)
Dive into the distributed system that powers OkCupid’s match searches. Learn how we use C++, event-based programming, and SSDs to solve problems that crop up when building a high-performance, high-availability distributed system.
9:20am-9:40am (20m) Keynote
Benjamin Black
Benjamin Black (Boundary)
Keynote by Benjamin Black, Co-founder, fast_ip.
9:40am-10:00am (20m) Keynote
What Would You Do With Your Own Google?
Steve Yegge (Google)
It's 2021. You have a petabyte drive on your keychain, your startup company leases bulk cloud storage by the exabyte, and you have a million cores for data crunching. You even can have your own copy of the entire world's public semantic data. What do you do with it? If you're not sure yet, I've got plenty of ideas for you.
10:00am-10:10am (10m) Keynote
Q & A
An open microphone question and answer session with the morning's keynote speakers.
7:00pm-9:00pm (2h) Event
Ignite OSCON
If you had five minutes on stage what would you say? What if you only got 20 slides and they rotated automatically after 15 seconds? Would you pitch a project? Launch a web site? Teach a hack? We’re going to find out when we conduct our third Ignite event at OSCON.
10:10am-10:40am (30m)
Break: Morning Break
12:10pm-1:30pm (1h 20m)
Break: Lunch
3:00pm-3:30pm (30m)
Break: Afternoon Break
9:00pm-11:00pm (2h) Event
Monday Birds of a Feather Sessions
Birds of a Feather (BoF) sessions provide face to face exposure to those interested in the same projects and concepts. BoFs can be organized for individual projects or broader topics (best practices, open data, standards). BoFs are entirely up to you. We post your topic online and onsite and provide the space and time. You provide the engaging topic.
5:00pm-7:00pm (2h) Event
Android Happy Hour
Join other Android developers for happy hour at Gather in the Double Tree Hotel on Monday evening. Meet face-to-face and share experiences with other developers working on Android. The first 100 people there get a free drink ticket.