Sponsors

  • 10gen
  • DataStax, Inc.
  • Dell
  • Google
  • Lexis Nexis
  • Oracle
  • VMware
  • Percona

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the convention, contact Sharon Cordesse at scordesse@oreilly.com

Download the OSCON Data Sponsor/Exhibitor Prospectus

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences, contact mediapartners@oreilly.com

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

OSCON Bulletin

To stay abreast of convention news and announcements, please sign up for the OSCON email bulletin (login required)

Contact Us

View a complete list of OSCON contacts

OSCON: Data 2011 Schedule

Below are the confirmed and scheduled talks (schedule subject to change).

B118-119
10:40am The Big Data Ecosystem at LinkedIn Jay Kreps (LinkedIn)
11:30am Government Legislative Data, The Other Great White Fail Whale & How To Avoid It Jared Williams (New York State Senate), Noel Hidalgo (World Economic Forum), Graylin Kim (New York State Senate)
1:30pm Schema Design with MongoDB Dwight Merriman (10gen)
2:20pm Lean Big Data for Mobile Erik Onnen (Urban Airship)
3:30pm PNUTS Adam Silberstein (Yahoo!)
4:20pm Scaling Solr Horizontally in the Cloud Andy Blyler (Barracuda Networks), Lindsay Snider
C121/122
10:40am Taming the Big Data Fire Hose John Hugg (VoltDB)
11:30am Why Know Algorithms Andrew Aksyonoff (Sphinx Technologies), Richard Kelm (Sphinx Search)
1:30pm Castle: Reinventing Storage for Big Data Tom Wilkie (Acunu Ltd)
3:30pm Moving Day: Migrating Your Big Data from A to B Laura Thomson (Mozilla Corporation), Josh Berkus (PostgreSQL Experts), Corey Shields (Mozilla Corporation), Justin Dow (Mozilla Corporation)
C125/126
11:30am LexisNexis HPCC Systems Finds Health Care Fraud Bill Fox J.D., M.A. (LexisNexis), Charles Kaminski (LexisNexis)
4:20pm Practical Data Storage: MongoDB @ foursquare Harry Heymann (foursquare)
C123
10:40am Harder, Better, Faster, Stronger: PostgreSQL 9.1 Selena Deckelmann (PostgreSQL)
11:30am Facebook Messages and HBase Nicolas Spiegelberg (Facebook)
1:30pm Optimizing MySQL to Let People Argue Jeremy Bingham (Dailykos.com)
4:20pm Lumberyard: Time Series Indexing at Scale Josh Patterson (Cloudera)
C124
10:40am HBase and Hive at StumbleUpon Jean-Daniel Cryans (Cloudera)
11:30am Forests Can Fight Back with Open Source Technology Jeff Hamann (Forest Informatics)
2:20pm Design and Implementation of a Real-Time Cloud Analytics Platform David Pacheco (Joyent), Brendan Gregg (Joyent)
3:30pm Discover and Share Spatial Resources on the Web Christine White (Esri)
4:20pm Neo4j Spatial - Geo Data for the Rest of Us Peter Neubauer (Neo Technology)
9:00am Plenary
Room: Oregon Ballroom 203/204
Welcome Sarah Novotny (NGINX), Bradford Stephens (Drawn to Scale)
9:05am Plenary
Room: Oregon Ballroom 203/204
Databases for Agile Development Dwight Merriman (10gen)
9:20am Plenary
Room: Oregon Ballroom 203/204
Adrian Cockcroft Adrian Cockcroft (Battery)
9:40am Plenary
Room: Oregon Ballroom 203/204
Living In A Relational World Brian Aker (HP)
10:00am Plenary
Room: Oregon Ballroom 203/204
OSCON Data Innovation Award
10:10am Morning Break
Room: Exhibit Hall C
12:10pm Lunch - Sponsored by Alfresco
Room: Exhibit Hall C
3:00pm Afternoon Break
Room: Exhibit Hall C
8:00pm Event
Room: 411 NW Park Ave.
Puppet Labs Party
5:00pm Event
Room: Expo Hall
Opening Reception (sponsored by 10gen)
6:00pm Event
Room: Hall B
OSCON Carnival
10:40am-11:20am (40m) Data: Big Data
The Big Data Ecosystem at LinkedIn
Jay Kreps (LinkedIn)
The last few years have brought a wealth of new data technologies organized around horizontal scalability. This talk will cover the essential infrastructure areas: real-time stream processing, offline data crunching, large-scale data deployments and live serving. The focus will be on how these ingredients come together to enable innovative data-driven products at LinkedIn.
11:30am-12:10pm (40m) Data: Big Data
Government Legislative Data, The Other Great White Fail Whale & How To Avoid It
Jared Williams (New York State Senate) et al
This is the story of our development team and the lessons we learned in building Open Legislation, an open government platform. It will detail our transition from a MySQL back end to an application fully powered by Lucene, the data quality and efficiency issues that we’ve had to address, and how we’re now trying to rebuild internal trust after our iterative and initially shaky development process.
1:30pm-2:10pm (40m) Data: NoSQL Databases
Schema Design with MongoDB
Dwight Merriman (10gen)
One of the challenges that comes with moving to MongoDB is figuring out how best to model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB.
2:20pm-3:00pm (40m) Data: Big Data
Lean Big Data for Mobile
Erik Onnen (Urban Airship)
This talk will cover lessons learned in building Urban Airship's large-scale data warehouse in EC2 including PostgreSQL, Kafka, Cassandra, HBase and Hadoop.
3:30pm-4:10pm (40m) Data: NoSQL Databases
PNUTS
Adam Silberstein (Yahoo!)
I will give an overview of PNUTS, a large-scale, geographically replicated serving data store in widespread use at Yahoo! I will introduce key use cases, the main system components, key design decisions, and ongoing work.
4:20pm-5:00pm (40m) Data: Scaling
Scaling Solr Horizontally in the Cloud
Andy Blyler (Barracuda Networks) et al
Solr, an open source enterprise search server, scales very well within an index (vertical scaling). It is when you have multiple indexes (horizontal scaling) that it starts to get hairy, which happens a lot when you are hosting a cloud-based solution for multiple users. In this session we will discuss these issues in depth, along with techniques for overcoming them.
10:40am-11:20am (40m) Data: Real-Time and Streaming
Taming the Big Data Fire Hose
John Hugg (VoltDB)
In this talk, we will introduce a simple formula for all Big Data applications: Big Data = Fast Data + Deep Data. Through a use-case format, we will discuss the specialized requirements for real-time (“fast”) and analytic (“deep”) data management.
11:30am-12:10pm (40m) Data: Relational
Why Know Algorithms
Andrew Aksyonoff (Sphinx Technologies) et al
Whether you're a beginner Web guy or a veteran DBA, whether you get your hands dirty with code or just manage systems, you still must know algorithms. How come? Because that knowledge enables you to optimize your work, conduct correct benchmarks, and make educated decisions. We'll show you how knowing only a little about SQL internals can help so much with tuning things.
1:30pm-2:10pm (40m) Data: Big Data
Castle: Reinventing Storage for Big Data
Tom Wilkie (Acunu Ltd)
The standard Linux storage stack wasn't designed for write-heavy big data workloads, nor is it well-suited to modern hardware: large, slow SATA disks, SSDs or many cores. Castle, an open-source project, is a ground-up overhaul of RAID, file systems, and the POSIX interface.
2:20pm-3:00pm (40m) Data: Relational
Drizzle, Virtualizing and Scaling MySQL for the Future
Brian Aker (HP)
Ever wondered what would happen if you could rethink a decade's worth of design changes? Drizzle is a redesign of the MySQL server targeted at web development and cloud infrastructure. Update yourself on the latest features and use cases for Drizzle7, and on what is in store for the near future.
3:30pm-4:10pm (40m) Data: Scaling
Moving Day: Migrating Your Big Data from A to B
Laura Thomson (Mozilla Corporation) et al
If you've ever had to move from data center to data center or to the cloud, or from old hardware to new hardware, you know that it's even more painful than moving house. In this presentation, survivors will tell you how to stay sane (and how to get it right) with a case study from Mozilla: moving 30TB of crash reports with no downtime in data collection.
4:20pm-5:00pm (40m)
Database Scalability Patterns: Sharding for Massive Growth
Robert Treat (OmniTI)
Everyone thinks they know what sharding is and how to do it, but simple horizontal read scaling is small potatoes. In this talk we'll focus on the sharding pattern for large-scale read/write architectures, based on real-world implementations. Supporting millions of users on commodity hardware doesn't need magical software, just careful application of the right scalability pattern.
11:30am-12:10pm (40m) Products & Services
LexisNexis HPCC Systems Finds Health Care Fraud
Bill Fox J.D., M.A. (LexisNexis) et al
A big data case study with the NY Medicaid Inspector General's Office and HPCC Systems from LexisNexis.
4:20pm-5:00pm (40m) Data: Products and Services
Practical Data Storage: MongoDB @ foursquare
Harry Heymann (foursquare)
A talk about how to scale foursquare using MongoDB and Scala.
10:40am-11:20am (40m) Data: Relational
Harder, Better, Faster, Stronger: PostgreSQL 9.1
Selena Deckelmann (PostgreSQL)
PostgreSQL continues to provide a major release every year full of improvements, better performance and features that measure up to the most popular commercial databases. Our 2011 release, 9.1, is no exception!
11:30am-12:10pm (40m) Data: Hadoop
Facebook Messages and HBase
Nicolas Spiegelberg (Facebook)
In November, Facebook launched a new version of Messages that combines chat, SMS, email, and Messages into a real-time conversation. Facebook relies on Apache HBase, a NoSQL-style database, for storing this real-time message data. This talk will elaborate on our decision process, system configuration, scaling issues, and advantages gained by choosing Open Source.
1:30pm-2:10pm (40m) Data: Relational
Optimizing MySQL to Let People Argue
Jeremy Bingham (Dailykos.com)
Keeping a busy site going when you don't have a lot of servers or developer resources can be a struggle. Hear what we did at Daily Kos to make the most of what we had to bring MySQL in line, make it quick, and keep the users and the boss happy.
2:20pm-3:00pm (40m) Data: Big Data
Big Data For Less – Dealing with Large Data Sets on a Startup’s Budget
Kate Matsudaira (SEOmoz)
Building large data applications can present a unique set of technical challenges because things that often work well in the conventional development environment can become incredibly arduous or expensive when applied on a much bigger scale. This talk will cover some of those challenges and potential solutions for each.
3:30pm-4:10pm (40m)
Designing and Implementing Asynchronous Distributed Systems: Challenges, Strategies, and a Million Things That Go Wrong
Scott Andreas (Boundary Inc.)
This language-agnostic proposal focuses upon concepts and strategies critical to the design and implementation of asynchronous systems and data processing layers. Key components include a survey of implementation strategies for non-blocking edge tiers, patterns for building out a distributed worker / processing tier, along with several horror stories of cascading failures and their resolution.
4:20pm-5:00pm (40m) Data: Analytics and Visualization, Data: Hadoop, Data: NoSQL Databases
Lumberyard: Time Series Indexing at Scale
Josh Patterson (Cloudera)
Time series sensors are being integrated everywhere, in places like cell phones, environmental sensors, and the smart grid. As we scale out this type of data, RDBMS systems strain to keep up with the high insertion rates and real-time query requirements. In this talk we introduce “Lumberyard,” a scalable system for indexing and low-latency fuzzy pattern searching of time series data.
10:40am-11:20am (40m) Data: Analytics and Visualization
HBase and Hive at StumbleUpon
Jean-Daniel Cryans (Cloudera)
Imagine for a moment doing a JOIN on two HBase tables. Crazy talk, right? Well, now you can, thanks to Hive. True, it is only meant to be used in a batch context, but we have been doing it for a few months now at StumbleUpon and our analysts and engineers love it. This presentation will cover how the Hive-HBase integration works and how we use it at our company.
11:30am-12:10pm (40m) Data: Analytics and Visualization
Forests Can Fight Back with Open Source Technology
Jeff Hamann (Forest Informatics)
Learn how to cobble together a PostgreSQL database, install a few handy R packages, a pinch of language extensions, and a handful of publicly available data to generate a forest monitoring platform to help landscape managers make better decisions using basic design-engineering paradigms to perform quick trade-off analyses.
1:30pm-2:10pm (40m) Data: Analytics and Visualization
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine and Clean Energy
Russell Hanson (RSI/Harvard/TCIN)
Synthetic biology is a new field where basic biological components can be engineered to create something new. It often involves DNA synthesizers, ligation, promoters, and polymerase chain reaction -- which may or may not be safe for your in silico environment. However, as the size and complexity of the systems increase, tools become more and more important, thus CAD for biology has emerged.
2:20pm-3:00pm (40m) Data: Real-Time and Streaming
Design and Implementation of a Real-Time Cloud Analytics Platform
David Pacheco (Joyent) et al
We'll present the architecture and implementation of a Node.js/DTrace-based distributed platform for analyzing the performance of cloud applications in real-time. We'll do a live demo on a real, internet-facing cloud and discuss some of the interesting performance pathologies we've found and explained using this tool.
3:30pm-4:10pm (40m) Data: Roulette
Discover and Share Spatial Resources on the Web
Christine White (Esri)
Sharing data is critical in a world where a crisis can occur at any moment. Often, valuable data is stored in disparate locations with no information on how to access it. This presentation discusses spatial data discovery and open source tools for implementing a data-sharing catalog. Esri’s Geoportal Server will be used to show sharing and discovery in action. The talk is open to all attendees.
4:20pm-5:00pm (40m) Data: Roulette
Neo4j Spatial - Geo Data for the Rest of Us
Peter Neubauer (Neo Technology)
Location-based services are hot, but geographic datasets are complex. But this shouldn’t put you off writing awesome location-aware services. This talk will show how to create spatial models and query the Open Street Map dataset together with social data using the Neo4j graph database.
9:00am-9:05am (5m)
Welcome
Sarah Novotny (NGINX) et al
Opening remarks by the OSCON Data program chairs, Sarah Novotny and Bradford Stephens.
9:05am-9:20am (15m) Keynote
Databases for Agile Development
Dwight Merriman (10gen)
Much has been made of scalability as a driver for choosing a database, but the choice of a database influences much more than the scaling architecture. Different database choices drive different data models which in turn influence the development process.
9:20am-9:40am (20m) Keynote
Adrian Cockcroft
Adrian Cockcroft (Battery)
Keynote by Adrian Cockcroft, Cloud Architect, Netflix.
9:40am-10:00am (20m) Keynote
Living In A Relational World
Brian Aker (HP)
We love data, and today we generate data in astronomical amounts. When we hit save on a document, snap a photo, or fill out a form online, we want to know that this data will persist, and we want to know that we can share, access, or reference it in the future. For any meaningful use, we need to know how data relates to other data.
10:00am-10:10am (10m) Keynote
OSCON Data Innovation Award
The first OSCON Data Innovation Award winner will be announced.
10:10am-10:40am (30m)
Break: Morning Break
12:10pm-1:30pm (1h 20m)
Break: Lunch - Sponsored by Alfresco
3:00pm-3:30pm (30m)
Break: Afternoon Break
8:00pm-10:00pm (2h) Event
Puppet Labs Party
Join Puppet Labs and SwellPath Interactive at their headquarters in the Pearl District. The party is free, as in free beer, food and fun. Two floors, two open bars, and more. Take the Green or Yellow line (free transit) west to Union Station and walk 2 blocks west to 411 NW Park Ave.
5:00pm-6:00pm (1h) Event
Opening Reception (sponsored by 10gen)
Grab a drink and kick off the 13th edition of OSCON by meeting and mingling with exhibitors and fellow attendees.
6:00pm-8:00pm (2h) Event
OSCON Carnival
Step right up and join us at the O'Reilly OSCON Carnival. There will be games, clowns, sumo wrestling, log rolling, tattoos, and lots more. There's free food, free wine, and free beer. You’ve never seen a carnival like this. Trust us.