Sponsors

  • 10gen
  • DataStax, Inc.
  • Dell
  • Google
  • Lexis Nexis
  • Oracle
  • VMware
  • Percona

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the convention, contact Sharon Cordesse at scordesse@oreilly.com


Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences, contact mediapartners@oreilly.com

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

OSCON Bulletin

To stay abreast of convention news and announcements, please sign up for the OSCON email bulletin.


The Big Data Ecosystem at LinkedIn

Jay Kreps (LinkedIn)
Data: Big Data
Location: B118-119
Average rating: 4.11 (9 ratings)

The last few years have brought a wealth of new data technologies organized around horizontal scalability. LinkedIn has built out an ecosystem of infrastructure to support products that use data in innovative ways and create significant infrastructure demands. This talk will cover the essential areas of technology and how LinkedIn has met its needs with a mixture of great Apache projects, such as Hadoop, ZooKeeper, Pig, and Avro, and a set of open source projects of our own creation, such as Voldemort, Kafka, and Azkaban.

Hadoop is the key ingredient for offline computation, but creating an agile system for offline computing requires a lot more than just a Hadoop cluster.

Stream processing is an under-utilized model that enables real-time data processing. Kafka is LinkedIn's open source framework that enables map/reduce-like processing without the high-latency turnaround of Hadoop jobs.
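To make the contrast with batch map/reduce concrete, here is a minimal, hypothetical Python sketch of the general stream-processing idea: each event is mapped and folded into running per-key state as it arrives, so an updated result is available after every event instead of only when a batch job finishes. It does not use Kafka or any Kafka client API, and it is not LinkedIn's implementation; the function names, the "member_id,page" event format, and the in-memory state are assumptions for illustration only.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

def stream_map_reduce(
    events: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[int, int], int],
) -> Iterator[Tuple[str, int]]:
    """Hypothetical helper: map each event as it arrives and fold the
    results into per-key state, yielding the updated value for every key
    touched. A batch job would only emit results after all input is read."""
    state: Dict[str, int] = {}
    for event in events:
        for key, value in mapper(event):
            # Incremental "reduce": combine the new value with the running total.
            state[key] = reducer(state[key], value) if key in state else value
            yield key, state[key]

def page_view_mapper(event: str) -> Iterator[Tuple[str, int]]:
    # Assumed event format for this sketch: "member_id,page".
    _member_id, page = event.split(",", 1)
    yield page, 1  # emit one view per event, keyed by page

if __name__ == "__main__":
    events = ["1,/profile", "2,/jobs", "1,/profile"]
    for update in stream_map_reduce(events, page_view_mapper, lambda a, b: a + b):
        print(update)
```

Run on the three sample events, the sketch emits ("/profile", 1), then ("/jobs", 1), then ("/profile", 2): every event produces a fresh count immediately. A real deployment would read events from a durable log and persist the state, which this in-memory sketch deliberately leaves out.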

Finally, live serving and data deployment are the last mile of analytical data processing: getting terabytes of data delivered and available for serving with low latency is what actually gets your data in front of your users.

The focus of this talk will be to tell the story of how we began to understand these problems, the pitfalls along the way, and how products on our site take advantage of this ecosystem.

Presentation


Jay Kreps

LinkedIn

Jay is a Principal Engineer and Manager at LinkedIn, where he was one of the first members of the Search, Network, and Analytics (SNA) team.

He was among the original authors of a number of open source projects in the scalable data systems space, including Voldemort, Azkaban, and Kafka.

He has spent equal time working on innovative data products, such as predicting professional relationships ("People You May Know"), collaborative filtering, and other data-intensive features.


Comments

Sheeri K. Cabral
09/05/2011 5:00am PDT

A video for this talk can be found online at www.youtube.com/watch?v=gvg...