The last few years have brought a wealth of new data technologies organized around horizontal scalability. LinkedIn has built out an ecosystem of infrastructure to support products that use data in innovative ways and create significant infrastructure demands. This talk will cover the essential areas of technology and how LinkedIn has met those needs with a mixture of great Apache projects such as Hadoop, ZooKeeper, Pig, and Avro, as well as open source projects of our own creation such as Voldemort, Kafka, and Azkaban.
Hadoop is the key ingredient for offline computation, but creating an agile system for offline computing requires a lot more than just a Hadoop cluster.
Stream processing is an underutilized model that enables real-time data processing. Kafka is LinkedIn’s open source framework that enables MapReduce-like processing without the high-latency turnaround of Hadoop jobs.
Finally, live serving and data deployment are the last mile of analytical data processing: getting terabytes of data delivered and available for serving with low latency is what actually gets your data in front of your users.
This talk will tell the story of how we came to understand these problems, the pitfalls we hit along the way, and how products on our site take advantage of this ecosystem.
Jay is a Principal Engineer and Manager at LinkedIn where he was one of the first members of the Search Network and Analytics (SNA) team.
He has spent equal time working on innovative data products such as predicting professional relationships (“People You May Know”), collaborative filtering, and other data-intensive products.