The maturation and development of open source technologies has made it easier than ever for companies to derive insights from vast quantities of data. In this session, we will cover how to build a real-time analytics stack using Kafka, Storm, and Druid.
Analytics pipelines running purely on Hadoop can suffer from hours of data lag. Initial attempts to solve this problem often produce either inflexible solutions, where queries must be known ahead of time, or fragile ones, where the integrity of the data cannot be assured. Combining Hadoop with Kafka, Storm, and Druid can guarantee system availability, maintain data integrity, and support fast and flexible queries.
In the described system, Kafka provides a fast message bus and is the delivery point for machine-generated event streams. Storm and Hadoop work together to load data into Druid. Storm handles near-real-time data and Hadoop handles historical data and data corrections. Druid provides flexible, highly available, low-latency queries.
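One way to picture how Storm and Hadoop can share responsibility for the same data is a toy model of versioned time buckets. The sketch below is not Druid's API. It assumes that each loader publishes per-hour "segments" tagged with a version number, and that a query sees only the highest-version segment for each hour. Under that assumption, a Hadoop batch run corrects Storm's real-time numbers simply by publishing a newer version for the same hour. The names `Segment`, `visible_segments`, and `query_total`, and the sample figures, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    hour: int              # hypothetical: the time bucket this segment covers
    version: int           # higher version wins for the same hour
    source: str            # "storm" (near-real-time) or "hadoop" (batch/corrections)
    rows: Dict[str, int]   # metric per key, e.g. impressions per advertiser


def visible_segments(segments: List[Segment]) -> Dict[int, Segment]:
    """For each hour, keep only the highest-version segment; older ones are overshadowed."""
    best: Dict[int, Segment] = {}
    for seg in segments:
        current = best.get(seg.hour)
        if current is None or seg.version > current.version:
            best[seg.hour] = seg
    return best


def query_total(segments: List[Segment], key: str) -> int:
    """Sum one metric across the visible segments only."""
    return sum(seg.rows.get(key, 0) for seg in visible_segments(segments).values())


# Hour 0 was loaded by Storm, then corrected by a Hadoop batch run;
# hour 1 has only the real-time data so far.
segments = [
    Segment(hour=0, version=1, source="storm", rows={"adv_a": 98}),
    Segment(hour=0, version=2, source="hadoop", rows={"adv_a": 100}),
    Segment(hour=1, version=1, source="storm", rows={"adv_a": 40}),
]
print(query_total(segments, "adv_a"))  # 140 (100 from Hadoop + 40 from Storm)
```

The design choice here is that neither loader needs to coordinate with the other: the real-time path keeps results fresh, and the batch path can replace any hour later without queries ever mixing the old and new values for that hour.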
This talk is based on our real-world experiences building out such a stack for online advertising analytics at Metamarkets.
Fangjin is one of the main Druid contributors and one of the first developers at Metamarkets. He mainly works on core infrastructure and platform development. Fangjin comes to Metamarkets from Cisco, where he developed diagnostic algorithms for various routers and switches. He holds a BASc in Electrical Engineering and a MASc in Computer Engineering from the University of Waterloo, Canada.
Gian is a contributor to the Kafka, Storm, and Druid open source projects and a developer at Metamarkets. He previously worked at Yahoo!, where he was responsible for its worldwide server deployment and configuration management platform. He holds a BS in Computer Science from the California Institute of Technology.