Druid is an open source, real-time data store designed to work with high-volume, high-dimensional data. Druid enables fast aggregations and arbitrary filters, supports both batch and streaming data ingestion, and seamlessly connects with popular storage systems, including S3, HDFS, Cassandra, and more. This talk will focus on the initial motivations and design considerations behind the system. Druid is in use at Metamarkets, Netflix, and several other organizations to facilitate rapid exploration of high-dimensional spaces. Metamarkets uses Druid to expose impression monetization data to ad tech companies along any combination of demographic, content, and sales-based dimensions. The Metamarkets cluster currently exposes a data set of more than 50 billion rows, representing more than 2 trillion impressions, in tables with 30+ dimensions, while keeping 95th-percentile query latency under 1 second.
Eric Tschetter is the creator of Druid, an open source, real-time analytical data store, and one of its main contributors. He is currently an individual contributor to Tidepool.org, a non-profit diabetes research organization. Eric was previously the VP of Engineering and lead architect at Metamarkets, and has held senior engineering positions at Ning and LinkedIn. He holds bachelor's degrees in Computer Science and Japanese from the University of Texas at Austin, and an M.S. in Computer Science from the University of Tokyo.
View a complete list of OSCON contacts