Lumberyard: Time Series Indexing at Scale

Josh Patterson (Cloudera)
Average rating: 3.75 (8 ratings)

Time series data is becoming more prevalent across a widening range of industries as the volume of available data continues to grow [1]. Sensors that produce time series are now embedded in cell phones, environmental monitoring equipment, and the smart grid [4]. It has also been shown that shapes in images can be decomposed into time series, which makes the shapes invariant to rotation and scale and therefore easier to compare. The cost of sequencing a human genome continues to fall rapidly, shifting pressure onto the storage and processing technologies for genomic data, which can also be analyzed with time series techniques.

Although multi-dimensional index structures combined with today’s RDBMSs can handle time series data, these systems strain under high insertion rates and real-time query requirements as the data scales out. In response, many companies are employing HBase to handle the throughput and scale of rising data loads. Groups are also looking at techniques such as Keogh’s SAX [2] to search for patterns in time series data (for example, openPDC and Hadoop). A later evolution of SAX, called iSAX, indexes time series data for low-latency queries. In this talk we introduce “Lumberyard”, a scalable system for indexing and low-latency fuzzy pattern search over time series data. Lumberyard is currently available [3] on GitHub as a project under the Apache License 2.0, and it uses HBase for scale and iSAX for indexing and search.
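For readers new to SAX, the idea can be sketched in a few lines of Python. This is a minimal illustration of the general technique (z-normalize, average into equal-width segments, then map each segment mean to a symbol using breakpoints that split a standard normal distribution into equally probable regions). It is not code from Lumberyard or jmotif, and the function names are assumptions chosen for this example.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit variance."""
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information.
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        # Assumption for brevity: the length divides evenly into segments.
        raise ValueError("series length must be a multiple of segments")
    width = n // segments
    return [fmean(series[i:i + width]) for i in range(0, n, width)]


def breakpoints(cardinality):
    """Cut points that divide N(0, 1) into `cardinality` equiprobable regions."""
    normal = NormalDist()
    return [normal.inv_cdf(k / cardinality) for k in range(1, cardinality)]


def sax_word(series, segments, cardinality):
    """Return the SAX word as a list of integer symbols in [0, cardinality)."""
    cuts = breakpoints(cardinality)
    return [bisect_right(cuts, v) for v in paa(znormalize(series), segments)]
```

Two series with similar shapes produce similar or identical words regardless of their original offset and amplitude, which is what makes the representation useful for fuzzy pattern search.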

We’ll look at some of the indexing-at-scale issues that Lumberyard solves, along with the design issues involved in moving the iSAX index from a single-process, in-memory data structure to an HBase-persisted data structure. Given that Lumberyard is experimental, we’ll also look at the current performance numbers and where the code stands today. This talk should be approachable both for the novice who wants ideas about the variety of places that hold time series data and for the advanced algorithm enthusiast who enjoys a design talk.
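One property that makes an iSAX index a plausible fit for a key-value store is that each symbol can be stored at its own resolution, and a symbol at a coarser resolution is just the high-order bits of the same symbol at a finer one. The sketch below illustrates that property in Python. The abstract does not describe Lumberyard's actual HBase schema, so the `row_key` encoding here is purely hypothetical, as are all function names.

```python
def demote(symbol, from_bits, to_bits):
    """Reduce a symbol from 2**from_bits to 2**to_bits cardinality.

    Dropping low-order bits works because the equiprobable breakpoints
    for cardinality 2**(b-1) are a subset of those for 2**b.
    """
    if not 0 <= to_bits <= from_bits:
        raise ValueError("can only demote to an equal or lower resolution")
    return symbol >> (from_bits - to_bits)


def covers(node, word):
    """True if `word` falls under index node `node`.

    Both arguments are lists of (symbol, bits) pairs; each symbol of
    `word` must be at least as fine-grained as the node's symbol.
    """
    if len(node) != len(word):
        return False
    return all(
        w_bits >= n_bits and demote(w_sym, w_bits, n_bits) == n_sym
        for (n_sym, n_bits), (w_sym, w_bits) in zip(node, word)
    )


def row_key(word):
    """Hypothetical byte-string key for an index node (not Lumberyard's format)."""
    return "_".join(f"{sym}.{bits}" for sym, bits in word).encode("ascii")
```

For example, `covers([(1, 1), (0, 1)], [(6, 3), (2, 3)])` is true because binary 110 and 010 begin with 1 and 0 respectively. With a deterministic key per node, a tree that lived in one process's memory could in principle be walked with one key lookup per level.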

[1] http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
[2] http://code.google.com/p/jmotif/
[3] https://github.com/jpatanooga/Lumberyard
[4] http://openpdc.codeplex.com/

Josh Patterson

Cloudera

Master’s Thesis: self-organizing mesh networks
Published in IAAI-09: TinyTermite: A Secure Routing Algorithm

Conceived, built, and led the Hadoop integration for the openPDC project at TVA (smart grid). Led a small team that designed classification techniques for time series data using MapReduce. Open source work at http://openpdc.codeplex.com

Now: Solutions Architect at Cloudera

Comments

Clive Boulton
07/26/2011 9:01pm PDT

My oh my Josh Patterson groks big data like Don Knuth groks algorithms