Scaling near-realtime analytics with Kafka and HBase

Dave Revell (Urban Airship), Nate Putnam (Urban Airship)
Data
Location: Portland 252
Average rating: 3.29 (7 ratings)

Turning billions of events into near-realtime analytics is hard, and Mobile Big Data is getting really big. Learn concrete techniques that Urban Airship uses to collect events from hundreds of millions of mobile apps and turn them into meaningful analytics using open source technologies such as Kafka and HBase.

Over the last year, Urban Airship has scaled from processing thousands to billions of events per month using a variety of techniques. Increasing scale continually forces our architecture to evolve as we approach scaling limits. Changes include:
- Introducing smaller, fine-grained incremental jobs to decrease latency and save work compared to more common MapReduce-based approaches, without the complication of being fully realtime
- Migrating from a hybrid cloud strategy to physical hardware to improve IO performance and reduce cost
- Choosing a work queueing approach that provides reliability and performance while sidestepping tricky distributed systems problems
- Refining our Java producer-consumer architecture to capitalize on the strengths of Kafka and HBase, particularly regarding our HBase schema
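
The first and last points in the list above can be illustrated with a minimal sketch. It is not Urban Airship's implementation: it assumes an in-memory list standing in for a Kafka-style append-only log and a sorted map standing in for an HBase-style counter table. The class name `IncrementalRollup`, the `Event` fields, and the `appId:type:hourBucket` row key layout are all hypothetical choices made for illustration. The idea shown is that each small job processes only the events appended since the previous run, rather than rescanning the full history as a batch job would.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

/** Hypothetical sketch: incremental hourly rollups over an append-only event log. */
final class IncrementalRollup {
    /** One mobile event; the field names are illustrative assumptions. */
    static final class Event {
        final String appId;
        final String type;
        final long timestampMillis;

        Event(String appId, String type, long timestampMillis) {
            this.appId = appId;
            this.type = type;
            this.timestampMillis = timestampMillis;
        }
    }

    private static final long HOUR_MILLIS = 3_600_000L;

    // In-memory stand-in for a sorted key-value table (row key -> counter).
    private final TreeMap<String, Long> counters = new TreeMap<>();
    // Position of the next unprocessed event, similar to a consumer offset.
    private int nextOffset = 0;

    /** Builds a row key so that one app's hourly counters sort next to each other. */
    static String rowKey(String appId, String type, long hourBucket) {
        return appId + ":" + type + ":" + String.format(Locale.ROOT, "%010d", hourBucket);
    }

    /** Processes at most maxBatch (non-negative) new events; returns how many were consumed. */
    int runOnce(List<Event> log, int maxBatch) {
        int end = Math.min(log.size(), nextOffset + maxBatch);
        int consumed = end - nextOffset;
        for (int i = nextOffset; i < end; i++) {
            Event e = log.get(i);
            long hourBucket = e.timestampMillis / HOUR_MILLIS;
            // Increment the counter for this app, event type, and hour.
            counters.merge(rowKey(e.appId, e.type, hourBucket), 1L, Long::sum);
        }
        nextOffset = end;
        return consumed;
    }

    /** Reads one hourly counter; returns 0 if nothing has been recorded. */
    long count(String appId, String type, long hourBucket) {
        return counters.getOrDefault(rowKey(appId, type, hourBucket), 0L);
    }

    public static void main(String[] args) {
        List<Event> log = new ArrayList<>();
        log.add(new Event("app1", "open", 0L));
        log.add(new Event("app1", "open", 1_000L));
        IncrementalRollup rollup = new IncrementalRollup();
        rollup.runOnce(log, 100);                       // first small job
        log.add(new Event("app1", "open", HOUR_MILLIS));
        rollup.runOnce(log, 100);                       // touches only the new event
        System.out.println(rollup.count("app1", "open", 0));
        System.out.println(rollup.count("app1", "open", 1));
    }
}
```

In this sketch the second call to `runOnce` reads only the one event added after the first call, which is where the latency and work savings would come from. A real deployment would also need durable offsets and atomic counter increments, which are outside the scope of this illustration.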


Dave Revell

Urban Airship

Dave is an engineer at Urban Airship, where he helps design and build the large-scale analytics backend. He works frequently with Java, HBase, Kafka, and Linux and is interested in large, fast databases.


Nate Putnam

Urban Airship

Nate is a Tech Lead at Urban Airship, where he works on distributed systems. Previously, Nate built social activity streams at Jive Software.
