Getting Started with Scalding, Twitter's High-level Scala API for Hadoop MapReduce

Avi Bryant (Stripe)
Databases & Datastores | Java & JVM
E145/146
Tutorial. Please note: to attend, your registration must include Tutorials.
Average rating: 4.33 (3 ratings)

THIS TUTORIAL HAS REQUIREMENTS AND INSTRUCTIONS LISTED BELOW

Start on low heat with a base of Hadoop; map, then reduce. Flavor, to taste, with Scala’s concise, functional syntax and collections library. Simmer with some Pig bones: a tuple model and high-level join and aggregation operators. Mix in Cascading to hold everything together and boil until it’s very, very hot, and you get Scalding, an API for MapReduce out of Twitter.

Scalding is an open source Scala framework for concisely describing Hadoop MapReduce jobs. I started the project at Twitter as a way for ad server engineers to run simple queries on the ad logs without needing to learn a specialized language like Pig or dive too deeply into the guts of Hadoop. Since then, it has been adopted by teams at Etsy, LinkedIn, eBay, SoundCloud, LivePerson, Stripe, and others, and has been extended with convenient APIs for everything from large-scale sparse matrix multiplication to locality-sensitive hashing.

This tutorial will walk you through getting started with Scalding, from writing the simplest word-count job up to using probabilistic data structures for distributed machine learning. No specific background in Scala, Hadoop, distributed computing or machine learning is required, though an interest in any or all of these might help.
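
For a rough feel of the word-count shape before the session, here is a minimal sketch that uses only Scala's standard collections on in-memory data. It is not Scalding's API and does not run on Hadoop; the names WordCountSketch, tokenize, and wordCount are made up for illustration. It only shows the "map, then reduce" pattern that a Scalding job would express over files on a cluster.

```scala
object WordCountSketch {
  // "Map" step: split one line into lowercase words, dropping empty tokens.
  // (Hypothetical helper; splitting on whitespace is an assumption.)
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // "Reduce" step: group identical words together and count each group.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("map then reduce", "reduce then simmer")
    // Print one tab-separated word and count per line, sorted by word.
    wordCount(lines).toSeq.sortBy(_._1).foreach {
      case (word, count) => println(s"$word\t$count")
    }
  }
}
```

The flatMap followed by groupBy mirrors the two phases of a MapReduce job: the first emits one record per word, the second gathers records with the same key and aggregates them.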

Bring a laptop, or share one with a friend.

TUTORIAL REQUIREMENTS AND INSTRUCTIONS FOR ATTENDEES

* No specific knowledge needed. Some familiarity with either Scala or Hadoop would be helpful but is not at all required.
* A laptop with a working JDK installation.

Questions for the speaker? Use the “Leave a Comment or Question” section at the bottom to address them.

Avi Bryant

Stripe

Avi has led product, engineering, and data science teams at Etsy, Twitter and Dabble DB (which he co-founded and Twitter acquired). He’s known for his open source work on projects such as Seaside, Scalding, and Algebird. Avi currently works at Stripe.

Leave a Comment or Question

Help us make this conference the best it can be for you. Have questions you'd like this speaker to address? Suggestions for issues that deserve extra attention? Feedback that you'd like to share with the speaker and other attendees?
