Introduction to Apache Drill

Ted Dunning (MapR), Jacques Nadeau (Apache Foundation/MapR)
Data
Location: E143/144
Average rating: 2.91 (11 ratings)

THIS TUTORIAL HAS REQUIREMENTS AND INSTRUCTIONS LISTED BELOW

Apache Drill is a new type of massively parallel processing (MPP) framework that lets companies federate NoSQL and old-line data storage technologies behind a single query interface. This interface uses DrQL, a superset of SQL:2003 enhanced to support manipulation of complex hierarchical and schema-less data. In addition to being the first open source framework to tackle this problem, Apache Drill provides a powerful abstraction layer that lets users extend the processing framework to solve their own business problems.

We’ll start the session with an overview of Apache Drill and its key extension APIs. We’ll then describe an example use case where Apache Drill’s native capabilities fall short, and work through design and development, using Java and scripting, to add extensions to the Apache Drill platform.

The coding exercises will produce new logical and physical data processing operators, a new type of storage engine, additional query optimizer rules, and an implementation of a new domain-specific language focused on our particular use case. Upon completion, attendees will have a strong understanding of Apache Drill fundamentals, a set of useful real-world extensions to the platform, and a new tool in their data analysis tool chest.

TUTORIAL REQUIREMENTS AND INSTRUCTIONS FOR ATTENDEES
Coders may want to look at Apache Drill (http://incubator.apache.org/drill) beforehand. Attendees should bring a laptop with Java 1.7, Maven, and Git installed.
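
To confirm a laptop meets these requirements before the session, something like the following sketch may help. It is not part of the tutorial materials: the class name PrereqCheck and its helper methods are hypothetical, and it assumes a Unix-like shell where Maven and Git are invoked as mvn and git. It avoids language features newer than Java 1.7.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Apache Drill or the tutorial materials.
class PrereqCheck {

    // Extracts the major version from a java.version string.
    // Legacy scheme: "1.7.0_80" -> 7. Modern scheme: "11.0.2" -> 11.
    public static int javaMajor(String version) {
        String[] parts = version.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // Returns true if the command starts and exits with status 0.
    // Assumption: the tool is reachable on the PATH under this exact name
    // (on Windows, Maven is typically mvn.cmd and would need adjusting).
    public static boolean onPath(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .start();
            // Drain the output so the child process cannot block on a full pipe.
            InputStream in = p.getInputStream();
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard
            }
            return p.waitFor() == 0;
        } catch (IOException e) {
            return false; // command not found or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        boolean javaOk = javaMajor(version) >= 7;
        boolean mavenOk = onPath("mvn", "-version");
        boolean gitOk = onPath("git", "--version");

        System.out.println("Java " + version + ": " + (javaOk ? "ok" : "too old"));
        System.out.println("Maven: " + (mavenOk ? "ok" : "not found"));
        System.out.println("Git: " + (gitOk ? "ok" : "not found"));
    }
}
```

Compile and run it with javac PrereqCheck.java followed by java PrereqCheck. Any line that does not end in "ok" points to a tool to install or add to the PATH before the tutorial.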

Ted Dunning

MapR

Serial startup artist and open-source innovator, particularly interested in large data systems and statistical modeling.

Jacques Nadeau

Apache Foundation/MapR

Nadeau is MapR’s lead developer on the Apache Drill open source project. Prior to joining MapR, he was CTO with in.vu and YapMap, where he built and launched a massively parallel distributed search engine on top of Hadoop, supporting more than 650 million documents with sub-second response times. He evolved the platform through three major architectures, ultimately building a custom indexing kernel.
