Scalable Analytics with R, Hadoop and RHadoop

Gwen Shapira (Cloudera)
Databases & Datastores
Portland 256

Modern data applications often require analyzing multi-terabyte data sets. R is one of the most popular languages for data processing and is best known for its large library of advanced statistical tools. However, using R to analyze multi-terabyte data sets presents challenges: How do we avoid transmitting all the data over the network? How do we scale statistical algorithms? What are the options for integrating R with Hadoop clusters?

This presentation is geared towards R beginners with some knowledge of Hadoop and MapReduce concepts. Attendees will learn important R concepts, effective data-wrangling tools, and how to scale R algorithms for large data sets using RHadoop. We will discuss RHadoop in depth and share deployment, scalability and troubleshooting lessons that we have learned the hard way.

Gwen Shapira

Cloudera

Gwen Shapira is a Solutions Architect at Cloudera and leader of the IOUG Big Data SIG. She studied computer science, statistics and operations research at the University of Tel Aviv, and then spent the next 15 years in different technical positions in the IT industry. She specializes in scalable and resilient solutions and helps her customers build high-performance, large-scale data architectures using Hadoop. She is a frequent presenter at conferences and regularly publishes articles in technical magazines and on her blog.
