Modern data applications often require analyzing multi-terabyte data sets. R is one of the most popular languages for data processing and is best known for its large library of advanced statistical tools. However, using R to analyze multi-terabyte data sets presents challenges: How do we avoid transmitting all the data over the network? How do we scale statistical algorithms? What are the options for integrating R with Hadoop clusters?
This presentation is geared towards R beginners with some knowledge of Hadoop and MapReduce concepts. Attendees will learn important R concepts, effective data wrangling tools, and how to scale R algorithms for large data sets using RHadoop. We will discuss RHadoop in depth and share deployment, scalability, and troubleshooting lessons that we have learned the hard way.
Gwen Shapira is a Solutions Architect at Cloudera and leader of the IOUG Big Data SIG. She studied computer science, statistics, and operations research at Tel Aviv University, then spent the next 15 years in different technical positions in the IT industry. She specializes in scalable and resilient solutions and helps her customers build high-performance, large-scale data architectures using Hadoop. She is a frequent presenter at conferences and regularly publishes articles in technical magazines and on her blog.