Data Science in R

Hadley Wickham (Rice University / RStudio)
Data
Location: E145-146
Average rating: 4.38 (21 ratings)

Attendee prerequisites for this tutorial are listed below.

R is an open-source statistical programming environment. It is widely used by academic statisticians and is becoming increasingly popular in many applied domains. In this half-day tutorial, you’ll learn:

  • The strengths and weaknesses of a tool that many data scientists use as their secret sauce
  • What the key tools of the data scientist’s toolbox look like in R. How do you do data munging and manipulation, visualisation and modelling?
  • Where to go next to scale your R code to deal with massive data.

The course will begin with a brief introduction to the R language. I’ll compare R to languages that you may be more familiar with, discussing R’s functional and OO heritages. I’ll continue with a discussion of how you can use R with your existing tools, and discuss the pros and cons of a CLI vs a GUI for creating visualisations and analysing data. You’ll also learn the basic data structures and some of the tools (like subsetting) most important for fluent R use.

Next I’ll outline the most important tools for data manipulation, visualisation and modelling. We don’t have time to cover them in depth, but you’ll see the key tools and learn where to learn more. I’ll focus on the tools that allow you to fluidly move between visualisation and modelling, illustrated with a case study exploring around half a million deaths in Mexico. You’ll also learn a new strategy for dealing with large data (the inverse of the information-seeking mantra).

TUTORIAL PREREQUISITES
Attendees should have a recent install of R and RStudio.

QUESTIONS for the speaker?: Use the “Leave a Comment or Question” section at the bottom to address them.

Hadley Wickham

Rice University / RStudio

Hadley Wickham is an Assistant Professor and the Dobelman Family Junior Chair in Statistics at Rice University. He is an active member of the R community, has written and contributed to over 30 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualisation. His research focusses on how to make data analysis better, faster and easier, with a particular emphasis on the use of visualisation to better understand data and models.

Comments on this page are now closed.

Comments

David Mertz
07/16/2012 9:11am PDT

What is the URL for the slides?

Hadley Wickham
07/13/2012 1:14pm PDT

I’ve tried to design it so you’ll get something out of it regardless of whether you have a laptop or not. There will be a few times during the class where you get to practice what you learn, but due to the overall length of the class, they won’t take that long.

Mary Chang
07/13/2012 12:19pm PDT

Must we bring a laptop? Can we get a meaningful tutorial by just watching the lecture?

Hadley Wickham
07/12/2012 10:30am PDT

I’ll provide the data you need on the day.

steve huitt
07/09/2012 7:26am PDT

Is there a data set we should get in advance as well?

Kevin Cole
06/28/2012 7:23pm PDT

Never mind. I didn’t look hard. Found it.

Kevin Cole
06/28/2012 7:20pm PDT

Where does one find RStudio for Ubuntu? (I see a gstudio package listed on the CRAN site. Is that the same beastie?)
