Hands On Mahout - Mammoth Scale Machine Learning

Robin Anil (Google), Ted Dunning (MapR Technologies)
Data: Analytics and Visualization
Location: Oregon Ballroom 203
Average rating: 2.75 (4 ratings)

Attendee prerequisites for this tutorial are listed below.

Mahout is an open source machine learning library from Apache. At the present stage of development, it is evolving with a focus on collaborative filtering/recommendation engines, clustering, and classification.

Mahout has no user interface, prepackaged distributable server, or installer. It is, at best, a framework of tools intended to be used and adapted by developers. The algorithms in this “suite” can be used in applications ranging from recommendation engines for movie websites to early warning systems in credit risk engines for the card industry.

This tutorial aims to help you set up Mahout to run on Hadoop. The instructor will walk you through the basic idea behind each of the algorithms. After that, we’ll look at how Mahout can be run on large datasets and used to solve real-world problems.

If your site, smartphone app, or viral Facebook app collects data that you want to use far more productively, this session is for you!

PREREQUISITES

Instructions for setting up Mahout

First, subscribe to the mahout-oscon Google Group for updates and announcements, and to discuss issues with setting up Mahout for the tutorial.

Platforms supported by Mahout
  • Linux
  • Mac
    (It’s possible to set up Mahout on Cygwin on Windows, but it’s an unsupported platform for both Hadoop and Mahout.)
System Requirements
  • Java 1.6.x or greater.
  • Maven 2.2.x to build the source code.
  • Subversion 1.6 or higher
On Mac
  • Install MacPorts from http://www.macports.org/
  • Install Maven and Subversion using MacPorts. The commands are given below:
    • sudo port install subversion
    • sudo port install maven
  • Install Java for Mac OS X from the Apple website or using the MacUpdate mechanism
On Linux
  • On Debian/Ubuntu systems, install Subversion, the JDK, and Maven using apt-get (apt-get install <package>)
  • On Fedora systems, install Subversion, the JDK, and Maven using yum (yum install <package>)
  • Ensure the version numbers are as given above
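One way to confirm that the tools listed under System Requirements are installed is a short shell sketch like the one below. It is only an illustration, not part of the official setup: the check_tool helper is a hypothetical name, and the sketch assumes the tools are invoked as java, mvn, and svn on your PATH. It prints the first line of each tool's version banner so you can compare it by eye against the versions given above.

```shell
#!/bin/sh
# Sketch: report whether each prerequisite tool is on the PATH and show
# the first line of its version banner. check_tool is a hypothetical
# helper defined only for this example.
check_tool() {
    tool="$1"; shift
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
        # Some tools (java, for example) print the version to stderr,
        # so merge stderr into stdout before taking the first line.
        "$tool" "$@" 2>&1 | head -n 1
    else
        echo "missing: $tool"
        return 1
    fi
}

# Requirements from the list above: Java 1.6.x+, Maven 2.2.x, Subversion 1.6+
check_tool java -version || echo "  -> install Java 1.6.x or greater"
check_tool mvn -version  || echo "  -> install Maven 2.2.x"
check_tool svn --version || echo "  -> install Subversion 1.6 or higher"
```

The script only reports what it finds; it does not compare version numbers itself, so read the printed banners against the requirements.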
Setting up instructions
If everything went fine, you will have a compiled Mahout library on your laptop. To test your setup, run the following command:
  • $ bin/mahout kmeans --help

If you have trouble compiling the library, send an email to the mahout-oscon Google Group. We will try to help you set up the library before you come to the tutorial.

QUESTIONS for the speaker?: Use the “Leave a Comment or Question” section at the bottom to address them.

Presentation

Robin Anil

Google

Robin is a committer at the Apache Software Foundation, where he works with the Mahout machine learning community. He is also a co-author of “Mahout in Action” (Manning Publications), a book on how Mahout is used to perform machine learning on terabytes of data with ease.

He used to be a tech lead on the ML infrastructure at Minekey Inc, a Valley-based startup focusing on recommendations and behavioral targeting for publisher content. He was introduced to the newly born Mahout community through the Google Summer of Code program while he was a dual-degree student at IIT Kharagpur. Since then, he has been working to express machine learning algorithms in the Map/Reduce format and has successfully merged his Complementary Naive Bayes and Frequent Pattern Mining implementations into the Mahout code base. He is currently working as a Software Engineer at Google, Bangalore, and finds time outside work to contribute actively to the Mahout community.

Ted Dunning

MapR Technologies

Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member for the Apache Mahout project. He contributes to the Mahout clustering, classification, and matrix decomposition algorithms. He was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems and built fraud detection systems for ID Analytics.

Comments on this page are now closed.

Comments

Robin Anil
07/30/2011 2:25am PDT

Greg, you should use the vectors as input to kmeans and then use the clusterdumper tool to view them in text format. See the slides for reference.

Gregory Altman
07/28/2011 12:59am PDT

Need help: I used Lucene to index documents, then ran mahout lucene.vector, which produced out.vectors and out.dictionary. Now I’d like to produce clusters of documents from this in human-readable form. What Mahout commands should I use, and in what sequence? Can you provide an example? Thanks!

Robin Anil
07/27/2011 6:11pm PDT

Link to slides goo.gl/XMIDl

Robin Anil
07/27/2011 4:00pm PDT

Please download this dataset: goo.gl/qv6Ad. It won’t take more than 2 minutes. Sending it early so that the pipes don’t get jammed during the session.

Rick Gordon
07/27/2011 3:06pm PDT

If mvn is already installed, it’s pretty easy; it even worked over the conference wifi. Looking forward to the session now.

Robin Anil
07/25/2011 10:11am PDT

That link didn’t come out right. Attempting again: groups.google.com/group/osc...

Robin Anil
07/25/2011 10:10am PDT

The Google Group link is groups.google.com/forum/?hl...!forum/oscon-mahout

Ted Dunning
07/25/2011 8:55am PDT

Andrew,

I doubt that you need all of XCode. Ports should be able to install maven and you should already have java.

Andrew Serff
07/25/2011 8:41am PDT

Next time can you please send the prereq email like a week in advance? I just got it (Monday morning) and now I have to download and install XCode over the OSCON wifi…hopefully it will finish before the tutorial…