Attendee prerequisites for this tutorial are listed below.
Mahout is an open source machine learning library from Apache. At the present stage of development, it is evolving with a focus on collaborative filtering/recommendation engines, clustering, and classification.
Mahout has no user interface, pre-packaged distributable server, or installer. It is, at best, a framework of tools intended to be used and adapted by developers. The algorithms in this “suite” can be used in applications ranging from recommendation engines for movie websites to early warning systems in the credit risk engines that support the cards industry.
This tutorial aims to help you set up Mahout to run on a Hadoop installation. The instructor will walk you through the basic idea behind each of the algorithms. After that, we’ll look at how they can be run on large datasets and used to solve real-world problems.
If your site, smartphone app, or viral Facebook app collects data that you want to use far more productively, this session is for you!
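To give a flavour of the clustering idea mentioned above, here is a minimal sketch of k-means in plain Python. It is not Mahout code and does not use Hadoop or Map/Reduce. The function name `kmeans`, its parameters, and the sample points are all assumptions made for illustration, and the sketch handles only 2-D points held in memory.

```python
import random


def kmeans(points, k, iterations=10, seed=0):
    """Cluster 2-D points with plain k-means (illustrative sketch only)."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random from the data.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # leave an empty cluster's centroid where it is
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    # Clusters are those produced by the final assignment step.
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical sample data: two well-separated groups of points.
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, clusters = kmeans(sample, k=2)
    for centroid, members in zip(centroids, clusters):
        print(centroid, members)
```

On a single machine this loop is enough for small inputs. Roughly speaking, a distributed version would spread the assignment step across many workers and combine their partial sums to update the centroids.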
PREREQUISITES
Instructions for setting up Mahout
First, subscribe to the mahout-oscon Google Group for updates and announcements, and to discuss issues with setting up Mahout for the tutorial.
Platforms supported by Mahout
If you have trouble compiling the library, send an email to the mahout-oscon Google Group. We will try to help you set up the library before you come to the tutorial.
QUESTIONS for the speaker? Use the “Leave a Comment or Question” section at the bottom to ask them.
Robin is a committer at the Apache Software Foundation, where he works with the Mahout machine learning community. He is also a co-author of “Mahout in Action”, published by Manning Publications, a book on how Mahout is used to perform machine learning on terabytes of data with ease.
He used to be a Tech Lead on the ML infrastructure for Minekey Inc, a Valley-based startup focusing on recommendations and behavioral targeting for publisher content. He was introduced to the newly born Mahout community through the Google Summer of Code program while he was a dual-degree student at IIT Kharagpur. Since then, he has been working to model machine learning algorithms in the Map/Reduce format and has successfully merged his Complementary Naive Bayes and Frequent Pattern Mining implementations into the Mahout code base. He is currently working as a Software Engineer at Google, Bangalore. He finds time from work to contribute actively to the Mahout community.
Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member for the Apache Mahout project. He contributes to the Mahout clustering, classification and matrix decomposition algorithms. He was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems and built fraud detection systems for ID Analytics.
Comments
Greg, you should use the vectors as input to kmeans and then use the clusterdumper tool to view them in text format. See the slides for reference.
Need help: I used Lucene to index documents, ran mahout lucene.vector, and produced out.vectors and out.dictionary. Now I’d like to produce clusters of documents from this in human-readable form. What mahout commands should I use, and in what sequence? Can you provide an example? Thanks!
Link to slides goo.gl/XMIDl
Please download this dataset: goo.gl/qv6Ad. It won’t take more than 2 minutes. Sending it early so that the pipes don’t get jammed during the session.
If mvn is already installed, it’s pretty easy – it even worked over the conference wifi. Looking forward to the session now.
That link didn’t come out right. Attempting again: groups.google.com/group/osc...
The Google Group link is groups.google.com/forum/?hl...!forum/oscon-mahout
Andrew,
I doubt that you need all of XCode. Ports should be able to install maven and you should already have java.
Next time can you please send the prereq email like a week in advance? I just got it (Monday morning) and now I have to download and install XCode over the OSCON wifi…hopefully it will finish before the tutorial…