The Hitchhiker’s Guide to A Kaggle Competition

Data: Roulette
Location: Oregon Ballroom 203
Average rating: 3.00 (3 ratings)

An introductory hands-on workshop on the Heritage Health Prize competition, aimed at the Amateur Data Scientists among us. First, we will quickly look at the classes of algorithms & what they do, through competition problems & datasets. Next, we will dig deeper into one competition, the Kaggle RTA Challenge (Ensemble/Random Forest). We will then dive into the Heritage Health Prize, work through the dataset & submit an entry!

Note: While there is not enough time for participants to work through the different datasets, we will provide links to a hands-on tutorial that you can all do after the workshop.

Outline:

  • Algorithms for the Amateur Data Scientist
    • A look at the broader algorithms leading to Trees & Random Forests
  • The Art of Analytics Competitions – The Kaggle challenges
  • Anatomy of a competition – How the RTA was won
    • Predicting traffic at RTA using Ensemble/Random Forest Trees
  • Competition in flight – The HHP
    • Dataset Organization
    • Analytics Walkthrough
    • Submit our entry
  • Conclusion
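
The outline names Ensemble/Random Forest trees as the approach discussed for the RTA challenge. As a rough illustration of that idea only (this is not the winning entry's method, nor the workshop's own code), the sketch below averages many depth-1 regression trees, each fit on a bootstrap sample and a random subset of features. The synthetic travel-time data, the feature choice (hour, weekend flag), and every function name here are assumptions made up for the example.

```python
import random
from statistics import mean


def fit_stump(rows, features):
    """Fit a depth-1 regression tree using only the given feature indices."""
    best = None
    for f in features:
        values = sorted({x[f] for x, _ in rows})
        # Candidate thresholds sit halfway between neighbouring feature values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:
        # No split possible (all values identical): always predict the mean.
        overall = mean(y for _, y in rows)
        return (0, float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, left_mean, right_mean = stump
    return left_mean if x[f] <= threshold else right_mean


def fit_forest(rows, n_trees=50, n_features=1, seed=0):
    """Bagging plus random feature selection: the two ingredients of a random forest."""
    rng = random.Random(seed)
    total_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap sample, with replacement
        features = rng.sample(range(total_features), n_features)
        forest.append(fit_stump(sample, features))
    return forest


def predict_forest(forest, x):
    """The ensemble prediction is the average over all trees."""
    return mean(predict_stump(stump, x) for stump in forest)


def make_data(n=300, seed=1):
    """Hypothetical data: travel time in minutes from (hour of day, weekend flag)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hour = rng.randrange(24)
        weekend = rng.randrange(2)
        minutes = (40 if hour >= 12 else 20) - 10 * weekend + rng.gauss(0, 2)
        rows.append(((hour, weekend), minutes))
    return rows


rows = make_data()
forest = fit_forest(rows)
print("weekday 18:00 ->", round(predict_forest(forest, (18, 0)), 1))
print("weekday 03:00 ->", round(predict_forest(forest, (3, 0)), 1))
```

Real competition entries would use deeper trees and a tested library implementation; the point here is only that each tree sees a different slice of the data, and the average is more stable than any single tree.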

Krishna Sankar

TCS

Krishna Sankar is currently a Principal Architect/Data Scientist with the NextGen Big Data group at Tata Consultancy Services. Prior to this he was Director of Engg/Data Science at a startup, working on bioinformatics/consumer applications in AWS. He has also worked at Egnyte as a Lead Architect, developing the cloud object store layer (handling billions of files/petabytes of storage) and security (federated Identity/SSO); before that he was at Cisco as a Distinguished Engineer, lastly working on various aspects of Big Data & Cloud Computing. His latest RFC, RFC 6208, is on cloud storage & CDMI.

Krishna’s recent speaking engagements include OSCON 2012, Social Media Analysis with Twitter [http://goo.gl/mFflw]; OSCON 2011, Hitchhiker’s Guide to Kaggle [http://goo.gl/75X7w]; and OSCON 2010 [http://goo.gl/8Ukiw]; as well as guest lecturing at the Naval Postgraduate School on Big Data [http://goo.gl/2pBYS].

He has been developing systems for the last 30+ years – from C/CPM to Cobol to Ada to Java to … His interests include big data stacks – from infrastructure to visualization, highly scalable cloud architectures & intelligent inferences. In his spare time, he is pursuing the Mining Massive Data Sets Graduate Certificate at Stanford. He also writes books – including “Cisco Wireless LAN Security” and “Enterprise Web 2.0”. His other passion is Lego Robotics, and he contributes as a Technical Judge in local & Lego world competitions.

Comments

Krishna Sankar
07/27/2011 5:01pm PDT
There was a question from today’s workshop about good books on algorithms. The best lists I have seen are the answers at Quora and one at LinkedIn:
Krishna Sankar
07/25/2011 3:59pm PDT

I have uploaded a WIP snapshot at www.slideshare.net/ksankar/.... Would appreciate any comments. Beware – I have too many slides; it is intentional.