Cloudera's Introduction to Hadoop provides a solid foundation for those seeking to understand large-scale data
processing with MapReduce and Hadoop. This session is appropriate for attendees who need to analyze data
using Hadoop's MapReduce paradigm.
How does Twitter analyze its massive dataset? What tools do we use, and where do we focus our analysis?
In this talk, I will discuss our transition from a MySQL-based to a Hadoop-based data infrastructure and our use of Pig (a scripting language built on top of Hadoop) to democratize big-data analysis across the company. I will present concrete examples of interesting analyses at each step.
Cloudera's Introduction to Hadoop provides a solid foundation for those seeking to understand large-scale data
processing with MapReduce and Hadoop. This session is appropriate for attendees who are new to Hadoop and
want to understand where Hadoop is appropriate and how it fits with existing systems.
Data is exploding all over the internet. There is immense knowledge within this huge volume of information that needs to be unlocked. We need to mine patterns, find clusters, organize content, and predict the future. In this talk, we will show what these methods are and how the new Apache Mahout project is attempting to solve these problems in a scalable way by utilizing Hadoop.
This non-classified case study describes how we've built a stack based on MALLET, Hadoop/Cassandra, and Flare/Flex to create a highly scalable system for the U.S. intelligence community: MALLET lends itself to state-of-the-art NLP, Hadoop/Cassandra yield a massively distributed back end, and Flare/Flex provide the tools for creating a great UI/UX capable of performing advanced analysis.