An overview of the state of the art for bringing together the analytical power of the R language with the big data capabilities of Hadoop.
You've heard about NoSQL. You've heard about the Cloud. What if you could spin up something like HBase in a couple of minutes and try out both at the same time? By the end of this session, you'll know how to do just that, in a way that is portable across several NoSQL projects and dozens of compute clouds.
A case study in using open data and open source systems to enable research in personalized medicine. We will show how we leverage publicly available data, along with clinical and experimental data from collaborators in five different countries, to advance disease detection and personalized medicine.
The Basho engineering team has been working to make Riak more queryable with the addition of built-in indexing plus a SQL-style query language. In this talk, Rusty describes the usage, benefits, limitations, and evolution of this functionality, called Secondary Indices. He also covers the challenges and pitfalls of adding indexing to a distributed datastore.
The last few years have brought a wealth of new data technologies organized around horizontal scalability. This talk will cover the essential infrastructure areas: real-time stream processing, offline data crunching, large-scale data deployments and live serving. The focus will be on how these ingredients come together to enable innovative data-driven products at LinkedIn.