In this workshop, one of the core MongoDB committers will present the fundamental principles of MongoDB, how to set up and interact with the database, and what to consider when building applications using a document-based data model.
The popularity of NoSQL opens up an endless array of possible uses but also causes its own set of problems. Riak, a NoSQL offering created by Basho, addresses one of them by claiming to have no single point of failure. Proving this claim goes a long way toward dispelling the concerns that keep an enterprise from adopting a non-relational solution.
Adding security to an existing product is never easy, but our team at Yahoo added strong authentication to Apache Hadoop by integrating it with Kerberos. This project was delivered on time and is currently deployed on all of Yahoo's 40,000 Hadoop computers. Come learn how we added security to Hadoop and why it matters.
The data & analytics teams at Etsy build up and tear down more than a thousand independent Hadoop clusters on EC2 each month. This talk discusses the benefits of this approach, where Elastic MapReduce serves as a "meta-cluster" in which on-demand Hadoop clusters can be created, used, and shut down quickly and easily.
In November, Facebook launched a new version of Messages that combines chat, SMS, email, and Messages into a real-time conversation. Facebook relies on Apache HBase, a NoSQL-style database, for storing this real-time message data. This talk will elaborate on our decision process, system configuration, scaling issues, and advantages gained by choosing Open Source.
Imagine for a moment doing a JOIN on two HBase tables. Crazy talk, right? Well, now you can, thanks to Hive. True, it is only meant to be used in a batch context, but we have been doing it for a few months now at StumbleUpon, and our analysts and engineers love it. This presentation will cover how the Hive-HBase integration works and how we use it at our company.
Hadoop gives you the ability to process massive amounts of data at scale. This presentation will show you how Hadoop makes use of commodity hardware to let you build a system that scales, deals gracefully with the failure of individual nodes, and gives you the power of Map/Reduce to process petabytes.
CouchDB is a document-oriented database that uses JSON documents, has a RESTful HTTP API, and employs map/reduce views for querying data. This tutorial will teach web developers the concepts they need to get started using CouchDB in their projects. Libraries for CouchDB’s RESTful HTTP API are available in many programming languages, and we will take a look at some of the more popular ones.
Time series sensors are being integrated ubiquitously into places like cell phones, environmental sensors, and the smart grid. As this type of data scales out, relational databases strain to keep up with the high insertion rates and real-time query requirements. In this talk we introduce “Lumberyard”, which provides scalable indexing and low-latency fuzzy pattern searching of time series data.
Over the past few years, Netflix has migrated to the cloud. This talk details Netflix's transition away from relational databases and towards high-availability (NoSQL) storage systems. We rely on a combination of proprietary (e.g. SimpleDB and S3) and open-source (e.g. Cassandra and HBase) NoSQL technologies.
I will give an overview of PNUTS, a large-scale, geographically replicated serving data store in widespread use at Yahoo! I will introduce key use cases, the main system components, key design decisions, and ongoing work.
The Basho engineering team has been working to make Riak more queryable with the addition of built-in indexing plus a SQL-style query language. In this talk, Rusty describes the usage, benefits, limitations, and evolution of this functionality, called Secondary Indices. He also covers the challenges and pitfalls of adding indexing to a distributed datastore.
Redis is an entry in the new breed of NoSQL databases, but it takes a different approach that makes it much more interesting than most of the other key/value stores in the same category. Come learn what makes Redis so useful that it seems everyone is adding it to their toolbox.
One of the challenges that comes with moving to MongoDB is figuring out how best to model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB.
A quick and effective jump start for using Apache Solr, the Lucene-based search server. Solr powers the search and discovery systems of sites such as Zappos, the Smithsonian's collections, The Motley Fool, Orbitz, and many, many others. This three-hour session will give you the basics to immediately begin using Solr on your own data.
Between the NoSQL movement and new cloud offerings, it seems there are new storage options popping up every day. How do you select which one is best for your project? The truth is that it's unlikely one option is best for all your needs. This session walks you through the various options considered by one startup and how it selected five separate storage engines, and has no regrets about doing so!
YARN is the next generation of Hadoop Map-Reduce, designed to scale out much further while allowing applications other than pure Map-Reduce to run in a highly fault-tolerant manner.