Ever had to dig into a system that misused the most basic features of an RDBMS? Better yet, after the whole NoSQL storm, have you wondered why it didn't show up earlier, back when you had to twist your schema to fit something it was not designed for? Check out this collection of anti-patterns, feel better knowing that you are not alone, and learn how you can benefit from it even without big data around.
A look at the state of data storage, management & analysis, from SQL to NoSQL, “NewSQL” and beyond. I will explain why the core premises of data management have changed; tell some of the tales of success and failure I have collected on the topic; and share some counterintuitive rules of thumb about the sometimes mind-blowing, sometimes nerve-racking reality of life with an alternative
The popularity of NoSQL opens up an endless array of possible uses but also causes its own set of problems. Riak, a NoSQL offering created by Basho, addresses this by claiming to have no single point of failure. Proving this claim goes a long way toward dispelling the concerns an enterprise has about adopting a non-relational solution.
You've heard about NoSQL. You've heard about the Cloud. What if you could spin up something like HBase in a couple of minutes and try out both at the same time? By the end of this session, you'll know how to do just that, in a way that is portable across several NoSQL projects and dozens of compute clouds.
Those who cannot remember the past are condemned to repeat it. This is part survey, part critique of the Atomicity, Consistency, Isolation and Durability models offered by the databases and data stores used in modern Web and Cloud environments.
In November, Facebook launched a new version of Messages that combines chat, SMS, email, and Messages into a real-time conversation. Facebook relies on Apache HBase, a NoSQL-style database, for storing this real-time message data. This talk will elaborate on our decision process, system configuration, scaling issues, and advantages gained by choosing Open Source.
Google App Engine is an application development and cloud-hosting platform that lets users create apps that run in Google's datacenters. In this 3-part tutorial, we'll give a 1-hour intro talk on cloud computing and App Engine, then a 90-100 minute introductory codelab to get your feet wet with App Engine development, and finally conclude with about a half-hour intro to some of App Engine's newest features!
The story of the development team and the lessons we learned in building Open Legislation, an open government platform. It will detail our transition from a MySQL back end to an application fully powered by Lucene, the data quality and efficiency issues that we’ve had to address, and how we’re now trying to rebuild internal trust after our iterative and initially shaky development process.
Most modern web applications require both SQL access to complex data and simple Key-Value look-ups. This session will cover how to use the HandlerSocket Plug-In for MySQL to get exponentially faster look-ups for simple access patterns.
Imagine for a moment doing a JOIN on two HBase tables. Crazy talk, right? Well, now you can, thanks to Hive. True, it is only meant to be used in a batch context, but we have been doing it for a few months now at StumbleUpon, and our analysts and engineers love it. This presentation will cover how the Hive-HBase integration works and how we use it at our company.
CouchDB is a document-oriented database that uses JSON documents, has a RESTful HTTP API, and employs map/reduce views for querying data. This tutorial will teach web developers the concepts they need to get started using CouchDB in their projects. Libraries are available for CouchDB’s RESTful HTTP API in many programming languages and we will take a look at some of the more popular ones.
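As a rough illustration of the map/reduce view idea mentioned above, here is a minimal in-memory sketch in Python. It does not use CouchDB or its HTTP API (CouchDB views are normally written in JavaScript and stored in design documents); the documents and the names `map_tags`, `reduce_count` and `run_view` are hypothetical and exist only to show the shape of the technique: a map step emits key/value pairs per document, and a reduce step combines the values for each key.

```python
from collections import defaultdict

# Hypothetical JSON-style documents, shaped like what a document store holds.
docs = [
    {"_id": "a1", "type": "post", "author": "ann", "tags": ["couchdb", "json"]},
    {"_id": "b2", "type": "post", "author": "bob", "tags": ["json"]},
    {"_id": "c3", "type": "comment", "author": "ann"},
]


def map_tags(doc):
    """Map step: emit one (key, value) pair per tag on each post."""
    if doc.get("type") == "post":
        for tag in doc.get("tags", []):
            yield tag, 1


def reduce_count(values):
    """Reduce step: combine all values emitted under one key."""
    return sum(values)


def run_view(documents, map_fn, reduce_fn):
    """Group the mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


tag_counts = run_view(docs, map_tags, reduce_count)
print(tag_counts)  # {'couchdb': 1, 'json': 2}
```

The comment document emits nothing in the map step, so it simply does not appear in the view; that filtering-by-emission is the part of the model that tends to surprise people coming from SQL.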
If you've ever had to move from data center to data center or to the cloud, or from old hardware to new hardware, you know that it's even more painful than moving house. In this presentation, survivors will tell you how to stay sane (and how to get it right) with a case study from Mozilla: moving 30TB of crash reports with no downtime in data collection.
We at DeNA (the largest social game provider in Japan) handle over 2 billion page views per day with MySQL. We make heavy use of SSDs and tune Linux. We run non-trivial solutions such as non-stop, automated MySQL master failover. We also use MySQL not only as a traditional RDBMS but also as an extremely high-performance NoSQL store. I'd like to introduce the MySQL solutions we use to make our social games scale better.
Location-based services are hot, but geographic datasets are complex. That shouldn’t put you off writing awesome location-aware services. This talk will show how to create spatial models and query the OpenStreetMap dataset together with social data using the Neo4j graph database.
Over the past few years, Netflix has migrated to the cloud. This talk details Netflix's transition away from relational databases and towards high-availability (NoSQL) storage systems. We rely on a combination of proprietary (e.g. SimpleDB and S3) and open-source (e.g. Cassandra and HBase) NoSQL technologies.
This panel discussion features the key innovators in the NoSQL space.
OpenTSDB is an open-source, distributed time series database designed to monitor large clusters of commodity machines at an unprecedented level of granularity. OpenTSDB enables operations teams to keep track, in real time, of all the metrics exposed by operating systems, applications and network equipment, and makes the data easily accessible.
I will give an overview of PNUTS, a large-scale, geographically replicated serving data store in widespread use at Yahoo! I will introduce key use cases, the main system components, key design decisions, and ongoing work.
Covers the benefits and drawbacks of using NoSQL databases. Draws on a use case from the book POJOs in Action to compare and contrast popular NoSQL databases: Redis, SimpleDB, MongoDB, and Cassandra.
popHealth is an open source tool that allows healthcare providers to calculate quality measures. A quality measure is a calculation of the number of individuals in a population that meet a specific standard of care. This ONC-sponsored effort integrates with electronic health record systems using standards-based patient summary documents to calculate and report on quality measures.
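To make the definition of a quality measure concrete, here is a minimal Python sketch of counting how many individuals in a population meet a standard of care. It is not popHealth's code or data model: the `Patient` fields, the age threshold, the screening flag and the function name `quality_measure` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields; real patient summary documents carry far more detail.
    patient_id: str
    age: int
    had_screening: bool


def quality_measure(patients, in_population, meets_standard):
    """Return (individuals meeting the standard, size of the population)."""
    population = [p for p in patients if in_population(p)]
    numerator = sum(1 for p in population if meets_standard(p))
    return numerator, len(population)


patients = [
    Patient("p1", 62, True),
    Patient("p2", 55, False),
    Patient("p3", 34, True),   # outside the assumed population (under 50)
    Patient("p4", 71, True),
]

# Assumed measure: of patients aged 50 or older, how many had a screening?
met, total = quality_measure(
    patients,
    in_population=lambda p: p.age >= 50,
    meets_standard=lambda p: p.had_screening,
)
print(f"{met} of {total} patients meet the standard")  # 2 of 3 patients meet the standard
```

Passing the population test and the standard as separate functions keeps the two halves of the definition visible: who is being measured, and what counts as meeting the standard.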
The Basho engineering team has been working to make Riak more queryable with the addition of built-in indexing plus a SQL-style query language. In this talk, Rusty describes the usage, benefits, limitations, and evolution of this functionality, called Secondary Indices. He also covers the challenges and pitfalls of adding indexing to a distributed datastore.
Ruby on Rails is a great framework for quickly building applications, but what happens when you are wildly successful and need to scale WAY up? This talk is a case study in the evolution of our Rails application from a monolithic "does everything" system running on a hosted server to a service-oriented system running in the cloud.
Redis is an entry in the new breed of NoSQL databases, but it takes a different approach that makes it much more interesting than most of the other key/value stores in the same category. Come learn what makes Redis so useful that it seems everyone is adding it to their toolbox.
The last few years have brought a wealth of new data technologies organized around horizontal scalability. This talk will cover the essential infrastructure areas: real-time stream processing, offline data crunching, large-scale data deployments and live serving. The focus will be on how these ingredients come together to enable innovative data-driven products at LinkedIn.