THIS TUTORIAL HAS REQUIREMENTS AND INSTRUCTIONS LISTED BELOW
In this tutorial, we focus on what’s missing from the documentation and will not assume you’re already a Lucene expert. Diving behind the curtain, we explore the data structures used for indexing, the algorithms that make faceting so fast, and the tradeoffs involved in replication and sharding. From these fundamentals, you will be able to draw reasonable conclusions about how to make your own use cases efficient. Finally, we show how to avoid the mistakes we made, both in design and deployment, so you can build a stable cluster in days rather than months.
Elasticsearch started as a one-man show.
Docs look good at first, but you quickly realize there’s a lot missing.
Characteristics of ES as a datastore
Definition of terms
Basic data structure
Nesting and inter-document relationships
Filters vs. Queries
Filters are cached, so filter when you can.
Queries are more powerful: fuzzy stuff, scoring, etc.
Term vs. match and why this will save you days of pain
Text phrase queries
Building glorious towers of boolean logic and why EC2 will make you sorry
Exercises (scattered throughout this section):
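A minimal sketch of the filter-versus-query split described in this section, assuming the ES 1.x "filtered" query syntax. The field names (title, status) and the helper name are hypothetical, chosen only for illustration. The match clause is scored; the term filter is a yes/no test that ES can cache.

```python
import json


def build_filtered_search(text, status):
    """Build an ES 1.x search body: a scored match query narrowed by a term filter.

    Field names "title" and "status" are hypothetical.
    """
    return {
        "query": {
            "filtered": {
                # Query part: analyzed and scored for relevance.
                "query": {"match": {"title": text}},
                # Filter part: exact yes/no test, no scoring, cacheable.
                "filter": {"term": {"status": status}},
            }
        }
    }


body = build_filtered_search("quick fox", "published")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of an HTTP POST to a search endpoint, for example with curl.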
Mappings and Analysis
What a mapping is and what it is not
4-fold path to analysis
Parallels with DB indexing
What kinds of analyses are there?
Choosing appropriate analysis: what kinds speed which queries?
Query analyzers (vs. index analyzers)
Shrinking your index
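The interplay between analysis and term-versus-match lookups can be illustrated with a toy inverted index. This is a simplified simulation, not Lucene or ES code: the analyze function below is a rough stand-in for an analyzer (lowercase, split on non-alphanumerics), and all names are hypothetical. It shows why an unanalyzed term lookup for "Quick" misses a document that an analyzed match lookup finds.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def build_index(docs):
    """Map each analyzed token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_lookup(index, term):
    """Like a term query: the input is NOT analyzed, so it must equal a stored token."""
    return set(index.get(term, set()))


def match_lookup(index, text):
    """Like a match query: the input is analyzed first, then tokens are OR-ed together."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {1: "The Quick Brown Fox", 2: "Lazy dogs"}
index = build_index(docs)
print(term_lookup(index, "Quick"))   # empty: the index only holds "quick"
print(match_lookup(index, "Quick"))  # finds document 1
```

The same mismatch explains why the analyzer used at query time has to be compatible with the one used at index time.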
An example ES integration
How to index
Libraries: what to use in some popular languages (Python, PHP, Ruby)
What to do with ES query results
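As a rough sketch of the indexing and result-handling steps above: the ES bulk API takes newline-delimited JSON, with an action line followed by a source line for each document and a trailing newline at the end. The index name, type name, and helper names below are hypothetical, and the code only builds and parses payloads; it does not talk to a cluster.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a bulk-API payload from (doc_id, source) pairs.

    Each document becomes two lines: an "index" action, then the document itself.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


def sources(response):
    """Pull the original documents out of a parsed search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


payload = bulk_body("articles", "article", [(1, {"title": "a"}), (2, {"title": "b"})])
print(payload)
```

A client library such as pyelasticsearch wraps these details, so hand-built payloads like this are mainly useful for understanding what goes over the wire.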
Fancy/advanced features (not covered in depth, and may be omitted for time, but have slides)
Autocompletion – via prefixing, via autocomplete suggester (beta)
Deployment and Administration
Don’t trust new versions too readily. ES moves fast and furiously.
Give it big RAM up front.
All those lovely Java tuning switches: not necessary
Use an up-to-date JVM and a modern OS. Difference between life and death.
Deploying new mappings and synonyms without moving files around
Planning for the future
Mergeable and unmergeable changes
ES isn’t a good primary store, in most cases, because of the brittleness of mappings.
However, the update API exists, and versioning dodges race conditions.
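The versioning point can be sketched with a toy in-memory model of optimistic concurrency control. This is not an ES client and every name in it is hypothetical. It mimics the behavior of passing an expected version with a write: if the stored version has moved on, the write is rejected (ES reports this as an HTTP 409 conflict) instead of silently overwriting another writer's change.

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale expected version."""


class ToyStore:
    """In-memory stand-in for a versioned document store."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller's view of the document is out of date.
        if expected_version is not None and expected_version != current:
            raise VersionConflict(
                "expected version %s, found %s" % (expected_version, current)
            )
        self._docs[doc_id] = (current + 1, source)
        return current + 1


store = ToyStore()
store.put("voter-1", {"name": "A"})                      # creates version 1
store.put("voter-1", {"name": "B"}, expected_version=1)  # succeeds, now version 2
try:
    # A second writer that also read version 1 loses the race.
    store.put("voter-1", {"name": "C"}, expected_version=1)
except VersionConflict as exc:
    print("conflict:", exc)
```

The losing writer would typically re-read the document, reapply its change, and retry.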
TUTORIAL REQUIREMENTS AND INSTRUCTIONS FOR ATTENDEES
* Attendees need to install ES 1.2.x, and have a way to send HTTP POST requests, e.g. curl.
Erik Rose leads Mozilla’s DXR project, which does regex searches and static analysis on large codebases like Firefox. He is an Elasticsearch veteran, maintaining the pyelasticsearch library, transitioning Mozilla’s support knowledgebase to ES, and building a burly cluster to do realtime fuzzy matching against the entire corpus of U.S. voters. Erik is a frequent speaker at conferences around the world, the author of “Plone 3 for Education”, and the nexus of the kind of animal magnetism that comes only from writing your own bio.
Laura Thomson is a Senior Engineering Manager at Mozilla Corporation. She works with the Web Engineering team, which is responsible for the Firefox crash reporting system and other developer tools, and the Release Engineering team, which is responsible for shipping Firefox.
Laura is the co-author of “PHP and MySQL Web Development” and “MySQL Tutorial”. She is a veteran speaker at Open Source conferences worldwide.