Wikipedia for the iPhone/OLPC: storing the sum of human knowledge in 2GB

Patrick Collison (Stripe)
Mobile
Location: Meeting Room J3

This talk details the technical story behind creating a GPL’d application for storing and reading a copy of Wikipedia on the iPhone and OLPC: basically, realizing The Hitchhiker’s Guide to the Galaxy. The app is now one of the most popular on the iPhone (with over 100,000 downloads) and comes pre-installed on the OLPC (bringing Wikipedia to kids who’ve never used the internet). We describe the challenges and hacks involved in making it all work.

Storing Wikipedia in 2GB requires highly space-efficient compression for both article text and search indices. Doing this inside an interactive application, on a device with very limited processing power and memory, adds further constraints that call for unusual solutions.
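To illustrate the tension between compression ratio and interactive access, here is a minimal sketch of one general approach: compress articles in small groups, each as an independent bzip2 stream, and keep an index of byte offsets so that only one group has to be decompressed per lookup. This is an assumption for illustration, not necessarily the scheme used in the app. The names `build_archive` and `read_article`, the `chunk_size` parameter, and the NUL separator are all hypothetical.

```python
import bz2


def build_archive(articles, chunk_size=4):
    """Pack articles into independent bzip2 streams and index them.

    Returns (blob, index), where index maps each title to
    (byte offset of its stream, stream length, position inside the stream).
    Assumes article text never contains a NUL character.
    """
    blob = bytearray()
    index = {}
    items = list(articles.items())
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Concatenate the bodies of this group, separated by NUL bytes.
        raw = b"\x00".join(text.encode("utf-8") for _, text in chunk)
        packed = bz2.compress(raw)
        for pos, (title, _) in enumerate(chunk):
            index[title] = (len(blob), len(packed), pos)
        blob += packed
    return bytes(blob), index


def read_article(blob, index, title):
    """Decompress only the stream that holds the requested article."""
    offset, length, pos = index[title]
    raw = bz2.decompress(blob[offset:offset + length])
    return raw.split(b"\x00")[pos].decode("utf-8")
```

Larger groups generally compress better but cost more time and memory per lookup, so `chunk_size` would need tuning against the target device.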

Some of the hacks we describe include:

  • partial decompression and indexing of bzip2 files;
  • fast prefix and substring matching using a single compressed index, based on James A. Woods’ work (“Finding Files Fast”, USENIX ;login: February 1983), to enable near-instantaneous search of article titles;
  • efficient storage, parsing and rendering of MediaWiki markup, using fairly computationally-intensive preprocessing to enable a rapid, single-pass parser on the device itself.
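
As a rough illustration of the title-index idea, the sketch below shows front coding (incremental encoding) of a sorted title list, the core technique described in Woods’ article: each entry stores only how many leading characters it shares with the previous entry, plus the remaining suffix. This is a simplified sketch rather than the app’s actual index. It omits the bigram coding from Woods’ scheme and handles prefix matching only, not substring matching. The function names are hypothetical.

```python
def front_encode(sorted_titles):
    """Encode a sorted list of titles as (shared_prefix_length, suffix) pairs."""
    encoded = []
    prev = ""
    for title in sorted_titles:
        shared = 0
        limit = min(len(prev), len(title))
        while shared < limit and prev[shared] == title[shared]:
            shared += 1
        encoded.append((shared, title[shared:]))
        prev = title
    return encoded


def front_decode(encoded):
    """Yield the original titles, rebuilding each one from its predecessor."""
    prev = ""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        yield prev


def prefix_search(encoded, prefix):
    """Return all titles starting with prefix, in a single sequential pass."""
    matches = []
    for title in front_decode(encoded):
        if title.startswith(prefix):
            matches.append(title)
        elif matches:
            # Titles sharing a prefix are contiguous in sorted order,
            # so the first miss after a hit ends the search.
            break
    return matches
```

For example, encoding `sorted(["Lisp", "Lisbon", "List"])` stores `"Lisbon"` in full and the next two titles as `(3, "p")` and `(3, "t")`. A real index would also need periodic full entries so that a search can start mid-list instead of decoding from the beginning.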

In addition, we describe some of the systems involved on the server side, such as creating an ad-hoc CDN for distributing our customised dumps.

Patrick Collison

Stripe

Patrick is an Irish Lisp, Smalltalk and C hacker. He won the Irish Young Scientist of the Year award in 2005, for work on a new dialect of Lisp. Later in 2005, he came second in the European Union Contest for Young Scientists. He started college at MIT in 2006, but deferred to cofound Auctomatic in early 2007. Ten months later, Auctomatic was acquired by Live Current Media for circa $5 million. He’s currently working on his second start-up.

Comments

Jared Meeker
07/22/2009 5:55pm PDT

It is refreshing to see creative thinkers at work. Apparently the website has been “slashdotted”, so we’ll have to wait to see the site.
