View a complete list of OSCON 2008 Contacts
The Internet Archive, with support from other libraries around the world, has helped develop a collection of open source Java tools for web archiving. These include the Heritrix archival web crawler, "Wayback" for replaying historic web content, and extensions to Nutch for full-text search of web archives. This session will explain the design and capabilities of these tools and quickly demo their use to create a small personal web archive.
Heritrix is designed for faithful and complete content archiving, but it has also found use in other web search contexts. Wayback allows URL-based lookup and follow-up browsing of archived web content. Nutch, applied to archival web crawls, allows Google-style full-text search of web content, including successive versions of the same content as it changes over time. Together, they provide everything necessary to archive and access accurate historical records of web-published content.
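To make "URL-based lookup" concrete, here is a minimal sketch, not Wayback's actual implementation. Given a URL and a requested moment in time, it picks the stored capture closest to that moment and builds a replay address in the `https://web.archive.org/web/<timestamp>/<url>` form used by the public Wayback Machine. The in-memory `CAPTURES` index and the function names `nearest_capture` and `replay_url` are hypothetical; a real archive would consult an index built from the crawler's output files.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Optional

# Hypothetical capture index (an assumption for illustration): URL -> sorted
# list of 14-digit timestamps in YYYYMMDDhhmmss form.
CAPTURES = {
    "http://example.com/": ["20010315120000", "20050720083000", "20080101000000"],
}


def _to_dt(ts: str) -> datetime:
    """Parse a 14-digit archive timestamp into a datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def nearest_capture(url: str, wanted: str) -> Optional[str]:
    """Return the stored timestamp closest in time to `wanted`, or None."""
    stamps = CAPTURES.get(url)
    if not stamps:
        return None
    # Fixed-width timestamps sort chronologically as strings, so binary search works.
    i = bisect_left(stamps, wanted)
    # Only the captures just before and at/after the requested time can be nearest.
    candidates = stamps[max(i - 1, 0): i + 1]
    target = _to_dt(wanted)
    return min(candidates, key=lambda ts: abs(_to_dt(ts) - target))


def replay_url(url: str, wanted: str) -> Optional[str]:
    """Build a Wayback-style replay URL for the capture nearest `wanted`."""
    ts = nearest_capture(url, wanted)
    if ts is None:
        return None
    return f"https://web.archive.org/web/{ts}/{url}"


if __name__ == "__main__":
    # Requesting early 2006 resolves to the mid-2005 capture, the closest one on file.
    print(replay_url("http://example.com/", "20060101000000"))
```

Returning the nearest capture rather than requiring an exact timestamp match is what makes follow-up browsing practical: links on a replayed page rarely have a capture at exactly the same second, so each one resolves to whatever version is closest in time.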
Gordon Mohr leads software development for the Internet Archive’s public and open source web archiving projects, including the Heritrix web crawler, Nutch-based archive text search engine, and Wayback Machine archive browser.
Before joining the Internet Archive, Gordon helped create other innovative Internet applications, including Bitzi Bitpedia, a collaborative digital media encyclopedia; Activerse Ding, an instant-messaging platform; and ParcPlace VisualWave, an early web application server and development environment.
Gordon holds a BA from the University of California, Berkeley, with a double major in Computer Science and Economics.