The Freedom to Cure Cancer: Open Source Software in Genomics

David Dooling (The Genome Center at Washington University in St. Louis)
Administration, Databases, Linux, Programming
Location: Meeting Room J1/J4

The Genome Center at Washington University in St. Louis has been at the forefront of genomics since its formation in 1990. Having helped lead the sequencing and analysis of the first multicellular organism, C. elegans, and the Human Genome Project, The Genome Center has long leveraged free/libre/open source software (FLOSS). One hallmark of these sequencing efforts has been the rapid dissemination of sequence data for all to download and use. Just as it supports freedom for data, The Genome Center supports freedom for software and its users as well. From its IT infrastructure to software development to bioinformatics tools, FLOSS has helped The Genome Center and others make possible the rapid advancements in genomics over the past two decades. The Genome Center has long used a common, customized Debian GNU/Linux build for both its computational cluster and its desktop workstations. It has long relied on an in-house laboratory information management system written in Perl, which includes a custom touchscreen and barcode scanner interface for lab technicians to input data. Other in-house software is written in Perl, PHP, C, C++, and Ruby, all developed on the GNU/Linux platform using free tools, e.g., Emacs, Vim, Subversion, Git, and GCC.

Recent advances in sequencing technologies have led to increases in data generation rates that far outpace Moore’s law and current increases in storage capacity. Over the past few years, sequencers have gone from generating megabases of DNA sequence and megabytes of data per day to gigabases of sequence and terabytes of data per day. While this level of data generation and the requisite analysis have severely taxed every aspect of our hardware and software infrastructure, they have also enabled the near-routine sequencing of human genomes. These next-generation sequencing platforms have been used to sequence the first Han Chinese and African genomes. While these advancements have increased our understanding of the normal variation among individuals and populations, the true promise of whole-genome sequencing lies in its application to understanding diseases of the genome such as cancer. To begin to capitalize on this promise, The Genome Center sequenced the first tumor genome, from a patient who died of acute myeloid leukemia (AML). DNA from the patient’s skin was also sequenced so that potentially causative mutations could be determined by comparing the tumor and skin sequences. The project involved the daunting task of generating, processing, and analyzing over 100 TB of data. Since the publication of this milestone, The Genome Center has begun sequencing the genomes of more AML patients and, as part of The Cancer Genome Atlas project, glioblastoma (brain cancer) patients.
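
The tumor-versus-skin comparison can be illustrated with a minimal sketch. It assumes each sample has already been reduced to a list of variant calls against a reference genome. The `Variant` type, the `somatic_candidates` function, and the coordinates are hypothetical, made up for illustration; they are not The Genome Center's pipeline. A real analysis would also have to account for sequencing errors and uneven coverage.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """A hypothetical variant call: a position where a sample differs from the reference."""
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumor: Iterable[Variant], normal: Iterable[Variant]) -> List[Variant]:
    """Return variants seen in the tumor sample but absent from the matched normal sample.

    Variants present in both samples are presumed inherited and are dropped;
    what remains are candidates for mutations acquired by the tumor.
    """
    return sorted(set(tumor) - set(normal))


# Illustrative, made-up calls (not real data).
tumor_calls = [
    Variant("chr1", 1000, "A", "G"),   # also present in skin: inherited
    Variant("chr2", 500, "C", "T"),    # tumor only: candidate
    Variant("chr13", 250, "G", "A"),   # tumor only: candidate
]
skin_calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr7", 42, "T", "C"),     # skin only: ignored
]

candidates = somatic_candidates(tumor_calls, skin_calls)
for v in candidates:
    print(f"{v.chrom}:{v.pos} {v.ref}>{v.alt}")
```

The set difference keeps the sketch short; it treats two calls as the same variant only when chromosome, position, and both alleles match exactly.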

This presentation will provide an insider’s look at a high-throughput sequence generation and analysis facility, focusing on how it leverages FLOSS. Specifically, the enterprise IT infrastructure and software architecture will be detailed. The software architecture section will include a description of our freely available, advanced, enterprise ORM and workflow systems. A brief introduction to cancer genomics will be provided, followed by the story of how FLOSS helped to sequence the first cancer genome. Throughout the presentation, past challenges faced in using FLOSS in a leading-edge, enterprise environment and future challenges posed by the recent advances in sequencing technologies will be discussed.


David Dooling

The Genome Center at Washington University in St. Louis

Dr. Dooling received a B.Ch.E. degree from the University of Dayton (Chemical Engineering, 1995) and a Ph.D. from Northwestern University (Chemical Engineering, 2000), where he was the recipient of a Walter P. Murphy Fellowship and a Dissertation Year Graham Fellowship. During graduate school he had an internship at UOP developing reaction engineering software. In 2000, he joined ExxonMobil Research and Engineering and developed detailed kinetic models of refinery processes. In 2001, he joined The Genome Center at Washington University in the mapping informatics group, developing software to assist in the tracking of clones through the mapping pipeline. In 2002, Dr. Dooling became the Information Systems group leader, overseeing information technology purchases and the system administrator, web administrator, database administrator, and computer support groups. Dr. Dooling became Assistant Director of The Genome Center in 2006, overseeing the Information Systems, LIMS, and Medical Genomics groups. In 2008, all IT infrastructure and software development were placed under his supervision.
