This presentation introduces bigdata, a scale-out database and computing platform. Unlike either Hadoop’s or Google’s approach, bigdata begins with a distributed index architecture and derives a high concurrency row store, a high performance semantic web database, a generic object database, and a distributed file system with atomic append from some basic operations on those indices. Thompson will introduce the high-level architecture, show how the various services are derived from range-partitioned indices, discuss how and where scale-out indices and map/reduce computing can be combined, and present the scale-out semantic web database in some depth, including some performance and scaling data.
This presentation will be technical. People should have an awareness of cloud computing and/or a familiarity with the semantic web.
This presentation is important because cloud (or grid, or scale-out) computing will increasingly provide the infrastructure for emerging businesses. Open source platforms for cloud computing are vital because they bring enabling technology to more people and help businesses by keeping down the cost of scaling out.
This presentation will interest architects and developers who want to explore cloud computing, people developing scale-out infrastructure such as Hadoop or CouchDB, and businesses interested in open source platforms for scale-out computing. bigdata is of particular interest for the semantic web / Web 3.0 space: there are no generally available scale-out semantic web databases today.
bigdata is a 100% Java project providing scale-out (distributed) indices, map/reduce style computing, a sparse row store (à la Hadoop’s HBase, Google’s Bigtable, or CouchDB), a distributed file system (à la Hadoop’s HDFS or Google’s GFS), a high performance RDF database, and a flexible generic object model (GOM) database.
The basic building blocks for the bigdata architecture are scale-out indices, data services (hosting index partitions), and metadata services (locators for data services). The scale-out indices are B+Trees and remain balanced under insert and removal operations. Each B+Tree maps variable-length byte array keys to variable-length byte array values (the keys are interpreted as unsigned bytes), and the application imposes structure on those keys and values. Indices are transparently range-partitioned and distributed across a cluster or grid of commodity servers. Service failover and high availability are handled by redundant service registrations. Index data is not stored in a distributed file system; it is stored on local disk on each machine hosting a data service. In fact, the distributed file system itself is just an application of the scale-out index service. Data failover is handled by replicating data using streaming writes to secondary services. The services layer uses Jini for service registration and discovery, but SCA and OSGi integrations are being considered.
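To make the range-partitioning idea concrete, here is a minimal Java sketch of how a locator could route a key to the data service hosting its index partition. It is an illustration under stated assumptions, not bigdata's actual API: the class name RangePartitionSketch, the methods addPartition, locate, and key, and the service names are all hypothetical. The only properties taken from the description above are that keys are byte arrays compared as unsigned bytes and that each partition covers a contiguous key range.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

/** Hypothetical sketch (not the bigdata API): route a byte[] key to a data service. */
final class RangePartitionSketch {

    /** Maps each partition's inclusive left separator key to an assumed service name. */
    private final TreeMap<byte[], String> partitions =
            new TreeMap<>(RangePartitionSketch::compareUnsigned);

    RangePartitionSketch(String firstService) {
        // The first partition starts at the empty key, so every key falls in some partition.
        partitions.put(new byte[0], firstService);
    }

    /** Compares keys byte by byte as unsigned values; a prefix sorts before its extensions. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    /** Registers a partition that begins at leftSeparator (inclusive). */
    void addPartition(byte[] leftSeparator, String service) {
        partitions.put(leftSeparator.clone(), service);
    }

    /** Returns the service whose partition has the greatest left separator <= key. */
    String locate(byte[] key) {
        return partitions.floorEntry(key).getValue();
    }

    /** Convenience helper for the demo: encodes a string as UTF-8 bytes. */
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RangePartitionSketch index = new RangePartitionSketch("dataService-0");
        index.addPartition(key("m"), "dataService-1");
        index.addPartition(new byte[] {(byte) 0x80}, "dataService-2");

        System.out.println(index.locate(key("apple")));            // dataService-0
        System.out.println(index.locate(key("zebra")));            // dataService-1
        System.out.println(index.locate(new byte[] {(byte) 0xff})); // dataService-2
    }
}
```

The last lookup shows why the unsigned interpretation matters: Java's byte type is signed, so a signed comparison would sort 0xff (-1) before every ASCII key and route it to the first partition rather than the last.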
Mr. Thompson is the Chief Scientist and a co-founder of SYSTAP, LLC. SYSTAP is a boutique software consultancy focused on providing custom technology services to the federal government and private sector. SYSTAP provides solutions that bridge the gap between real-world, mission-critical customer problems and innovative research, emerging technologies, and open-source software. For the last several years, Mr. Thompson's work has focused on assessing and applying Semantic Web technologies to support semantics-based federation (mashups) at scale (billions of triples). He is the founder of the bigdata open source project, which is developing a scale-out database and computing fabric. He is also the founder of the CognitiveWeb, an open source project whose goal is to extend human decision horizons by compensating for some intrinsic aspects of selective attention, basically helping people to bridge their separate areas of expertise. He was an active member of the jdbm project for several years and developed the extensible serialization mechanism used by that project.