The Global Database of Events, Language, and Tone (GDELT) is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world over the last two centuries down to the city level. Containing more than a quarter billion events captured since 1979, the GDELT data set represents a significant effort to record, quantify, and analyze these critical events.
The term SQL on Hadoop refers to a number of different technologies, all intended to allow SQL query access to data sitting within a Hadoop cluster. Hadoop has become a go-to technology for cheap, redundant storage (HDFS) as well as a distributed job framework (MapReduce) that allows for robust job execution across an arbitrarily large cluster. However, Hadoop offers limited capability for real-time ad hoc analytics or for integration with the many SQL-based reporting tools, a gap these emerging SQL on Hadoop technologies seek to fill.
Working with the GDELT data set presents challenges common to many data applications. Data quality issues and data skew across dimensions such as time and location hinder access to the underlying data patterns. This session works through the data readiness challenges using a combined Hadoop and SQL on Hadoop architecture to prepare the data, and then leverages the performance capabilities of InfiniDB for Hadoop to explore the data and deliver analytic insights.
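As a rough illustration of what checking data readiness can involve, the sketch below counts records per year and per country and flags rows with no location. It is a minimal sketch under stated assumptions: the row layout, the sample values, and the function names profile and skew_ratio are hypothetical, and it does not use real GDELT records or any InfiniDB or Hadoop API.

```python
from collections import Counter

# Hypothetical rows: (year, country_code, event_code); not real GDELT records.
SAMPLE_EVENTS = [
    (1979, "US", "010"),
    (2012, "US", "042"),
    (2012, "US", "190"),
    (2012, "", "042"),   # missing location: a data quality issue
    (2013, "FR", "010"),
    (2013, "US", "043"),
]


def profile(events):
    """Count rows per year and per country, plus rows with no country code."""
    by_year = Counter(year for year, _, _ in events)
    by_country = Counter(cc for _, cc, _ in events if cc)
    missing_country = sum(1 for _, cc, _ in events if not cc)
    return by_year, by_country, missing_country


def skew_ratio(counts):
    """Ratio of the largest bucket to the mean bucket size (1.0 means even)."""
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean


if __name__ == "__main__":
    by_year, by_country, missing = profile(SAMPLE_EVENTS)
    print("rows per year:", dict(by_year))
    print("rows per country:", dict(by_country))
    print("rows missing a country:", missing)
    print("year skew ratio:", skew_ratio(by_year))
```

A ratio well above 1.0 on a dimension suggests that partitioning or distributing work on that dimension alone would leave some nodes with far more data than others.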
Jim has extensive experience leading the development, management, and performance of enterprise data architectures, including clustered, large SMP, and distributed systems for the retail, web, and telecom industries. He is responsible for the architecture, vision, direction, and technical evangelization of InfiniDB. Jim holds a BBA from Texas A&M and a Master's in Management Information Systems from the University of Texas at Dallas.