
Predicting Global Unrest with GDELT and SQL on Hadoop

Jim Tommaney (InfiniDB)

The Global Database of Events, Language, and Tone (GDELT) is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world over the last two centuries down to the city level. Containing more than a quarter billion events captured since 1979, the GDELT data set represents a significant effort to record, quantify, and analyze these critical events.

The term SQL on Hadoop refers to a number of different technologies that all aim to provide SQL query access to data stored in a Hadoop cluster. Hadoop has become a go-to technology for cheap, redundant storage (HDFS) and for a distributed job framework (MapReduce) that allows robust job execution across an arbitrarily large cluster. However, Hadoop offers limited capability for real-time ad hoc analytics and for integration with SQL-based reporting tools, a gap these emerging SQL on Hadoop technologies seek to fill.

Working with the GDELT data set has challenges common to many data applications. Data quality issues and data skew across different dimensions such as time and location hinder access to the underlying data patterns. Work through the data readiness challenges using a combined Hadoop and SQL-for-Hadoop architecture to prepare the data, and then leverage the performance capabilities of InfiniDB for Hadoop to explore the data and deliver analytic insights.
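
As a rough illustration of the data readiness checks described above, the following minimal sketch profiles a handful of event rows for skew across time and for missing locations. It uses Python's built-in sqlite3 module purely as a stand-in for a SQL-on-Hadoop engine; the table name, the column names (event_id, year, country, avg_tone), and the rows themselves are assumptions invented for this example, not the actual GDELT schema or data.

```python
import sqlite3

# Toy rows standing in for GDELT events: (event_id, year, country, avg_tone).
# All values are invented for illustration; they are not real GDELT records.
TOY_EVENTS = [
    (1, 1979, "US", -1.5),
    (2, 1979, None, 2.0),   # missing location: a data quality issue
    (3, 2012, "US", -3.0),
    (4, 2012, "US", 0.5),
    (5, 2012, "EG", -6.0),
    (6, 2012, "EG", None),  # missing tone value
    (7, 2013, "SY", -7.5),
    (8, 2013, "SY", -8.0),
]


def profile_events(rows):
    """Return (year, event_count, missing_country_count) per year.

    Uneven event counts across years indicate skew in the time dimension;
    the missing-country count flags rows that need cleanup before analysis.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_id INTEGER, year INTEGER, country TEXT, avg_tone REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    by_year = conn.execute(
        "SELECT year, COUNT(*) AS n, "
        "SUM(CASE WHEN country IS NULL THEN 1 ELSE 0 END) AS missing_country "
        "FROM events GROUP BY year ORDER BY year"
    ).fetchall()
    conn.close()
    return by_year


if __name__ == "__main__":
    for year, n, missing in profile_events(TOY_EVENTS):
        print(year, n, missing)
```

On a real cluster the same GROUP BY query would be issued through whichever SQL-on-Hadoop engine is in use; only the connection code would differ.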

Highlights:

  • Learn about the GDELT data set.
  • Understand appropriate workloads for a combined Hadoop + SQL-for-Hadoop environment.
  • Explore the capabilities of SQL-for-Hadoop to deliver data insights.

Jim Tommaney

InfiniDB

Jim has extensive experience leading the development, management, and performance of enterprise data architectures, including clustered, large SMP, and distributed systems for the retail, web, and telecom industries. He is responsible for the architecture, vision, direction, and technical evangelization of InfiniDB. Jim holds a BBA from Texas A&M and a Master's in Management Information Systems from the University of Texas at Dallas.