GKG Network Visualizer

Dataset: Global Knowledge Graph

Description: Creates an interactive network diagram that displays in a browser window, a spreadsheet of the most important "influencers" and a .GEXF file for analysis in Gephi.

Components: Perl, Sigma.js, Gephi, Gephi Toolkit, R, "igraph".

Acknowledgements: Makes use of a Gephi Toolkit layout/modularity/export tool and Sigma.js visualization written by Josh Melville of the Oxford Internet Institute.

Example: Networking the World's Newsmakers

The GKG Network Visualizer allows you to rapidly construct network diagrams from the GDELT Global Knowledge Graph (GKG), creating interactive browser-based network displays, "centrality" and "influencer" rankings, and even .GEXF files for more sophisticated analysis and visualization using the free open source Gephi suite. No programming or technical skills are required - you simply specify a set of person or organization names, locations, or Global Knowledge Graph Themes, along with an optional date range, and the system will automatically search the entire Global Knowledge Graph for all matching entries and construct a network diagram matching your search criteria. Your results will be emailed to you when complete, usually within 10 minutes, depending on server load and the time it takes to perform layout, centrality, and community analysis.

All GDELT Global Knowledge Graph records are scanned for your search parameters. A list of all people, organizations, locations, or themes (depending on the Node Field you select below) is compiled as the nodes of the network, and the number of times that any two names co-occur is used as the strength of the connection between them. This network diagram therefore measures "media contextualization", or the degree to which the global news media refers to two names together over time.
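The co-occurrence counting described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name `build_cooccurrence` and the sample records are hypothetical, and each record is assumed to be a plain list of the names extracted for the chosen node field.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(records):
    """Count node occurrences and pairwise co-occurrences.

    `records` is an iterable of name lists, one list per matching record
    (a simplifying assumption about the input shape).
    """
    node_counts = Counter()
    edge_counts = Counter()
    for names in records:
        # De-duplicate within a record and sort so each pair has one canonical order.
        unique = sorted(set(names))
        node_counts.update(unique)
        edge_counts.update(combinations(unique, 2))
    return node_counts, edge_counts


# Made-up sample records, for illustration only.
records = [
    ["Person A", "Person B"],
    ["Person B", "Person A", "Person C"],
    ["Person C"],
]
nodes, edges = build_cooccurrence(records)
print(nodes)
print(edges)
```

Each key in `edges` is an unordered pair stored in sorted order, and its count is the connection strength between the two names.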

Your Email Address

Creating these results can take several minutes depending on server demand - please provide the email address that the results should be sent to.

Email Address

Date Range

Limit the time period of analysis. The earliest allowable date for the Global Knowledge Graph is currently April 1, 2013, and the latest allowable date is the current day.

Start Date
End Date
 

Keyword Search Criteria

You must specify a set of keywords that will be used to search the Global Knowledge Graph for matching records. Separate multiple terms with commas. The three fields are combined with a Boolean AND. For example, to search for discussion of food or water security in Nigeria while excluding any mentions of US President Obama or Edward Snowden, you would enter "Nigeria" in the first field, "WATER_SECURITY, FOOD_SECURITY" in the second, and "Barack Obama, Edward Snowden" in the third. Fields are not case sensitive.

All GKG fields are searched for these keywords, so you can use a combination of person and organization names, countries and cities, and GKG Themes. NOTE that this does NOT search the full text of articles, only the extracted GKG fields.

Include ALL OF

Include AT LEAST ONE OF

Must NOT Have ANY OF
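As a rough sketch of how the three fields combine, the Python below applies the ALL OF, AT LEAST ONE OF, and NOT ANY OF tests to a single record. Several details are assumptions rather than documented behavior: the helper names `parse_terms` and `matches` are hypothetical, a term is treated as matching when it is a case-insensitive substring of any extracted value, and an empty field is treated as imposing no constraint.

```python
def parse_terms(field):
    """Split a comma-separated field into lowercase, trimmed terms."""
    return [term.strip().lower() for term in field.split(",") if term.strip()]


def matches(record_values, all_of="", any_of="", none_of=""):
    """Return True if a record's extracted values satisfy all three fields.

    Assumption: substring matching against extracted values only
    (never article full text); the real matching rule may differ.
    """
    values = [value.lower() for value in record_values]

    def present(term):
        return any(term in value for value in values)

    required = parse_terms(all_of)
    optional = parse_terms(any_of)
    excluded = parse_terms(none_of)
    return (
        all(present(term) for term in required)
        and (not optional or any(present(term) for term in optional))
        and not any(present(term) for term in excluded)
    )


# The example query from the text, applied to made-up records.
query = dict(
    all_of="Nigeria",
    any_of="WATER_SECURITY, FOOD_SECURITY",
    none_of="Barack Obama, Edward Snowden",
)
print(matches(["Nigeria", "FOOD_SECURITY"], **query))
print(matches(["Nigeria", "WATER_SECURITY", "Barack Obama"], **query))
```

The first record passes all three tests; the second is rejected because it contains an excluded name.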

Node Field

Which field should be used to construct the nodes of the network? A list of the unique values of this field will be computed and used as the nodes of the network diagram.

  • Person Names: Network of all of the people mentioned in articles matching your search criteria and their co-occurrences. No name normalization is performed, so you may see multiple spellings or transliterations of a given name.
  • Organization Names: Network of all of the organizations mentioned in articles matching your search criteria and their co-occurrences. The algorithm used by GDELT to recognize organization names is specifically tuned to err on the side of inclusion in order to capture previously unknown organizations and smaller advisory councils and organizations throughout the world. It therefore has a much higher false positive rate than person names and will include multiple variants of an organization's name as well as generic names such as "city council". Both non-profit and commercial enterprises are included in this field.
  • GKG Themes: Network of all of the GKG Themes mentioned in articles matching your search criteria and their co-occurrences. Themes tend to co-occur frequently, so you will want to increase the Cutoff Thresholds in the section below in order to reduce the number of connections in the network and surface its underlying structure.
  • Country Names: Network of all of the country names mentioned in articles matching your search criteria and their co-occurrences.
  • Cities and Administrative Divisions: Network of all of the cities and first-order administrative divisions (roughly equivalent to a US state) mentioned in articles matching your search criteria and their co-occurrences. No name normalization is performed, so multiple transliterations of a city's name will result in multiple entries in this field. See the technical details on the contents of this field - it matches all GNS and GNIS entries.

Node/Edge Weighting

How should the popularity of nodes and the "strength" of the connection between nodes be measured?

  • Number Namesets: As the GDELT Global Knowledge Graph processes each news article, it extracts a list of all people, organizations, locations, and themes from that article and concatenates them to form a unique "key" that represents that particular combination of names, locations, and themes. All articles containing that same unique combination of names, locations, and themes, regardless of how similar the rest of the text is, are grouped together into a "nameset". Selecting this edge weighting option means that the "Node Cutoff" option below determines how many unique namesets a given name/theme/location must occur in before it is counted, while the "Edge Cutoff" similarly refers to the number of unique namesets that a pair of names/themes/locations must appear together in before that edge is counted. This option essentially weights nodes and edges towards those that occur in the greatest diversity of contexts, biasing towards public figures and those who occur frequently with many other people. It is relatively immune to sudden massive bursts of coverage that last only a day or two (such as from a major sudden situation) and instead tends to capture the broadest trends in the network.
  • Number Articles: This option bases the weights on the raw number of articles in which a given name/theme/location occurs, or in which it co-occurs with another name/theme/location. Selecting this edge weighting option means that the "Node Cutoff" option below determines how many total articles a given name/theme/location must occur in before it is counted, while the "Edge Cutoff" similarly refers to the number of articles that a pair of names/themes/locations must appear together in before that edge is counted. This option essentially weights nodes and edges towards those that occur the most frequently, even if they always occur with the same set of names, biasing towards frequency rather than uniqueness. It can be highly sensitive to sudden massive bursts of coverage that last only a day or two (such as from a major sudden situation) and so should be used with care, but can yield a more nuanced and detailed picture of a network.
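The difference between the two weighting options can be illustrated with a small Python sketch. It is a simplification under stated assumptions: each article is reduced to a list of its extracted entries, the nameset "key" is modeled as the sorted tuple of those entries, and the names `nameset_key` and `node_weights` are hypothetical.

```python
from collections import Counter


def nameset_key(entries):
    """Model a nameset key as the sorted, de-duplicated tuple of entries."""
    return tuple(sorted(set(entries)))


def node_weights(articles, mode):
    """Weight each entry by distinct namesets or by raw article count."""
    keys = [nameset_key(entries) for entries in articles]
    if mode == "namesets":
        groups = set(keys)   # identical combinations collapse into one nameset
    elif mode == "articles":
        groups = keys        # every article counts separately
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    weights = Counter()
    for group in groups:
        weights.update(group)
    return weights


# Made-up data: a burst of three articles with the same combination, plus one other.
articles = [
    ["Person A", "Person B"],
    ["Person A", "Person B"],
    ["Person B", "Person A"],
    ["Person A", "Person C"],
]
by_nameset = node_weights(articles, "namesets")
by_article = node_weights(articles, "articles")
print(by_nameset)
print(by_article)
```

Under nameset weighting the three identical combinations count once, so "Person B" scores 1, while under article weighting the same burst gives "Person B" a score of 3.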

Cutoff Thresholds

If your network ends up being too dense (so many nodes or edges that you can't see anything) or too sparse (so few nodes and edges that you can't see any patterns), you may decide to adjust the cutoff thresholds below. Node Cutoff sets how many times a name must appear before it is included in the graph, while Edge Cutoff sets how many times a pair of names must appear together before they are connected in the network. The counts measured by these cutoffs are affected by your selection in the Node/Edge Weighting section above.

Node Cutoff
Edge Cutoff
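A minimal Python sketch of how the two cutoffs might prune a network follows. It assumes the thresholds are inclusive (a count equal to the cutoff is kept) and that an edge is dropped when either of its endpoints falls below the Node Cutoff; neither detail is confirmed here, and `apply_cutoffs` and the sample counts are hypothetical.

```python
def apply_cutoffs(node_counts, edge_counts, node_cutoff=1, edge_cutoff=1):
    """Keep nodes and edges whose counts meet the cutoffs (assumed inclusive)."""
    nodes = {name: count for name, count in node_counts.items() if count >= node_cutoff}
    edges = {
        (a, b): count
        for (a, b), count in edge_counts.items()
        # Assumption: an edge survives only if both endpoints survive.
        if count >= edge_cutoff and a in nodes and b in nodes
    }
    return nodes, edges


# Made-up counts, for illustration only.
node_counts = {"A": 5, "B": 3, "C": 1}
edge_counts = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 1}

kept_nodes, kept_edges = apply_cutoffs(node_counts, edge_counts, node_cutoff=2, edge_cutoff=2)
print(kept_nodes)
print(kept_edges)
```

Raising either cutoff thins a dense network, and lowering them brings back weaker nodes and connections in a sparse one.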
 

Outputs

What output files would you like generated?

  • Interactive Browser Visualization: Generates a ready-to-go network visualization using Gephi, laying the nodes out using Force Atlas 2 and grouping them into communities using Modularity Finding. It generates a Sigma.js visualization that displays the final network in your browser.
  • Gephi .GEXF File: Generates a .GEXF network file that you can import directly into the Gephi package for more advanced analysis and visualization.
  • Centrality Spreadsheet: Calculates a range of importance scores for each of the names in the network, helping you to identify key influencers. This option also creates a copy of the network in the Pajek .NET file format, suitable for loading into a wide range of packages such as Pajek, visone, Ucinet, Ora, NetworkX, Snap, and Tulip, along with tools like the R "igraph" package for further statistical analysis. See the igraph package for more information about each of the centrality measures in the spreadsheet, how they are calculated, and what they mean.
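To give a feel for what an "importance score" is, the Python sketch below computes two of the simplest network measures from a weighted edge list: degree (the number of distinct neighbors) and strength (the sum of connection weights). It is illustrative only. The spreadsheet's actual measures are those documented by igraph and may not include these two, and the function name and sample edges are hypothetical.

```python
from collections import defaultdict


def degree_and_strength(edges):
    """Compute per-node degree and weighted degree from {(a, b): weight}."""
    degree = defaultdict(int)
    strength = defaultdict(int)
    for (a, b), weight in edges.items():
        for node in (a, b):
            degree[node] += 1         # one more distinct neighbor
            strength[node] += weight  # add the co-occurrence count
    return dict(degree), dict(strength)


# Made-up undirected edges with co-occurrence counts as weights.
edges = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2, ("A", "D"): 1}
degree, strength = degree_and_strength(edges)

# Rank by strength, breaking ties alphabetically.
ranking = sorted(strength, key=lambda node: (-strength[node], node))
for node in ranking:
    print(node, degree[node], strength[node])
```

Sorting by a score in this way mirrors how a centrality spreadsheet can be used to surface the most connected names first.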