Visualization techniques

Visual Analytics for Linguists

ESSLLI 2014

Chris Culy

revised 7 August 2014

A note on programming languages

  • Much of the interesting work in data visualization is done on the web, in JavaScript.
    This is what I will use for illustrations.
  • Other languages are possible, especially R, Python, and Java

Sources of JavaScript examples

Charts

  • Charts are basic, but important
  • There are many tools for making charts. Some of them are interactive.
  • An interesting comparison of chart types with the same data is here

Time-varying data: Time series

Data (typically numeric) that varies over time, e.g. word frequencies in a time-annotated corpus, as shown in the Google Ngram Viewer.
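As a rough sketch of where such a time series comes from, the following computes the relative frequency of one word per year. The toy corpus, its `year`/`tokens` fields, and the function name `timeSeries` are all assumptions made for illustration, not part of any particular tool.

```javascript
// Hypothetical toy corpus: each document has a year and a list of tokens.
const corpus = [
  { year: 1990, tokens: ["the", "web", "is", "new"] },
  { year: 1990, tokens: ["the", "net"] },
  { year: 2000, tokens: ["the", "web", "and", "the", "web"] },
];

// Relative frequency of `word` per year:
// occurrences of the word / total number of tokens in that year.
function timeSeries(docs, word) {
  const byYear = new Map(); // year -> { hits, total }
  for (const { year, tokens } of docs) {
    const entry = byYear.get(year) || { hits: 0, total: 0 };
    entry.total += tokens.length;
    entry.hits += tokens.filter((t) => t === word).length;
    byYear.set(year, entry);
  }
  // Sort chronologically so the points can be drawn left to right.
  return [...byYear.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([year, { hits, total }]) => ({ year, freq: hits / total }));
}

const webSeries = timeSeries(corpus, "web");
console.log(webSeries);
```

The result is an array of `{ year, freq }` points, which is the shape most line-chart tools expect, give or take field names.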

Time-varying data: Timelines

Timelines are typically used for events rather than numeric data.

Maps

You can do a lot with maps, but they can be difficult to get right.

  • Google Maps, Open Street Map, etc.
  • GIS systems are an alternative, e.g. CartoDB (there is a free version)
  • For DIY, D3 has some mapping capabilities, and Polymaps is more specialized

A note about graphs

Graphs are a very popular construct, especially for (social) networks. However, we can make graphs out of many types of data. The key is to decide what information will correspond to nodes and what information to edges. Typically:

  • Nodes correspond to entities ("things", or "objects")
  • Edges (also called links) correspond to relationships between the entities

For example, if we have word co-occurrence information at the sentence level, we would let nodes be the words, and two nodes would be linked if they co-occur in the same sentence.
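The co-occurrence example could be sketched as follows. The input sentences, the function name `cooccurrenceGraph`, and the `{ nodes, links }` output shape are assumptions chosen for illustration. The link weight (the number of sentences in which both words appear) is an extra that the description above does not require.

```javascript
// Hypothetical input: each sentence is already tokenized into words.
const sentences = [
  ["dogs", "chase", "cats"],
  ["cats", "chase", "mice"],
];

// Nodes are words; two words are linked if they share a sentence.
// Assumes no word contains the "|" character, which is used as a key separator.
function cooccurrenceGraph(sents) {
  const nodes = new Set();
  const weights = new Map(); // "a|b" -> number of sentences containing both
  for (const sent of sents) {
    // Count each word once per sentence; sorting makes "a|b" and "b|a" the same key.
    const words = [...new Set(sent)].sort();
    words.forEach((w) => nodes.add(w));
    for (let i = 0; i < words.length; i++) {
      for (let j = i + 1; j < words.length; j++) {
        const key = words[i] + "|" + words[j];
        weights.set(key, (weights.get(key) || 0) + 1);
      }
    }
  }
  const links = [...weights].map(([key, weight]) => {
    const [source, target] = key.split("|");
    return { source, target, weight };
  });
  return { nodes: [...nodes].map((id) => ({ id })), links };
}

const graph = cooccurrenceGraph(sentences);
console.log(graph.nodes.length, "nodes,", graph.links.length, "links");
```

Note that the number of links grows quadratically with sentence length, so long sentences produce dense graphs.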

Graphs are abstract mathematical objects! They can be visualized in a variety of ways.

Graphs/Networks (non-hierarchical)

There are many ways to represent graphs/networks. Some algorithms are slow for large amounts of data, so be careful.

  • Demo of 3 different graph visualizations
  • Comparison of those same types (in Protovis): force-directed node-link (advantage: does clustering automatically), arc diagram, and matrix diagram
  • Many tools do force-directed layouts, e.g. D3 and theJIT for JavaScript. Gephi is the Java champion
  • GraphViz has several different algorithms, including ones for hierarchical graphs, but it is harder to integrate
  • Dagre: directed graphs in JavaScript (optionally with D3)
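Of the three visualization types, the matrix diagram is the easiest to sketch without a library, since it is a drawing of the graph's adjacency matrix. The node names, edge list, and function name `adjacencyMatrix` below are hypothetical, and the text rendering stands in for real graphics.

```javascript
// Hypothetical undirected graph: node names plus an edge list.
const nodeNames = ["a", "b", "c"];
const edges = [["a", "b"], ["b", "c"]];

// Build a square 0/1 matrix with one row and one column per node.
function adjacencyMatrix(names, edgeList) {
  const index = new Map(names.map((name, i) => [name, i]));
  const matrix = names.map(() => names.map(() => 0));
  for (const [s, t] of edgeList) {
    const i = index.get(s);
    const j = index.get(t);
    matrix[i][j] = 1;
    matrix[j][i] = 1; // undirected: mirror across the diagonal
  }
  return matrix;
}

const matrix = adjacencyMatrix(nodeNames, edges);

// Crude text rendering: "#" for a link, "." for none.
const rendering = matrix
  .map((row) => row.map((cell) => (cell ? "#" : ".")).join(""))
  .join("\n");
console.log(rendering);
```

The matrix needs one cell per pair of nodes, so it avoids the edge crossings of node-link drawings at the cost of quadratic space.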

Hierarchical data (trees as lines)

There are a great many ways to represent hierarchical data. Trees are one type of hierarchical data.
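Before any layout is chosen, a tree is often stored as nested objects. The sketch below assumes `name` and `children` fields (a common convention, but still an assumption here) and uses a made-up syntax tree. The helper names `depth` and `leaves` are hypothetical.

```javascript
// Hypothetical syntax tree as nested objects; leaves have no `children`.
const tree = {
  name: "S",
  children: [
    { name: "NP", children: [{ name: "dogs" }] },
    {
      name: "VP",
      children: [
        { name: "V", children: [{ name: "chase" }] },
        { name: "NP", children: [{ name: "cats" }] },
      ],
    },
  ],
};

// Number of levels in the tree, counting the root as level 1.
function depth(node) {
  const kids = node.children || [];
  return 1 + Math.max(0, ...kids.map((kid) => depth(kid)));
}

// Names of the leaf nodes, left to right.
function leaves(node) {
  const kids = node.children || [];
  return kids.length === 0 ? [node.name] : kids.flatMap((kid) => leaves(kid));
}

console.log(depth(tree), leaves(tree).join(" "));
```

Quantities like depth and leaf count matter for layout: a line-based drawing needs roughly one row per level and one column per leaf.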

Hierarchical data (trees as area)

Multi-dimensional data

Miscellaneous

Some considerations for LangVis

Cf. IBM's Many Eyes as an example of these:

  • Ability to see the original data
  • Ability to navigate backwards and forwards between states (not common)
  • Ability to link to a state (not common)
  • Ability to annotate the visualization (very rare)
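Two of these, linking to a state and navigating between states, can be sketched in a few lines. Everything below is an assumption for illustration: the state fields (`query`, `view`, `zoom`), the function names, and the choice to encode the state as a URL query string. It is one possible approach, not how Many Eyes or any other tool does it.

```javascript
// Hypothetical view state of a visualization.
const state = { query: "chase", view: "arc", zoom: 2 };

// Linking to a state: serialize it so it can be appended to a URL.
function encodeState(s) {
  return new URLSearchParams({
    query: s.query,
    view: s.view,
    zoom: String(s.zoom),
  }).toString();
}

// Restore a state from such a string.
function decodeState(str) {
  const params = new URLSearchParams(str);
  return {
    query: params.get("query"),
    view: params.get("view"),
    zoom: Number(params.get("zoom")),
  };
}

// Navigating backwards and forwards: a list of visited states plus a position.
function createNavigator(initial) {
  const states = [initial];
  let pos = 0;
  return {
    current: () => states[pos],
    push(s) {
      states.splice(pos + 1); // a new state discards any "forward" states
      states.push(s);
      pos += 1;
    },
    back() {
      if (pos > 0) pos -= 1;
      return states[pos];
    },
    forward() {
      if (pos < states.length - 1) pos += 1;
      return states[pos];
    },
  };
}

const encoded = encodeState(state);
const nav = createNavigator(state);
nav.push({ ...state, view: "matrix" });
console.log(encoded, nav.back().view, nav.forward().view);
```

Keeping the whole state in one plain object is what makes both features cheap: the same object is serialized for links and stored for navigation.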

Charts with language / linguistic data

Corpus with facets

Dictionaries and related

  • Visuwords: WordNet (there are other similar ones)
  • Docuburst by Chris Collins et al.: documents + WordNet
  • Dictionary definitions over time by Theron and Fontanillo
  • [Demo] Verweis Viewer: link relations in a terminology database. C. Culy, E. Chiocchetti, and N. Ralli. "Visualizing conceptual relations in legal terminology." In Proceedings of the 17th International Conference on Information Visualization (IV 2013), July 16-18, 2013, London, UK, pp. 333-338.

Visualizations by CuC

Source

For more information, see the infovis-wiki.