Natural Language Processing for Text Visualization

Daniela Oelke
Saeedeh Momtazi
Daniel A. Keim


Large amounts of information are available not in structured form but as text. Automatic text analysis techniques are thus an important means of dealing with this kind of data. However, because natural language is so flexible and complex, automatic techniques reach their limits when the analysis questions require background knowledge or a thorough understanding of the semantics. Consequently, there is growing interest in visual analytics techniques that use visualization methods to incorporate the user into the process, thereby helping to bridge this semantic gap.

In contrast to structured data, text data cannot be visualized directly; it requires at least one preprocessing step with suitable automatic techniques. Complex tasks require a tight connection between the automatic and the visual techniques, as suggested by the visual analytics pipeline. However, the effectiveness of such systems depends heavily on the visualization researcher’s ability to make an informed choice about which text analysis algorithms to use and how to integrate them into the visual analytics system.
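
To make the notion of a preprocessing step concrete, the following is a minimal sketch, not a component covered by the tutorial itself. It assumes a very simple setting: lowercase tokenization with a regular expression, a small hypothetical stop-word list, and term counting. The function name `term_frequencies` and the stop-word list are illustrative assumptions. The resulting (term, count) pairs are the kind of structured output that a visualization such as a tag cloud or bar chart could consume.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration;
# real systems typically use a much larger, language-specific list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "as", "but", "not"}


def term_frequencies(text, top_n=5):
    """Return the top_n most frequent non-stop-word terms as (term, count) pairs."""
    # Lowercase the text and keep only runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stop words so that function words do not dominate the counts.
    content_terms = [t for t in tokens if t not in STOP_WORDS]
    return Counter(content_terms).most_common(top_n)


sample = "Text data cannot be visualized directly. Text analysis turns text into data."
print(term_frequencies(sample, top_n=2))
```

Even this simple step involves design choices (how to tokenize, which words to discard) that change what the visualization later shows, which is why the choice of text analysis algorithm matters.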

The goal of this tutorial is to equip interested researchers in the visualization community with the necessary competencies in natural language processing (NLP). We will first introduce the basic concepts and techniques of automatic text processing, which include stemming, part-of-speech tagging, parsing, topic modeling, concept representation, and information extraction. In the second part, we will present state-of-the-art NLP components that are freely available in the NLP research community and can be leveraged to develop powerful visual analytics tools. The tutorial will conclude with application examples that illustrate the use of the introduced concepts.