Abstract:
Text data such as online news and microblogs bear valuable insights regarding
important events and responses to such events. Events are inherently
temporal, evolving over time. Existing visual text analysis systems have
provided temporal views of changes based on topical themes extracted from
text data, but few have associated topical themes with the events that cause the
changes. In this paper, we propose an interactive visual analytics system,
LeadLine, to automatically identify meaningful events in news and social
media data and support exploration of the events. To characterize events,
LeadLine integrates topic modeling, event detection, and named entity
recognition techniques to automatically extract information regarding the
investigative 4 Ws (who, what, when, and where) for each event. To further
support analysis of the text corpora through events, LeadLine allows users to
interactively examine meaningful events using the 4 Ws to develop an
understanding of how and why. By representing large-scale text corpora
in the form of meaningful events, LeadLine provides a concise summary of the
corpora. LeadLine also supports the construction of simple narratives through
the exploration of events. To demonstrate the efficacy of LeadLine in
identifying events and supporting exploration, we conducted two case studies
using news and social media data.