In this talk, I use two case studies, the study of genome folding and the study of recent human history, to discuss the emerging ways in which data visualization can complement, and in some cases compete with, traditional forms of expertise.
First, I will describe Hi-C, a novel technology for probing the three-dimensional architecture of whole genomes. Developed together with collaborators at the Broad Institute and UMass Medical School, Hi-C couples proximity-dependent DNA ligation and massively parallel sequencing. My lab employs Hi-C to construct spatial proximity maps of the human genome. Hi-C maps have revealed that active and inactive portions of the human genome are spatially segregated, i.e., that cells employ a sort of 'regulatory origami' as they turn genes on and off. At the megabase scale, these maps are consistent with a fractal globule, a knot-free conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. Next, I will describe collaborative efforts, together with Jean-Baptiste Michel and Google, to create tools for the visual interrogation of a significant portion of the historical record. We began by constructing a reliable corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Such analyses are intuitive and addictive: the Google Ngram Viewer, a simple web-based tool we released for the analysis of this corpus, has been used many millions of times and has recently been incorporated into Google's online dictionary.
In the past decades many new techniques have been developed to visualize and interact with abstract data, but many challenges remain. In my talk I will reflect on how to make progress in our field: how to identify interesting problems and then how to find effective solutions. I will begin with an attempt to identify characteristics of interesting problems, and discuss windows of opportunity for data, tasks, and users. Some problems have been solved and some are too hard to deal with, so what is the range we should aim at? And what impact can be obtained? Next, I will discuss strategies and approaches for finding novel solutions, such as combining existing approaches and finding inspiration in other disciplines, including art and design. This talk is based on lessons we learned while developing new techniques, and will be illustrated with a variety of cases and demos from our group at TU/e, showing successes and failures.