Visualization vs Expertise: Case Studies in Lexicography and Genomics

Erez Lieberman Aiden

In this talk, I use two case studies, the study of genome folding and the study of recent human history, to discuss the emerging ways in which data visualization can complement, and in some cases compete with, traditional forms of expertise.

First, I will describe Hi-C, a novel technology for probing the three-dimensional architecture of whole genomes. Developed together with collaborators at the Broad Institute and UMass Medical School, Hi-C couples proximity-dependent DNA ligation and massively parallel sequencing. My lab employs Hi-C to construct spatial proximity maps of the human genome. Hi-C maps have revealed that active and inactive portions of the human genome are spatially segregated, i.e., that cells employ a sort of 'regulatory origami' as they turn genes on and off. At the megabase scale, these maps are consistent with a fractal globule, a knot-free conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus.

Next, I will describe collaborative efforts, together with Jean-Baptiste Michel and Google, to create tools for the visual interrogation of a significant portion of the historical record. We began by constructing a reliable corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Such analyses are intuitive and addictive: the Google Ngram Viewer, a simple web-based tool we released for the analysis of this corpus, has been used many millions of times and has recently been incorporated into Google's online dictionary.


Erez Lieberman Aiden is Assistant Professor of Molecular and Human Genetics at the Baylor College of Medicine, where he is Director of the Center for Genome Architecture, and of Computer Science and Applied Mathematics at Rice University. His work integrates mathematical and physical theory with the invention of new technologies. 

Erez recently invented a method for three-dimensional genome sequencing; he subsequently led the team that, in 2009, reported the first three-dimensional map of the human genome. Together with collaborator Jean-Baptiste Michel, he developed culturomics, a quantitative approach to the study of history and culture that relies on computational analysis of a significant fraction of the historical record. This work led to the creation of the Google Ngram Viewer, a tool that has been used many millions of times and has become an integral element of Google's online dictionary.

Erez's research has won numerous awards, including a New Innovator Award from the National Institutes of Health; a Junior Fellowship from the Harvard Society of Fellows; recognition as one of the top 20 "Biotech Breakthroughs that will Change Medicine" by Popular Mechanics; the Lemelson-MIT prize for the best student inventor at MIT; the American Physical Society's Award for the Best Doctoral Dissertation in Biological Physics; and membership in Technology Review's 2009 TR35, recognizing the top 35 innovators under 35. In 2012, he received the Presidential Early Career Award for Scientists and Engineers, the highest government honor for young scientists, from Barack Obama.

His last three research articles have all appeared on the cover of Nature or Science. His work has also been featured on the front pages of the New York Times, the Boston Globe, and the Wall Street Journal. He is the author of Uncharted: Big Data as a Lens on Human Culture, forthcoming with Penguin Press.