Rankings are a popular and universal approach to structuring otherwise unorganized collections of items by computing a rank for each item based on the value of one or more of its attributes. This allows us, for example, to prioritize tasks or to evaluate the performance of products relative to each other. While the visualization of a ranking itself is straightforward, its interpretation is not, because the rank of an item represents only a summary of a potentially complicated relationship between its attributes and those of the other items. It is also common that alternative rankings exist which need to be compared and analyzed to gain insight into how multiple heterogeneous attributes affect the rankings. Advanced visual exploration tools are needed to make this process efficient. In this paper we present a comprehensive analysis of requirements for the visualization of multi-attribute rankings. Based on these considerations, we propose LineUp - a novel and scalable visualization technique that uses bar charts. This interactive technique supports the ranking of items based on multiple heterogeneous attributes with different scales and semantics. It enables users to interactively combine attributes and flexibly refine parameters to explore the effect of changes in the attribute combination. This process can be employed to derive actionable insights as to which attributes of an item need to be modified in order for its rank to change. Additionally, through integration of slope graphs, LineUp can also be used to compare multiple alternative rankings on the same set of items, for example, over time or across different attribute combinations. We evaluate the effectiveness of the proposed multi-attribute visualization technique in a qualitative study. The study shows that users are able to successfully solve complex ranking tasks in a short period of time.
Many application domains deal with multi-variate data that consist of both categorical and numerical information. Smallmultiple displays are a powerful concept for comparing such data by juxtaposition. For comparison by overlay or by explicit encoding of computed differences, however, a specification of references is necessary. In this paper, we present a formal model for defining semantically meaningful comparisons between many categories in a small-multiple display. Based on pivotized data that are hierarchically partitioned by the categories assigned to the x and y axis of the display, we propose two alternatives for structure-based comparison within this hierarchy. With an absolute reference specification, categories are compared to a fixed reference category. With a relative reference specification, in contrast, a semantic ordering of the categories is considered when comparing them either to the previous or subsequent category each. Both reference specifications can be defined at multiple levels of the hierarchy (including aggregated summaries), enabling a multitude of useful comparisons. We demonstrate the general applicability of our model in several application examples using different visualizations that compare data by overlay or explicit encoding of differences.
Visualizations are great tools of communicationsﾐthey summarize findings and quickly convey main messages to our audience. As designers of charts we have to make sure that information is shown with a minimum of distortion. We have to also consider illusions and other perceptual limitations of our audience. In this paper we discuss the effect and strength of the line width illusion, a Mｬ uller-Lyer type illusion, on designs related to displaying associations between categorical variables. Parallel sets and hammock plots are both affected by line width illusions. We introduce the common-angle plot as an alternative method for displaying categorical data in a manner that minimizes the effect from perceptual illusions. Results from user studies both highlight the need for addressing line-width illusions in displays and provide evidence that common angle charts successfully resolve this issue.
An ongoing debate in the Visualization community concerns the role that visualization types play in data understanding. In human cognition, understanding and memorability are intertwined. As a first step towards being able to ask questions about impact and effectiveness, here we ask: ﾒWhat makes a visualization memorable?ﾓ We ran the largest scale visualization study to date using 2,070 single-panel visualizations, categorized with visualization type (e.g., bar chart, line graph, etc.), collected from news media sites, government reports, scientific journals, and infographic sources. Each visualization was annotated with additional attributes, including ratings for data-ink ratios and visual densities. Using Amazonﾕs Mechanical Turk, we collected memorability scores for hundreds of these visualizations, and discovered that observers are consistent in which visualizations they find memorable and forgettable. We find intuitive results (e.g., attributes like color and the inclusion of a human recognizable object enhance memorability) and less intuitive results (e.g., common graphs are less memorable than unique visualization types). Altogether our findings suggest that quantifying memorability is a general metric of the utility of information, an essential step towards determining how to design effective visualizations.
The visual system can make highly efficient aggregate judgements about a set of objects, with speed roughly independent of the number of objects considered. While there is a rich literature on these mechanisms and their ramifications for visual summarization tasks, this prior work rarely considers more complex tasks requiring multiple judgements over long periods of time, and has not considered certain critical aggregation types, such as the localization of the mean value of a set of points. In this paper, we explore these questions using a common visualization task as a case study: relative mean value judgements within multi-class scatterplots. We describe how the perception literature provides a set of expected constraints on the task, and evaluate these predictions with a large-scale perceptual study with crowd-sourced participants. Judgements are no harder when each set contains more points, redundant and conflicting encodings, as well as additional sets, do not strongly affect performance, and judgements are harder when using less salient encodings. These results have concrete ramifications for the design of scatterplots.
Scatter plots are diagrams that visualize two-dimensional data as sets of points in the plane. They allow users to detect correlations and clusters in the data. Whether or not a user can accomplish these tasks highly depends on the aspect ratio selected for the plot, i.e., the ratio between the horizontal and the vertical extent of the diagram. We argue that an aspect ratio is good if the Delaunay triangulation of the scatter plot at this aspect ratio has some nice geometric property, e.g., a large minimum angle or a small total edge length. More precisely, we consider the following optimization problem. Given a set Q of points in the plane, find a scale factor s such that scaling the x-coordinates of the points in Q by s and the y-coordinates by 1=s yields a point set P(s) that optimizes a property of the Delaunay triangulation of P(s), over all choices of s. We present an algorithm that solves this problem efficiently and demonstrate its usefulness on real-world instances. Moreover, we discuss an empirical test in which we asked 64 participants to choose the aspect ratios of 18 scatter plots. We tested six different quality measures that our algorithm can optimize. In conclusion, minimizing the total edge length and minimizing what we call the ﾒuncompactnessﾓ of the triangles of the Delaunay triangulation yielded the aspect ratios that were most similar to those chosen by the participants in the test.
In controlled experiments on the relation of display size (i.e., the number of pixels) and the usability of visualizations, the size of the information space can either be kept constant or varied relative to display size. Both experimental approaches have limitations. If the information space is kept constant then the scale ratio between an overview of the entire information space and the lowest zoom level varies, which can impact performance; if the information space is varied then the scale ratio is kept constant, but performance cannot be directly compared. In other words, display size, information space, and scale ratio are interrelated variables. We investigate this relation in two experiments with interfaces that implement classic information visualization techniquesﾑfocus+context, overview+detail, and zoomingﾑfor multi-scale navigation in maps. Display size varied between 0.17, 1.5, and 13.8 megapixels. Information space varied relative to display size in one experiment and was constant in the other. Results suggest that for tasks where users navigate targets that are visible at all map scales the interfaces do not benefit from a large display: With a constant map size, a larger display does not improve performance with the interfaces; with map size varied relative to display size, participants found interfaces harder to use with a larger display and task completion times decrease only when they are normalized to compensate for the increase in map size. The two experimental approaches show different interaction effects between display size and interface. In particular, focus+context performs relatively worse at a large display size with variable map size, and relatively worse at a small display size with a fixed map size. Based on a theoretical analysis of the interaction with the visualization techniques, we examine individual task actions empirically so as to understand the relative impact of display size and scale ratio on the visualization techniquesﾕ performance and to discuss differences between the two experimental approaches.
We present a first investigation into hybrid-image visualization for data analysis in large-scale viewing environments. Hybrid-image visualizations blend two different visual representations into a single static view, such that each representation can be perceived at a different viewing distance. Our work is motivated by data analysis scenarios that incorporate one or more displays with sufficiently large size and resolution to be comfortably viewed by different people from various distances. Hybrid-image visualizations can be used, in particular, to enhance overview tasks from a distance and detail-in-context tasks when standing close to the display. By using a perception-based blending approach, hybrid-image visualizations make two full-screen visualizations accessible without tracking viewers in front of a display. We contribute a design space, discuss the perceptual rationale for our work, provide examples, and introduce a set of techniques and tools to aid the design of hybrid-image visualizations.
Proposals to establish a ﾒscience of interactionﾓ have been forwarded from Information Visualization and Visual Analytics, as well as Cartography, Geovisualization, and GIScience. This paper reports on two studies to contribute to this call for an interaction science, with the goal of developing a functional taxonomy of interaction primitives for map-based visualization. A semi-structured interview study first was conducted with 21 expert interactive map users to understand the way in which map-based visualizations currently are employed. The interviews were transcribed and coded to identify statements representative of either the task the user wished to accomplish (i.e., objective primitives) or the interactive functionality included in the visualization to achieve this task (i.e., operator primitives). A card sorting study then was conducted with 15 expert interactive map designers to organize these example statements into logical structures based on their experience translating client requests into interaction designs. Example statements were supplemented with primitive definitions in the literature and were separated into two sorting exercises: objectives and operators. The objective sort suggested five objectives that increase in cognitive sophistication (identify, compare, rank, associate, and delineate), but exhibited a large amount of variation across participants due to consideration of broader user goals (procure, predict, and prescribe) and interaction operands (space-alone, attributes-in-space, and space-in-time; elementary and general). The operator sort suggested five enabling operators (import, export, save, edit, and annotate) and twelve work operators (reexpress, arrange, sequence, resymbolize, overlay, pan, zoom, reproject, search, filter, retrieve, and calculate). This taxonomy offers an empirically-derived and ecologically-valid structure to inform future research and design on interaction.
Knowledge about visualization tasks plays an important role in choosing or building suitable visual representations to pursue them. Yet, tasks are a multi-faceted concept and it is thus not surprising that the many existing task taxonomies and models all describe different aspects of tasks, depending on what these task descriptions aim to capture. This results in a clear need to bring these different aspects together under the common hood of a general design space of visualization tasks, which we propose in this paper. Our design space consists of five design dimensions that characterize the main aspects of tasks and that have so far been distributed across different task descriptions. We exemplify its concrete use by applying our design space in the domain of climate impact research. To this end, we propose interfaces to our design space for different user roles (developers, authors, and end users) that allow users of different levels of expertise to work with it.
The considerable previous work characterizing visualization usage has focused on low-level tasks or interactions and high-level tasks, leaving a gap between them that is not addressed. This gap leads to a lack of distinction between the ends and means of a task, limiting the potential for rigorous analysis. We contribute a multi-level typology of visualization tasks to address this gap, distinguishing why and how a visualization task is performed, as well as what the task inputs and outputs are. Our typology allows complex tasks to be expressed as sequences of interdependent simpler tasks, resulting in concise and flexible descriptions for tasks of varying complexity and scope. It provides abstract rather than domain-specific descriptions of tasks, so that useful comparisons can be made between visualization systems targeted at different application domains. This descriptive power supports a level of analysis required for the generation of new designs, by guiding the translation of domain-specific problems into abstract tasks, and for the qualitative evaluation of visualization usage. We demonstrate the benefits of our approach in a detailed case study, comparing task descriptions from our typology to those derived from related work. We also discuss the similarities and differences between our typology and over two dozen extant classification systems and theoretical frameworks from the literatures of visualization, human-computer interaction, information retrieval, communications, and cartography.
People typically interact with information visualizations using a mouse. Their physical movement, orientation, and distance to visualizations are rarely used as input. We explore how to use such spatial relations among people and visualizations (i.e., proxemics) to drive interaction with visualizations, focusing here on the spatial relations between a single user and visualizations on a large display. We implement interaction techniques that zoom and pan, query and relate, and adapt visualizations based on tracking of usersﾕ position in relation to a large high-resolution display. Alternative prototypes are tested in three user studies and compared with baseline conditions that use a mouse. Our aim is to gain empirical data on the usefulness of a range of design possibilities and to generate more ideas. Among other things, the results show promise for changing zoom level or visual representation with the userﾕs physical distance to a large display. We discuss possible benefits and potential issues to avoid when designing information visualizations that use proxemics.
We present an interaction model for beyond-desktop visualizations that combines the visualization reference model with the instrumental interaction paradigm. Beyond-desktop visualizations involve a wide range of emerging technologies such as wall-sized displays, 3D and shape-changing displays, touch and tangible input, and physical information visualizations. While these technologies allow for new forms of interaction, they are often studied in isolation. New conceptual models are needed to build a coherent picture of what has been done and what is possible. We describe a modified pipeline model where raw data is processed into a visualization and then rendered into the physical world. Users can explore or change data by directly manipulating visualizations or through the use of instruments. Interactions can also take place in the physical world outside the visualization system, such as when using locomotion to inspect a large scale visualization. Through case studies we illustrate how this model can be used to describe both conventional and unconventional interactive visualization systems, and compare different design alternatives.
Conveying a narrative with visualizations often requires choosing an order in which to present visualizations. While evidence exists that narrative sequencing in traditional stories can affect comprehension and memory, little is known about how sequencing choices affect narrative visualization. We consider the forms and reactions to sequencing in narrative visualization presentations to provide a deeper understanding with a focus on linear, ﾒslideshow-styleﾓ presentations. We conduct a qualitative analysis of 42 professional narrative visualizations to gain empirical knowledge on the forms that structure and sequence take. Based on the results of this study we propose a graph-driven approach for automatically identifying effective sequences in a set of visualizations to be presented linearly. Our approach identifies possible transitions in a visualization set and prioritizes local (visualization-to-visualization) transitions based on an objective function that minimizes the cost of transitions from the audience perspective. We conduct two studies to validate this function. We also expand the approach with additional knowledge of user preferences for different types of local transitions and the effects of global sequencing strategies on memory, preference, and comprehension. Our results include a relative ranking of types of visualization transitions by the audience perspective and support for memory and subjective rating benefits of visualization sequences that use parallelism as a structural device. We discuss how these insights can guide the design of narrative visualization and systems that support optimization of visualization sequence.
Presenting and communicating insights to an audienceﾑtelling a storyﾑis one of the main goals of data exploration. Even though visualization as a storytelling medium has recently begun to gain attention, storytelling is still underexplored in information visualization and little research has been done to help people tell their stories with data. To create a new, more engaging form of storytelling with data, we leverage and extend the narrative storytelling attributes of whiteboard animation with pen and touch interactions. We present SketchStory, a data-enabled digital whiteboard that facilitates the creation of personalized and expressive data charts quickly and easily. SketchStory recognizes a small set of sketch gestures for chart invocation, and automatically completes charts by synthesizing the visuals from the presenter-provided example icon and binding them to the underlying data. Furthermore, SketchStory allows the presenter to move and resize the completed data charts with touch, and filter the underlying data to facilitate interactive exploration. We conducted a controlled experiment for both audiences and presenters to compare SketchStory with a traditional presentation system, Microsoft PowerPoint. Results show that the audience is more engaged by presentations done with SketchStory than PowerPoint. Eighteen out of 24 audience participants preferred SketchStory to PowerPoint. Four out of five presenter participants also favored SketchStory despite the extra effort required for presentation.
We introduce Visual Sedimentation, a novel design metaphor for visualizing data streams directly inspired by the physical process of sedimentation. Visualizing data streams (e. g., Tweets, RSS, Emails) is challenging as incoming data arrive at unpredictable rates and have to remain readable. For data streams, clearly expressing chronological order while avoiding clutter, and keeping aging data visible, are important. The metaphor is drawn from the real-world sedimentation processes: objects fall due to gravity, and aggregate into strata over time. Inspired by this metaphor, data is visually depicted as falling objects using a force model to land on a surface, aggregating into strata over time. In this paper, we discuss how this metaphor addresses the specific challenge of smoothing the transition between incoming and aging data. We describe the metaphorﾕs design space, a toolkit developed to facilitate its implementation, and example applications to a range of case studies. We then explore the generative capabilities of the design space through our toolkit. We finally illustrate creative extensions of the metaphor when applied to real streams of data.
From financial statistics to nutritional values, we are frequently exposed to quantitative information expressed in measures of either extreme magnitudes or unfamiliar units, or both. A common practice used to comprehend such complex measures is to relate, re-express, and compare them through visual depictions using magnitudes and units that are easier to grasp. Through this practice, we create a new graphic composition that we refer to as a concrete scale. To the best of our knowledge, there are no design guidelines that exist for concrete scales despite their common use in communication, educational, and decision-making settings. We attempt to fill this void by introducing a novel framework that would serve as a practical guide for their analysis and design. Informed by a thorough analysis of graphic compositions involving complex measures and an extensive literature review of scale cognition mechanisms, our framework outlines the design space of various measure relationsﾑspecifically relations involving the re-expression of complex measures to more familiar conceptsﾑand their visual representations as graphic compositions.
Storyline visualizations, which are useful in many applications, aim to illustrate the dynamic relationships between entities in a story. However, the growing complexity and scalability of stories pose great challenges for existing approaches. In this paper, we propose an efficient optimization approach to generating an aesthetically appealing storyline visualization, which effectively handles the hierarchical relationships between entities over time. The approach formulates the storyline layout as a novel hybrid optimization approach that combines discrete and continuous optimization. The discrete method generates an initial layout through the ordering and alignment of entities, and the continuous method optimizes the initial layout to produce the optimal one. The efficient approach makes real-time interactions (e.g., bundling and straightening) possible, thus enabling users to better understand and track how the story evolves. Experiments and case studies are conducted to demonstrate the effectiveness and usefulness of the optimization approach.
Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptopﾕs main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
Distributed systems are complex to develop and administer, and performance problem diagnosis is particularly challenging. When performance degrades, the problem might be in any of the systemﾕs many components or could be a result of poor interactions among them. Recent research efforts have created tools that automatically localize the problem to a small number of potential culprits, but research is needed to understand what visualization techniques work best for helping distributed systems developers understand and explore their results. This paper compares the relative merits of three well-known visualization approaches (side-by-side, diff, and animation) in the context of presenting the results of one proven automated localization technique called request-flow comparison. Via a 26-person user study, which included real distributed systems developers, we identify the unique benefits that each approach provides for different problem types and usage modes.
Having effective visualizations of filesystem provenance data is valuable for understanding its complex hierarchical structure. The most common visual representation of provenance data is the node-link diagram. While effective for understanding local activity, the node-link diagram fails to offer a high-level summary of activity and inter-relationships within the data. We present a new tool, InProv, which displays filesystem provenance with an interactive radial-based tree layout. The tool also utilizes a new time-based hierarchical node grouping method for filesystem provenance data we developed to match the userﾕs mental model and make data exploration more intuitive. We compared InProv to a conventional node-link based tool, Orbiter, in a quantitative evaluation with real users of filesystem provenance data including provenance data experts, IT professionals, and computational scientists. We also compared in the evaluation our new node grouping method to a conventional method. The results demonstrate that InProv results in higher accuracy in identifying system activity than Orbiter with large complex data sets. The results also show that our new time-based hierarchical node grouping method improves performance in both tools, and participants found both tools significantly easier to use with the new time-based node grouping method. Subjective measures show that participants found InProv to require less mental activity, less physical activity, less work, and is less stressful to use. Our study also reveals one of the first cases of gender differences in visualization; both genders had comparable performance with InProv, but women had a significantly lower average accuracy (56%) compared to men (70%) with Orbiter.
An important feature of networks for many application domains is their community structure. This is because objects within the same community usually have at least one property in common. The investigation of community structure can therefore support the understanding of object attributes from the network topology alone. In real-world systems, objects may belong to several communities at the same time, i.e., communities can overlap. Analyzing fuzzy community memberships is essential to understand to what extent objects contribute to different communities and whether some communities are highly interconnected. We developed a visualization approach that is based on node-link diagrams and supports the investigation of fuzzy communities in weighted undirected graphs at different levels of detail. Starting with the network of communities, the user can continuously drill down to the network of individual nodes and finally analyze the membership distribution of nodes of interest. Our approach uses layout strategies and further visual mappings to graphically encode the fuzzy community memberships. The usefulness of our approach is illustrated by two case studies analyzing networks of different domains: social networking and biological interactions. The case studies showed that our layout and visualization approach helps investigate fuzzy overlapping communities. Fuzzy vertices as well as the different communities to which they belong can be easily identified based on node color and position.
In many applications, data tables contain multi-valued attributes that often store the memberships of the table entities to multiple sets such as which languages a person masters, which skills an applicant documents, or which features a product comes with. With a growing number of entities, the resulting element-set membership matrix becomes very rich of information about how these sets overlap. Many analysis tasks targeted at set-typed data are concerned with these overlaps as salient features of such data. This paper presents Radial Sets, a novel visual technique to analyze set memberships for a large number of elements. Our technique uses frequency-based representations to enable quickly finding and analyzing different kinds of overlaps between the sets, and relating these overlaps to other attributes of the table entities. Furthermore, it enables various interactions to select elements of interest, find out if they are over-represented in specific sets or overlaps, and if they exhibit a different distribution for a specific attribute compared to the rest of the elements. These interactions allow formulating highly-expressive visual queries on the elements in terms of their set memberships and attribute values. As we demonstrate via two usage scenarios, Radial Sets enable revealing and analyzing a multitude of overlapping patterns between large sets, beyond the limits of state-of-the-art techniques.
This article presents SoccerStories, a visualization interface to support analysts in exploring soccer data and communicating interesting insights. Currently, most analyses on such data relate to statistics on individual players or teams. However, soccer analysts we collaborated with consider that quantitative analysis alone does not convey the right picture of the game, as context, player positions and phases of player actions are the most relevant aspects. We designed SoccerStories to support the current practice of soccer analysts and to enrich it, both in the analysis and communication stages. Our system provides an overview+detail interface of game phases, and their aggregation into a series of connected visualizations, each visualization being tailored for actions such as a series of passes or a goal attempt. To evaluate our tool, we ran two qualitative user studies on recent games using SoccerStories with data from one of the worldﾕs leading live sports data providers. The first study resulted in a series of four articles on soccer tactics, by a tactics analyst, who said he would not have been able to write these otherwise. The second study consisted in an exploratory follow-up to investigate design alternatives for embedding soccer phases into word-sized graphics. For both experiments, we received a very enthusiastic feedback and participants consider further use of SoccerStories to enhance their current workflow.
Business ecosystems are characterized by large, complex, and global networks of firms, often from many different market segments, all collaborating, partnering, and competing to create and deliver new products and services. Given the rapidly increasing scale, complexity, and rate of change of business ecosystems, as well as economic and competitive pressures, analysts are faced with the formidable task of quickly understanding the fundamental characteristics of these interfirm networks. Existing tools, however, are predominantly query- or list-centric with limited interactive, exploratory capabilities. Guided by a field study of corporate analysts, we have designed and implemented dotlink360, an interactive visualization system that provides capabilities to gain systemic insight into the compositional, temporal, and connective characteristics of business ecosystems. dotlink360 consists of novel, multiple connected views enabling the analyst to explore, discover, and understand interfirm networks for a focal firm, specific market segments or countries, and the entire business ecosystem. System evaluation by a small group of prototypical users shows supporting evidence of the benefits of our approach. This design study contributes to the relatively unexplored, but promising area of exploratory information visualization in market research and business strategy.
We enhance a user-centered design process with techniques that deliberately promote creativity to identify opportunities for the visualization of data generated by a major energy supplier. Visualization prototypes developed in this way prove effective in a situation whereby data sets are largely unknown and requirements open ﾐ enabling successful exploration of possibilities for visualization in Smart Home data analysis. The process gives rise to novel designs and design metaphors including data sculpting. It suggests: that the deliberate use of creativity techniques with data stakeholders is likely to contribute to successful, novel and effective solutions; that being explicit about creativity may contribute to designers developing creative solutions; that using creativity techniques early in the design process may result in a creative approach persisting throughout the process. The work constitutes the first systematic visualization design for a data rich source that will be increasingly important to energy suppliers and consumers as Smart Meter technology is widely deployed. It is novel in explicitly employing creativity techniques at the requirements stage of visualization design and development, paving the way for further use and study of creativity methods in visualization design.
Biological pathway maps are highly relevant tools for many tasks in molecular biology. They reduce the complexity of the overall biological network by partitioning it into smaller manageable parts. While this reduction of complexity is their biggest strength, it is, at the same time, their biggest weakness. By removing what is deemed not important for the primary function of the pathway, biologists lose the ability to follow and understand cross-talks between pathways. Considering these cross-talks is, however, critical in many analysis scenarios, such as judging effects of drugs. In this paper we introduce Entourage, a novel visualization technique that provides contextual information lost due to the artificial partitioning of the biological network, but at the same time limits the presented information to what is relevant to the analystﾕs task. We use one pathway map as the focus of an analysis and allow a larger set of contextual pathways. For these context pathways we only show the contextual subsets, i.e., the parts of the graph that are relevant to a selection. Entourage suggests related pathways based on similarities and highlights parts of a pathway that are interesting in terms of mapped experimental data. We visualize interdependencies between pathways using stubs of visual links, which we found effective yet not obtrusive. By combining this approach with visualization of experimental data, we can provide domain experts with a highly valuable tool. We demonstrate the utility of Entourage with case studies conducted with a biochemist who researches the effects of drugs on pathways. We show that the technique is well suited to investigate interdependencies between pathways and to analyze, understand, and predict the effect that drugs have on different cell types.
Scientists use DNA sequence differences between an individualﾕs genome and a standard reference genome to study the genetic basis of disease. Such differences are called sequence variants, and determining their impact in the cell is difficult because it requires reasoning about both the type and location of the variant across several levels of biological context. In this design study, we worked with four analysts to design a visualization tool supporting variant impact assessment for three different tasks. We contribute data and task abstractions for the problem of variant impact assessment, and the carefully justified design and implementation of the Variant View tool. Variant View features an information-dense visual encoding that provides maximal information at the overview level, in contrast to the extensive navigation required by currently-prevalent genome browsers. We provide initial evidence that the tool simplified and accelerated workflows for these three tasks through three case studies. Finally, we reflect on the lessons learned in creating and refining data and task abstractions that allow for concise overviews of sprawling information spaces that can reduce or remove the need for the memory-intensive use of navigation.
Visualization of dynamically changing networks (graphs) is a significant challenge for researchers. Previous work has experimentally compared animation, small multiples, and other techniques, and found trade-offs between these. One potential way to avoid such trade-offs is to combine previous techniques in a hybrid visualization. We present two taxonomies of visualizations of dynamic graphs: one of non-hybrid techniques, and one of hybrid techniques. We also describe a prototype, called DiffAni, that allows a graph to be visualized as a sequence of three kinds of tiles: diff tiles that show difference maps over some time interval, animation tiles that show the evolution of the graph over some time interval, and small multiple tiles that show the graph state at an individual time slice. This sequence of tiles is ordered by time and covers all time slices in the data. An experimental evaluation of DiffAni shows that our hybrid approach has advantages over non-hybrid techniques in certain cases.
To analyze data such as the US Federal Budget or characteristics of the student population of a University it is common to look for changes over time. This task can be made easier and more fruitful if the analysis is performed by grouping by attributes, such as by Agencies, Bureaus and Accounts for the Budget, or Ethnicity, Gender and Major in a University. We present TreeVersity2, a web based interactive data visualization tool that allows users to analyze change in datasets by creating dynamic hierarchies based on the data attributes. TreeVersity2 introduces a novel space filling visualization (StemView) to represent change in trees at multiple levels - not just at the leaf level. With this visualization users can explore absolute and relative changes, created and removed nodes, and each nodeﾕs actual values, while maintaining the context of the tree. In addition, TreeVersity2 provides overviews of change over the entire time period, and a reporting tool that lists outliers in textual form, which helps users identify the major changes in the data without having to manually setup filters. We validated TreeVersity2 with 12 case studies with organizations as diverse as the National Cancer Institute, Federal Drug Administration, Department of Transportation, Office of the Bursar of the University of Maryland, or eBay. Our case studies demonstrated that TreeVersity2 is flexible enough to be used in different domains and provide useful insights for the data owners. A TreeVersity2 demo can be found at https://treeversity.cattlab.umd.edu.
This paper is concerned with the creation of ﾔmacrosﾕ in workflow visualization as a support tool to increase the efficiency of data curation tasks. We propose computation of candidate macros based on their usage in large collections of workflows in data repositories. We describe an efficient algorithm for extracting macro motifs from workflow graphs. We discovered that the state transition information, used to identify macro candidates, characterizes the structural pattern of the macro and can be harnessed as part of the visual design of the corresponding macro glyph. This facilitates partial automation and consistency in glyph design applicable to a large set of macro glyphs. We tested this approach against a repository of biological data holding some 9,670 workflows and found that the algorithmically generated candidate macros are in keeping with domain expert expectations.
Domain-specific database applications tend to contain a sizable number of table-, form-, and report-style views that must each be designed and maintained by a software developer. A significant part of this job is the necessary tweaking of low-level presentation details such as label placements, text field dimensions, list or table styles, and so on. In this paper, we present a horizontally constrained layout management algorithm that automates the display of structured hierarchical data using the traditional visual idioms of hand-designed database UIs: tables, multi-column forms, and outline-style indented lists. We compare our system with pure outline and nested table layouts with respect to space efficiency and readability, the latter with an online user study on 27 subjects. Our layouts are 3.9 and 1.6 times more compact on average than outline layouts and horizontally unconstrained table layouts, respectively, and are as readable as table layouts even for large datasets.
We explore the effectiveness of visualizing dense directed graphs by replacing individual edges with edges connected to ﾒmodulesﾓﾑor groups of nodesﾑsuch that the new edges imply aggregate connectivity. We only consider techniques that offer a lossless compression: that is, where the entire graph can still be read from the compressed version. The techniques considered are: a simple grouping of nodes with identical neighbor sets; Modular Decomposition which permits internal structure in modules and allows them to be nested; and Power Graph Analysis which further allows edges to cross module boundaries. These techniques all have the same goalﾑto compress the set of edges that need to be rendered to fully convey connectivityﾑbut each successive relaxation of the module definition permits fewer edges to be drawn in the rendered graph. Each successive technique also, we hypothesize, requires a higher degree of mental effort to interpret. We test this hypothetical trade-off with two studies involving human participants. For Power Graph Analysis we propose a novel optimal technique based on constraint programming. This enables us to explore the parameter space for the technique more precisely than could be achieved with a heuristic. Although applicable to many domains, we are motivated byﾑand discuss in particularﾑthe application to software dependency analysis.
Scatterplot matrices (SPLOMs), parallel coordinates, and glyphs can all be used to visualize the multiple continuous variables (i.e., dependent variables or measures) in multidimensional multivariate data. However, these techniques are not well suited to visualizing many categorical variables (i.e., independent variables or dimensions). To visualize multiple categorical variables, ﾒhierarchical axesﾓ that ﾒstack dimensionsﾓ have been used in systems like Polaris and Tableau. However, this approach does not scale well beyond a small number of categorical variables. Emerson et al.  extend the matrix paradigm of the SPLOM to simultaneously visualize several categorical and continuous variables, displaying many kinds of charts in the matrix depending on the kinds of variables involved. We propose a variant of their technique, called the Generalized Plot Matrix (GPLOM). The GPLOM restricts Emerson et al.ﾕs technique to only three kinds of charts (scatterplots for pairs of continuous variables, heatmaps for pairs of categorical variables, and barcharts for pairings of categorical and continuous variable), in an effort to make it easier to understand. At the same time, the GPLOM extends Emerson et al.ﾕs work by demonstrating interactive techniques suited to the matrix of charts. We discuss the visual design and interactive features of our GPLOM prototype, including a textual search feature allowing users to quickly locate values or variables by name. We also present a user study that compared performance with Tableau and our GPLOM prototype, that found that GPLOM is significantly faster in certain cases, and not significantly slower in other cases.
Star coordinates is a popular projection technique from an nD data space to a 2D/3D visualization domain. It is defined by setting n coordinate axes in the visualization domain. Since it generally defines an affine projection, strong distortions can occur: an nD sphere can be mapped to an ellipse of arbitrary size and aspect ratio. We propose to restrict star coordinates to orthographic projections which map an nD sphere of radius r to a 2D circle of radius r. We achieve this by formulating conditions for the coordinate axes to define orthographic projections, and by running a repeated non-linear optimization in the background of every modification of the coordinate axes. This way, we define a number of orthographic interaction concepts as well as orthographic data tour sequences: a scatterplot tour, a principle component tour, and a grand tour. All concepts are illustrated and evaluated with synthetic and real data.
For high-dimensional data, this work proposes two novel visual exploration methods to gain insights into the data aspect and the dimension aspect of the data. The first is a Dimension Projection Matrix, as an extension of a scatterplot matrix. In the matrix, each row or column represents a group of dimensions, and each cell shows a dimension projection (such as MDS) of the data with the corresponding dimensions. The second is a Dimension Projection Tree, where every node is either a dimension projection plot or a Dimension Projection Matrix. Nodes are connected with links and each child node in the tree covers a subset of the parent nodeﾕs dimensions or a subset of the parent nodeﾕs data items. While the tree nodes visualize the subspaces of dimensions or subsets of the data items under exploration, the matrix nodes enable cross-comparison between different combinations of subspaces. Both Dimension Projection Matrix and Dimension Project Tree can be constructed algorithmically through automation, or manually through user interaction. Our implementation enables interactions such as drilling down to explore different levels of the data, merging or splitting the subspaces to adjust the matrix, and applying brushing to select data clusters. Our method enables simultaneously exploring data correlation and dimension correlation for data with high dimensions.
To verify cluster separation in high-dimensional data, analysts often reduce the data with a dimension reduction (DR) technique, and then visualize it with 2D Scatterplots, interactive 3D Scatterplots, or Scatterplot Matrices (SPLOMs). With the goal of providing guidance between these visual encoding choices, we conducted an empirical data study in which two human coders manually inspected a broad set of 816 scatterplots derived from 75 datasets, 4 DR techniques, and the 3 previously mentioned scatterplot techniques. Each coder scored all color-coded classes in each scatterplot in terms of their separability from other classes. We analyze the resulting quantitative data with a heatmap approach, and qualitatively discuss interesting scatterplot examples. Our findings reveal that 2D scatterplots are often ﾔgood enoughﾕ, that is, neither SPLOM nor interactive 3D adds notably more cluster separability with the chosen DR technique. If 2D is not good enough, the most promising approach is to use an alternative DR technique in 2D. Beyond that, SPLOM occasionally adds additional value, and interactive 3D rarely helps but often hurts in terms of poorer class separation and usability. We summarize these results as a workflow model and implications for design. Our results offer guidance to analysts during the DR exploration process.
Analysis of dynamic object deformations such as cardiac motion is of great importance, especially when there is a necessity to visualize and compare the deformation behavior across subjects. However, there is a lack of effective techniques for comparative visualization and assessment of a collection of motion data due to its 4-dimensional nature, i.e., timely varying three-dimensional shapes. From the geometric point of view, the motion change can be considered as a function defined on the 2D manifold of the surface. This paper presents a novel classification and visualization method based on a medial surface shape space, in which two novel shape descriptors are defined, for discriminating normal and abnormal human heart deformations as well as localizing the abnormal motion regions. In our medial surface shape space, the geodesic distance connecting two points in the space measures the similarity between their corresponding medial surfaces, which can quantify the similarity and disparity of the 3D heart motions. Furthermore, the novel descriptors can effectively localize the inconsistently deforming myopathic regions on the left ventricle. An easy visualization of heart motion sequences on the projected space allows users to distinguish the deformation differences. Our experimental results on both synthetic and real imaging data show that this method can automatically classify the healthy and myopathic subjects and accurately detect myopathic regions on the left ventricle, which outperforms other conventional cardiac diagnostic methods.