Abstract:
The extraction of significant structures in arbitrary high-dimensional data
sets is a challenging task. Moreover, classifying data points as noise in
order to reduce a data set bears special relevance for many application
domains. Standard methods such as clustering serve to reduce problem
complexity by providing the user with classes of similar entities. However,
they usually do not highlight relations between different entities and
require a stopping criterion, e.g. the number of clusters to be detected. In
this paper, we present a visualization pipeline based on recent advancements
in algebraic topology. More precisely, we employ methods from persistent
homology that enable topological data analysis on high-dimensional data sets.
Our pipeline inherently copes with noisy data and data sets of arbitrary
dimensions. It extracts central structures of a data set in a hierarchical
manner by using a persistence-based filtering algorithm that is theoretically
well-founded. We furthermore introduce persistence rings, a novel
visualization technique for a class of topological features (the persistence
intervals) of large data sets. Persistence rings provide a unique topological
signature of a data set, which helps in recognizing similarities. In
addition, we provide interactive visualization techniques that assist the
user in evaluating the parameter space of our method in order to extract
relevant structures. We describe and evaluate our analysis pipeline by means
of two very distinct classes of data sets: First, a class of synthetic data
sets containing topological objects is employed to highlight the interaction
capabilities of our method. Second, in order to affirm the utility of our
technique, we analyze a class of high-dimensional real-world data sets
arising from current research in cultural heritage.