Abstract:
Classifying a set of objects into clusters can be done in numerous ways,
producing different results. The resulting clusterings can be compared visually
using contingency tables [27], mosaicplots [13], fluctuation diagrams [15], tableplots [20],
(modified) parallel coordinates plots [28], Parallel Sets plots [18] or
circos diagrams [19]. Unfortunately, the interpretability of all these
graphical displays decreases rapidly as the numbers of categories and
clusterings grow. In his famous book A Semiology of Graphics [5], Bertin writes: "the
discovery of an ordered concept appears as the ultimate point in logical
simplification since it permits reducing to a single instant the assimilation
of series which previously required many instants of study." Or, in more
everyday language: with a good ordering you can immediately see results
that might take a lot of effort to see with other orderings. This is also related to
the idea of effect ordering [12], that data should be organised to reflect
the effect you want to observe. This paper presents an efficient algorithm
based on Bertin's idea and concepts related to Kendall's τ [17], which finds
informative joint orders for two or more nominal classification variables. We
also show how these orderings improve the various displays and how groups of
corresponding categories can be detected using a top-down partitioning
algorithm. Different clusterings of data on the environmental
performance of cars sold in Germany are used for illustration. All presented
methods are available in the R package extracat, which is used to compute the
optimized orderings for the example dataset.
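
To make the idea of a joint order concrete, the following minimal Python sketch reorders the rows and columns of a small two-way contingency table so that the number of concordant pairs minus the number of discordant pairs is as large as possible, in the spirit of Kendall's τ. It is an illustration only: the function names `concordance` and `best_joint_order`, the choice of criterion, and the toy counts are assumptions made here, and the exhaustive search over all permutations is a stand-in for, not a reproduction of, the efficient algorithm presented in the paper or the implementation in extracat.

```python
from itertools import permutations


def concordance(table):
    """Count concordant and discordant pairs of observations in a two-way table.

    A pair of observations in cells (i, j) and (k, l) with i < k is
    concordant if j < l and discordant if j > l. Pairs that share a row
    or a column are ties and are ignored.
    """
    n_rows, n_cols = len(table), len(table[0])
    conc = disc = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        conc += table[i][j] * table[k][l]
                    elif l < j:
                        disc += table[i][j] * table[k][l]
    return conc, disc


def best_joint_order(table):
    """Brute-force search for the row and column orders that maximise
    (concordant - discordant). Only feasible for very small tables."""
    n_rows, n_cols = len(table), len(table[0])
    best = None
    for rows in permutations(range(n_rows)):
        for cols in permutations(range(n_cols)):
            reordered = [[table[r][c] for c in cols] for r in rows]
            conc, disc = concordance(reordered)
            score = conc - disc
            # Keep the first ordering that reaches the highest score.
            if best is None or score > best[0]:
                best = (score, rows, cols)
    return best


# Hypothetical cross-classification of two clusterings of the same objects
# (rows: clusters of the first clustering, columns: clusters of the second).
counts = [
    [0, 5, 0],
    [0, 0, 7],
    [6, 0, 0],
]

score, row_order, col_order = best_joint_order(counts)
reordered = [[counts[r][c] for c in col_order] for r in row_order]
```

For this toy table the selected ordering moves all counts onto the diagonal, so the correspondence between the clusters of the two clusterings can be read off directly. The search space grows factorially with the number of categories, which is why an efficient algorithm is needed for tables of realistic size.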