CateDocs: A New Visual Analytics Approach to Exploring High Dimensional Categorical Datasets

Yueqi Hu, Xiaoke Huang, Chong Zhang, Yingyu Wu, Jing Yang, Ye Zhao, Scott Barlowe, Wei Chen
Most existing multidimensional visualization techniques do not work well for high-dimensional categorical datasets. The major challenges include preserving the discrete nature of the data and visually exploring the high-dimensional space. In this poster, we propose a new visual analytics approach for high-dimensional categorical data. Our method converts a categorical dataset into a document corpus and then applies advanced document analysis and visualization techniques to the corpus. Two prominent knowledge discovery tasks, namely cluster analysis and multivariate analysis, are supported. For cluster analysis, the Latent Dirichlet Allocation (LDA) topic model is employed to discover subspace clusters in a categorical dataset. The clusters are then shown in a semantically rich visualization for interactive visual analysis. For multivariate analysis, LDA is used for dimension reduction, and optimal rule mining is used to discover rules describing multivariate relationships in the reduced subspaces. The effectiveness of this approach is illustrated by case studies on real datasets.
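
As an illustration of the dataset-to-corpus conversion, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the encoding; here we assume that each record becomes one document and that each dimension/value pair becomes one word of the form `dimension=value`. The function names `records_to_corpus` and `bag_of_words` and the toy records are hypothetical.

```python
from collections import Counter


def records_to_corpus(records):
    """Turn each categorical record into a document of 'dimension=value' tokens.

    Assumed encoding (not taken from the poster): one token per
    dimension/value pair, so the discrete values are kept as-is.
    """
    corpus = []
    for record in records:
        # Sort by dimension name so the token order is deterministic.
        corpus.append([f"{dim}={val}" for dim, val in sorted(record.items())])
    return corpus


def bag_of_words(corpus):
    """Build a sorted vocabulary and a document-term count matrix."""
    vocab = sorted({token for doc in corpus for token in doc})
    index = {token: i for i, token in enumerate(vocab)}
    matrix = []
    for doc in corpus:
        row = [0] * len(vocab)
        for token, count in Counter(doc).items():
            row[index[token]] = count
        matrix.append(row)
    return vocab, matrix


# Hypothetical toy dataset with three categorical dimensions.
records = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "square", "size": "small"},
    {"color": "blue", "shape": "round", "size": "large"},
]

corpus = records_to_corpus(records)
vocab, matrix = bag_of_words(corpus)
```

A document-term matrix of this kind is the usual input to a topic model, so it could be passed to an LDA implementation such as scikit-learn's `LatentDirichletAllocation`. Under this assumed encoding, each resulting topic would be a weighted set of dimension/value pairs, which is one way to read a topic as a subspace cluster.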