Alberto Gonzalez Martinez, PhD Presentation

December 15, 12:00pm - 3:00pm
Mānoa Campus

Title: Exploratory analysis of Research Publications Collections with Human Steerable Black-box Models Towards Generalizing Inverse Computations for Semantic Interaction

Understanding high-dimensional data sets is a complex task for many scientists, engineers, and intelligence analysts. Traditionally, this problem has been tackled with linear pipelines that rely on mathematical models and algorithms to summarize relationships and structure, producing a visual representation of the data in a collapsed, low-dimensional form. The main issue with these traditional pipelines is that they are driven solely by algorithms or models; without a human in the loop, they can limit sense-making by masking expected or known structure in the data.

In recent years, Semantic Interaction has become a promising user interaction methodology for model steering in Visual Analytics systems, as it provides mechanisms with which to adjust the parameter space, explore data, and test hypotheses. Under the paradigm of Semantic Interaction, users can steer model parameters and explore data representations without leaving the visual space, thus combining algorithms and models with expert human judgment. To facilitate this interaction modality, Semantic Interaction systems need to invert the computation of one or more mathematical models, which gives their pipelines a bidirectional structure. For example, dimensionality reduction and clustering are frequently used to explore multidimensional data in Visual Analytics systems and are almost always present in Semantic Interaction systems. Since users interact with clustered data in its compressed form, the system needs to link this compressed form back to the original high-dimensional representation so that the model and algorithms can be affected from within the visualization. This reverse link from the low-dimensional representation to the high-dimensional input space is what requires Semantic Interaction pipelines to be bidirectional.

Most Semantic Interaction systems use simple, interpretable linear models for dimensionality reduction and clustering, such as LDA (Latent Dirichlet Allocation) and PCA (Principal Component Analysis), so that they can provide a straightforward bidirectional pipeline. By contrast, the state-of-the-art techniques for dimensionality reduction and clustering in visual analytics, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), are "black-box" models, which are neither linear nor directly interpretable. Furthermore, these techniques are computationally expensive, suffer from out-of-sample stability problems, and are complex to retrain for new instances, requiring precise hyper-parameter tuning.
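
As an illustration of why a linear model such as PCA yields a straightforward bidirectional pipeline, the following minimal sketch projects 3-D points onto their first principal component (forward) and maps a position in the low-dimensional space back to the input space (backward). The data, the single-component setting, and all function names are assumptions made for this example; they are not taken from the thesis.

```python
import math

# Hypothetical 3-D data lying close to a single direction.
rows = [
    [1.0, 2.1, 1.9],
    [2.0, 3.9, 4.2],
    [3.0, 6.1, 5.8],
    [4.0, 8.0, 8.1],
    [5.0, 9.9, 10.0],
]

def mean_vector(data):
    """Per-dimension mean of the data."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]

def covariance(data, mu):
    """Sample covariance matrix of the data around mu."""
    d, n = len(mu), len(data)
    cov = [[0.0] * d for _ in range(d)]
    for r in data:
        c = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / (n - 1)
    return cov

def top_component(cov, iters=200):
    """First principal axis (unit vector) found by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def forward(x, mu, v):
    """High-dimensional point -> 1-D coordinate in the visual space."""
    return sum((xi - mi) * vi for xi, mi, vi in zip(x, mu, v))

def backward(z, mu, v):
    """1-D coordinate in the visual space -> high-dimensional point."""
    return [mi + z * vi for mi, vi in zip(mu, v)]

mu = mean_vector(rows)
v = top_component(covariance(rows, mu))

# A user "drags" the first point in the visual space; the inverse map
# tells the system where that position lies in the input space.
z = forward(rows[0], mu, v)
z_moved = z + 1.5
x_moved = backward(z_moved, mu, v)
print(z, z_moved, x_moved)
```

Because the projection is linear, the backward step is a single multiplication by the same axis used in the forward step. Non-linear embeddings such as t-SNE and UMAP offer no comparable closed-form inverse.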

This thesis proposes a novel Deep Surrogate model approach to perform the backward and forward computations within semantic interaction pipelines that were previously implemented with "black-box" models. This approach allows new instances to be efficiently "merged" into a previously trained model without retraining. It also provides a reverse link, allowing a trained model's parameters to be affected by user interactions with the visual representation of the data. To demonstrate the usefulness of this approach, I present the Zexplorer system, a tool for exploring large document collections of research papers with Semantic Interaction, as well as a user study to validate the approach. The Zexplorer system is built as an extension to Zotero, a widely used open-source bibliography system.
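
The general idea of a surrogate can be sketched as follows: a small parametric network is trained to reproduce the coordinates that a black-box embedding assigned to a set of points, and is then used to place a new instance without re-running the embedding. This is only a sketch under stated assumptions. The one-hidden-layer network, the training data, and every name below are hypothetical stand-ins, and the thesis's actual Deep Surrogate architecture is not described in this announcement.

```python
import math
import random

random.seed(0)

# Hypothetical training data: 3-D inputs and the 2-D coordinates that a
# black-box embedding (for example t-SNE) is assumed to have given them.
inputs = [[0.0, 0.1, 0.9], [0.1, 0.0, 1.0], [0.9, 1.0, 0.1],
          [1.0, 0.9, 0.0], [0.5, 0.5, 0.5], [0.4, 0.6, 0.5]]
embedding = [[-1.0, -0.8], [-0.9, -1.0], [1.0, 0.9],
             [0.9, 1.0], [0.0, 0.1], [0.1, 0.0]]

D_IN, D_HID, D_OUT = 3, 8, 2
W1 = [[random.uniform(-0.5, 0.5) for _ in range(D_IN)] for _ in range(D_HID)]
b1 = [0.0] * D_HID
W2 = [[random.uniform(-0.5, 0.5) for _ in range(D_HID)] for _ in range(D_OUT)]
b2 = [0.0] * D_OUT

def predict(x):
    """Return hidden activations and the predicted 2-D position for x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]
    return h, y

def mean_squared_error():
    """Average squared distance between predictions and embedding targets."""
    total = 0.0
    for x, t in zip(inputs, embedding):
        _, y = predict(x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / len(inputs)

def train(epochs=500, lr=0.05):
    """Plain stochastic gradient descent on half the squared error."""
    for _ in range(epochs):
        for x, t in zip(inputs, embedding):
            h, y = predict(x)
            err = [yi - ti for yi, ti in zip(y, t)]
            # Backpropagate through the output layer before updating it.
            dh = [sum(err[k] * W2[k][j] for k in range(D_OUT)) * (1 - h[j] ** 2)
                  for j in range(D_HID)]
            for k in range(D_OUT):
                for j in range(D_HID):
                    W2[k][j] -= lr * err[k] * h[j]
                b2[k] -= lr * err[k]
            for j in range(D_HID):
                for i in range(D_IN):
                    W1[j][i] -= lr * dh[j] * x[i]
                b1[j] -= lr * dh[j]

loss_before = mean_squared_error()
train()
loss_after = mean_squared_error()

# A new instance is placed with one forward pass; the black-box
# embedding is not recomputed.
new_doc = [0.05, 0.05, 0.95]
_, new_position = predict(new_doc)
print(loss_before, loss_after, new_position)
```

In a sketch like this, a reverse link could be approximated by changing a point's target coordinates after the user moves it and running a few more training steps, so that the interaction updates the surrogate's parameters. Whether the thesis uses this mechanism is not stated here.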

Join Zoom Meeting:
Meeting ID: 997 3162 9197
Passcode: goalberto

Event Sponsor
ICS, Mānoa Campus
