Alice Harting: MCMC Estimation of Causal VAE Architectures with Applications to Spotify User Behavior

Degree project

Time: Tue 2024-02-06 09.00 - 09.30

Location: KTH 3418, Lindstedtsvägen 25

Participating: Alice Harting (KTH)

Abstract

A common task in data science at internet companies is to develop metrics that capture aspects of the user experience. In this thesis, we are interested in systems of measurement variables that have no direct causal relations among them, so that their covariance is explained by unobserved latent common causes. A framework for modeling the data-generating process is given by Neuro-Causal Factor Analysis (NCFA). The graphical model consists of a directed graph with edges pointing from the latent common causes to the measurement variables; its functional relations are approximated with a constrained Variational Auto-Encoder (VAE). We refine the estimation of the graphical model by developing an MCMC algorithm over Bayesian networks from which we read marginal independence relations between the measurement variables. Unlike standard independence testing, the method is guaranteed to yield an identifiable graphical model. Our algorithm is competitive with the benchmark, and it admits additional flexibility via hyperparameters that are natural to the approach. Tuning these hyperparameters yields superior performance over the benchmark. We train the improved NCFA model on Spotify user behavior data. It is competitive with the standard VAE on data reconstruction, with the added benefit of causal interpretability and model identifiability. We use the learned latent space representation to characterize clusters of Spotify users. Additionally, we train an NCFA model on data from a randomized controlled trial and observe treatment effects in the latent space.
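
As a small illustration of the graph structure described in the abstract, the sketch below reads marginal independence relations off a latent-to-measurement graph. It is not the thesis's MCMC algorithm. It assumes that the latent causes are mutually independent and that edges run only from latents to measurements, in which case two measurement variables are marginally independent exactly when they share no latent parent. The variable names (`streams`, `skips`, `searches`, `z1`, `z2`) and the helper `marginal_independencies` are hypothetical and chosen only for the example.

```python
from itertools import combinations


def marginal_independencies(parents):
    """Return pairs of measurement variables that share no latent parent.

    `parents` maps each measurement variable to the set of latent causes
    pointing to it. Assumes mutually independent latents and no edges
    between measurement variables (an assumption of this sketch).
    """
    pairs = set()
    for a, b in combinations(sorted(parents), 2):
        # No common latent cause means the pair is d-separated marginally.
        if not (parents[a] & parents[b]):
            pairs.add((a, b))
    return pairs


# Hypothetical toy structure: two latent causes and three measurements.
parents = {
    "streams": {"z1"},
    "skips": {"z1", "z2"},
    "searches": {"z2"},
}

independent = marginal_independencies(parents)
print(independent)
```

In this toy graph only `searches` and `streams` have no common latent cause, so they form the only marginally independent pair.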