Benjamin Ruppik: Measuring the Topology of Word Embeddings

Time: Tue 2025-11-25 10.15 - 11.15

Location: KTH 3418, Lindstedtsvägen 25 and Zoom

Video link: https://kth-se.zoom.us/j/65583358144?pwd=us6mdDtBgkEdZefvgbZPBWNujl3YuJ.1

Participating: Benjamin Ruppik (Heinrich Heine University, Düsseldorf)

Abstract.

Understanding the internal mechanisms of large language models (LLMs) remains a challenging endeavor. Even fundamental questions, such as how fine-tuning affects model behavior, often require extensive empirical evaluation. In our latest work, we introduce a novel perspective based on the geometric properties of contextual latent embeddings to study the effects of training and fine-tuning. To that end, we measure the local intrinsic dimensions of an LLM's latent space and analyze their shifts, which provide insight into the model's training dynamics and generalization ability. Our experiments suggest a practical heuristic: reductions in the mean local dimension tend to accompany and predict subsequent performance gains.
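
To make the measurement concrete, below is a minimal Python sketch of one standard way to estimate local intrinsic dimension: the Levina-Bickel maximum-likelihood estimator over the k nearest neighbors of each point. This is an illustration, not the authors' implementation; the function name local_intrinsic_dimensions is a hypothetical label, and the random matrix at the bottom merely stands in for token embeddings extracted from an LLM layer.

# Minimal sketch (not the authors' implementation): Levina-Bickel MLE of
# the local intrinsic dimension around each point of an embedding cloud.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_intrinsic_dimensions(X: np.ndarray, k: int = 20) -> np.ndarray:
    """Estimate the local intrinsic dimension at each row of X.

    X: (n_points, ambient_dim) array of embeddings.
    Returns an (n_points,) array of per-point dimension estimates.
    """
    # Distances to the k nearest neighbors, excluding the point itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    dists = dists[:, 1:]  # drop the zero distance to self

    # MLE: inverse mean log-ratio of the k-th neighbor distance
    # to each closer neighbor distance.
    log_ratios = np.log(dists[:, -1:] / dists[:, :-1])  # shape (n, k-1)
    return (k - 1) / log_ratios.sum(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for LLM embeddings: 500 points on a 5-dimensional
    # linear subspace of a 768-dimensional ambient space.
    latent = rng.normal(size=(500, 5))
    basis = rng.normal(size=(5, 768))
    X = latent @ basis
    lids = local_intrinsic_dimensions(X, k=20)
    print(f"mean local dimension: {lids.mean():.2f}")  # close to 5, not 768

In the spirit of the heuristic above, one would run such an estimator on embeddings saved from successive training or fine-tuning checkpoints and track how the mean of the per-point estimates shifts over time.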

The talk is primarily based on our paper "Less is More: Local Intrinsic Dimensions of Contextual Language Models", to be presented at NeurIPS 2025 (https://arxiv.org/abs/2506.01034), but will also cover broader applications of Topological Data Analysis in machine learning.