Jiayi Li: Grokking as a Phase Transition Between Competing Basins, a Singular Learning Theory Perspective
Time: Tue 2026-03-03, 10:15-11:15
Location: KTH 3418, Lindstedtsvägen 25, and Zoom
Video link: https://kth-se.zoom.us/j/65583358144?pwd=us6mdDtBgkEdZefvgbZPBWNujl3YuJ.1
Speaker: Jiayi Li (MPI-CBG Dresden)
Abstract.
Grokking refers to the phenomenon in which an artificial neural network undergoes an abrupt transition from memorization to generalization only after prolonged training, long after reaching near-zero training error. This behavior suggests the presence of competing solution basins in the loss landscape with distinct geometric and statistical properties.
We study grokking through the lens of Singular Learning Theory (SLT). In a Bayesian learning framework, SLT characterizes the local geometry of the loss surface via the local learning coefficient (LLC), which quantifies the effective degeneracy of a solution basin. Basins with smaller LLC concentrate more posterior mass and yield better generalization. From this perspective, we interpret grokking as a phase transition between competing near-zero-loss basins with different LLC values. We analyze quadratic networks trained on modular arithmetic tasks and derive closed-form expressions for the LLC of the relevant solution families; these calculations predict which basins dominate the Bayesian posterior. Empirically, we verify these predictions and show that the trajectory of the LLC during training provides a quantitative tool for tracking generalization dynamics.
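To make the LLC concrete: for a one-dimensional loss L(w) = w^{2k} at w* = 0, SLT gives LLC = 1/(2k), so a quartic (degenerate) minimum has LLC = 1/4 while a regular quadratic minimum has LLC = 1/2. A minimal numerical sketch, not the speaker's method: sample the tempered local posterior p(w) ∝ exp(-n β L(w)) with β = 1/log n by random-walk Metropolis, then use the WBIC-style estimator λ̂ = n β (E[L(w)] − L(w*)). All function names and hyperparameters below are illustrative choices.

```python
import numpy as np

def estimate_llc(loss, n=1000, n_steps=200_000, prop_std=0.2, seed=0):
    """Estimate the local learning coefficient (LLC) of a 1-D loss at w* = 0.

    Samples the tempered local posterior p(w) ∝ exp(-n*beta*loss(w)),
    beta = 1/log(n), via random-walk Metropolis, then returns the
    estimator  llc_hat = n*beta*(E[loss(w)] - loss(w*)).
    """
    rng = np.random.default_rng(seed)
    beta = 1.0 / np.log(n)
    a = n * beta                      # inverse temperature of the posterior
    w, lw = 0.0, loss(0.0)            # start the chain at the minimum
    losses = []
    for _ in range(n_steps):
        w_new = w + prop_std * rng.normal()
        lw_new = loss(w_new)
        # Metropolis accept/reject with respect to exp(-a * loss)
        if rng.random() < np.exp(-a * (lw_new - lw)):
            w, lw = w_new, lw_new
        losses.append(lw)
    burn = n_steps // 5               # discard burn-in samples
    return a * (np.mean(losses[burn:]) - loss(0.0))

# Degenerate quartic minimum: theoretical LLC = 1/4.
llc_quartic = estimate_llc(lambda w: w**4)
# Regular quadratic minimum: theoretical LLC = 1/2.
llc_quad = estimate_llc(lambda w: w**2)
```

The smaller estimate for the quartic basin illustrates the talk's key quantity: lower LLC means a more degenerate basin, which concentrates more posterior mass.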
This is joint work with Ben Cullen, Sergio Estan, and Riya Danait.
