Max Brehmer: Multidimensional change point detection using likelihood ratio statistics
Master thesis
Time: Wed 2026-02-11 13.15 - 14.00
Location: Mittag-Leffler room, Albano floor 3, house 1
Respondent: Max Brehmer
Supervisor: Johannes Heiny, Taariq Nazar
Abstract: This thesis tackles binary splitting of regression trees through the lens of change-point detection. Consider a dataset where for some p-dimensional feature space and for some 1-dimensional response space. A binary split attempts to form partitions of observations with similar values of the response.
A typical Classification and Regression Tree (CART) lacks an inherent stopping mechanism to avoid over-partitioning which leads to overfitting. CARTs tend to rely on cross-validation to reduce overfitting, but then one loses out on valuable training data. We propose a method that succeeds at generalizing without removing any data from the training set. We model this setup as a change point problem, where the change point is the index of an ordered dataset where the partitions are optimal.
A likelihood ratio test is used to determine the significance of each recurring optimal change point. We first study the one dimensional asymptotic distribution of the split location under the null hypothesis (that there is no change point).
Using a likelihood ratio statistic we recover the argmax of a Brownian bridge, which has an arcsine distribution, when the noise has finite variance. In the case where the noise has infinite variance, a stable-bridge limit results in an approximate Beta distribution approaching to uniformity as tails thicken.
The limiting distribution of the statistic is approximated by a Gumbel distribution that changes by an affine scaling as dimensionality grows. Across Gaussian and t-distributed~response variables, this method provides a solid method for partitioning datasets, while avoiding overfitting, and could be useful when regularizing regression trees.
