Zahra Alimoradzadeh: Asymptotic Analysis and Comparison of Model-Based and Model-Free Methods for the Linear Quadratic Regulator
Master Thesis presentation
Time: Thursday 2025-12-11, 08.30–09.30
Location: Cramérrummet (meeting room 12), Albano, Building 1, Floor 3
Respondent: Zahra Alimoradzadeh
Supervisor: Yishao Zhou (SU)
Abstract:
This thesis studies the asymptotic sample efficiency of model-based and model-free reinforcement learning algorithms in the Linear Quadratic Regulator (LQR) setting. We focus on the problem of policy evaluation under a fixed linear controller, where the value function is quadratic and characterized by the unique solution P* of a discrete-time Lyapunov equation.
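As a pointer for the audience, the setup described above takes the following standard form; the symbols A_cl, Q_cl, and K below are assumed notation for the closed-loop matrix and stage cost under the fixed controller, not taken from the abstract itself:

```latex
V(x) = x^{\top} P^{*} x, \qquad
P^{*} = A_{\mathrm{cl}}^{\top} P^{*} A_{\mathrm{cl}} + Q_{\mathrm{cl}},
\qquad A_{\mathrm{cl}} = A + BK,
```

where stability of A_cl guarantees that the discrete-time Lyapunov equation has a unique solution P*.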
Two estimators of P* are analyzed:
- A model-based plug-in estimator, which estimates the closed-loop dynamics via regularized least squares and substitutes the estimate into the Lyapunov operator, and
- A model-free estimator based on Least-Squares Temporal Difference (LSTD) learning, which directly estimates the quadratic value function from trajectory data.
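The two estimators above can be sketched in a few lines of NumPy. This is a minimal illustration under assumed notation (closed-loop dynamics x_{t+1} = A x_t + w_t, stage cost x_t^T Q x_t, a single trajectory X), not the thesis's actual implementation:

```python
import numpy as np

def solve_lyapunov(A, Q, iters=500):
    """Fixed-point iteration P <- A^T P A + Q, which converges when A is stable."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = A.T @ P @ A + Q
    return P

def plugin_estimator(X, Q, reg=1e-3):
    """Model-based: regularized least squares for the closed-loop matrix,
    then substitute the estimate into the Lyapunov operator."""
    X0, X1 = X[:-1], X[1:]              # (x_t, x_{t+1}) pairs
    n = X.shape[1]
    A_hat = np.linalg.solve(X0.T @ X0 + reg * np.eye(n), X0.T @ X1).T
    return solve_lyapunov(A_hat, Q)

def lstd_estimator(X, Q):
    """Model-free LSTD: the value function is quadratic, V(x) = <P, x x^T>,
    so use features phi(x) = vec(x x^T) and solve the TD fixed-point system."""
    n = X.shape[1]
    phi = np.einsum('ti,tj->tij', X, X).reshape(len(X), -1)   # vec(x_t x_t^T)
    costs = np.einsum('ti,ij,tj->t', X[:-1], Q, X[:-1])       # stage costs c_t
    A_bar = phi[:-1].T @ (phi[:-1] - phi[1:])
    b_bar = phi[:-1].T @ costs
    theta = np.linalg.lstsq(A_bar, b_bar, rcond=None)[0]
    P = theta.reshape(n, n)
    return 0.5 * (P + P.T)              # symmetrize the recovered matrix
```

Both functions target the same P*; on a long noiseless trajectory of a stable system they agree with the exact Lyapunov solution, and the thesis's comparison concerns how their errors scale with noisy data. (In practice the Lyapunov step could also use `scipy.linalg.solve_discrete_lyapunov`.)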
Using Markov chain Central Limit Theorems (CLTs), the Delta Method, and uniform integrability arguments, we establish that under a fixed stabilizing controller the model-based plug-in estimator attains strictly smaller asymptotic risk than the model-free LSTD estimator.
