Statistical Learning with Neural Networks Trained by Gradient Descent

Spencer Frei
PhD, 2021
Wu, Ying Nian
In this thesis, we theoretically analyze the ability of neural networks trained by gradient descent to learn. The learning problem consists of an algorithmic component and a statistical component. The algorithmic question concerns the underlying optimization problem: given samples from a distribution, under what conditions can a neural network trained by gradient descent efficiently minimize the empirical risk for some loss function defined over these samples? As the underlying optimization problem is highly non-convex, standard tools from optimization theory do not apply, and a novel analysis is needed. The statistical question concerns the generalization problem: supposing gradient descent succeeds in minimizing the empirical risk, under what conditions does this translate into a guarantee for the population risk? Contemporary neural networks used in practice are highly overparameterized and can minimize the empirical risk even when the true labels are replaced with random noise, and thus standard uniform convergence-based arguments fail to yield meaningful guarantees for the population risk of these models.
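
To make the last point concrete, the following is a minimal sketch, not taken from the thesis, of full-batch gradient descent on a wide one-hidden-layer ReLU network fitting purely random labels; the architecture, width, learning rate, squared loss, and Gaussian data model are illustrative assumptions. Because the network is heavily overparameterized, the empirical risk can be driven near zero even though the labels carry no signal, so a small training error by itself says nothing about the population risk.

import numpy as np

# Illustrative sketch only (not the construction analyzed in the thesis): a wide
# one-hidden-layer ReLU network trained by full-batch gradient descent on purely
# random labels. The width, learning rate, loss, and data model are assumptions
# chosen so the example runs quickly.

rng = np.random.default_rng(0)
n, d, m = 40, 20, 2000                 # samples, input dimension, hidden width (m >> n)
X = rng.standard_normal((n, d))        # Gaussian inputs
y = rng.choice([-1.0, 1.0], size=n)    # labels are pure noise, independent of X

W = rng.standard_normal((m, d)) / np.sqrt(d)        # trained first-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # fixed second-layer weights

lr = 0.5
for step in range(5001):
    H = np.maximum(X @ W.T, 0.0)       # hidden-layer ReLU activations, shape (n, m)
    pred = H @ a                       # network outputs, shape (n,)
    resid = pred - y
    emp_risk = 0.5 * np.mean(resid ** 2)   # empirical risk under squared loss
    if step % 1000 == 0:
        print(f"step {step:5d}   empirical risk {emp_risk:.4f}")
    # Gradient of the empirical risk with respect to the first-layer weights.
    grad_W = ((np.outer(resid, a) * (H > 0)).T @ X) / n
    W -= lr * grad_W

Under this setup the printed empirical risk typically decreases toward zero even though the labels were drawn independently of the inputs, which is the phenomenon the abstract points to: bounds based on uniform convergence over such a model class cannot distinguish this noise-fitting network from one that generalizes.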