Mathematics & Machine Learning Seminar
East Bridge 114
Polyak Step Sizes in GD Find Flat Minima
Modern machine learning relies on minimizing high-dimensional loss functions that are typically non-convex but for which it is still easy to find global minima. In fact, the set of global minima is often itself a high-dimensional manifold, and an important question is which minima a given optimization scheme will find. In this talk I will present ongoing joint work with Jason Altschuler (Penn) and Francesco Caporali (Princeton), which proves a new global convergence result for minimizing such functions. Namely, I will explain how gradient descent with Polyak step sizes provably finds flat minima. I will show that our theoretically devised optimizer finds flat minima empirically, both in toy models and in pre-trained LLMs.
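For readers unfamiliar with the term, the textbook Polyak step size sets eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2, where f* is the optimal loss value. The sketch below applies that classical rule to a hypothetical toy loss, f(x, y) = 0.5 (xy - 1)^2, whose global minima form the curve xy = 1 with f* = 0. The loss, the starting point, and the function names are illustrative assumptions, and the optimizer presented in the talk may differ from this textbook rule.

```python
# Minimal sketch of gradient descent with the classical Polyak step size.
# The toy loss and all names here are illustrative assumptions, not the
# speakers' code or their optimizer.

def loss(x, y):
    # Toy non-convex loss; its global minima are the curve x * y = 1.
    return 0.5 * (x * y - 1.0) ** 2

def grad(x, y):
    # Gradient of the toy loss with respect to (x, y).
    r = x * y - 1.0
    return r * y, r * x

def polyak_gd(x, y, f_star=0.0, steps=200, tol=1e-24):
    # f_star is the known optimal value (0 for this toy loss).
    for _ in range(steps):
        gap = loss(x, y) - f_star
        gx, gy = grad(x, y)
        g_sq = gx * gx + gy * gy
        # Stop once the loss gap is negligible or the gradient vanishes,
        # which also avoids dividing by zero below.
        if gap <= tol or g_sq == 0.0:
            break
        eta = gap / g_sq  # Polyak step size
        x, y = x - eta * gx, y - eta * gy
    return x, y

x_final, y_final = polyak_gd(3.0, 0.5)
print(x_final, y_final, loss(x_final, y_final))
```

Note that the rule needs the optimal value f*, which is why it is natural in settings where the loss can be driven to (near) zero.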
For more information, please contact the Math Department by phone at 626-395-4335 or by email at [email protected].
Event Series
Mathematics and Machine Learning Seminar Series