Joint with the Data Science Distinguished Lecture
How to tame and validate AI-based predictions?
Statistical prediction is a very powerful engine: it underlies diffusion denoising for generating images, token prediction for large language models, and look-ahead prediction in reinforcement learning. Complex predictive models can behave remarkably well, but can also fail in mysterious and dangerous ways, including biased and mis-calibrated outputs, poor performance under distribution shift, and high uncertainty associated with their predictive performance. This talk discusses these challenges, along with some new approaches for tackling them. We describe a two-stage method for combining (potentially misleading) AI-generated data with clean data, and a synthetic data refitting method for assessing predictive performance.
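As a rough illustration of the two-stage idea, the sketch below debiases an abundant pool of AI-generated predictions using a small clean sample on which both gold labels and AI predictions are observed. This is a minimal, hypothetical simulation in the spirit of prediction-powered estimation, not the speaker's actual method; the sample sizes, bias, and noise levels are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a small clean sample and a large pool of
# (potentially misleading) AI-generated predictions.
n_clean, n_ai = 100, 10_000
true_mean = 2.0
bias = 0.5  # the AI predictor is systematically off

y_clean = rng.normal(true_mean, 1.0, n_clean)            # gold labels
f_clean = y_clean + bias + rng.normal(0, 0.2, n_clean)   # AI predictions on the clean sample
f_ai = rng.normal(true_mean + bias, 1.0, n_ai)           # AI predictions on the large pool

# Stage 1: naive estimate from the abundant AI data alone (inherits the bias).
naive = f_ai.mean()

# Stage 2: correct it using the clean sample, where the gap between
# gold labels and AI predictions estimates the bias.
corrected = f_ai.mean() + (y_clean - f_clean).mean()
```

Here the corrected estimate recovers (approximately) the true mean even though the AI predictions are systematically shifted, at the price of extra variance from the small clean sample.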
Bio: Martin Wainwright is the Ford Chair Professor in the Department of Electrical Engineering and Computer Science and the Department of Mathematics at MIT. He is also affiliated with the Laboratory for Information and Decision Systems and the Statistics and Data Science Center. He joined the MIT faculty in July 2022 after spending 20 years in statistics and EECS at the University of California, Berkeley. Martin is broadly interested in statistics and machine learning. His work has been recognized by various awards, among them a Guggenheim Foundation Fellowship, an Alfred P. Sloan Foundation Fellowship, the COPSS Presidents' Award from the Committee of Presidents of Statistical Societies, the David Blackwell Award from the Institute of Mathematical Statistics, and a Section Lecture at the International Congress of Mathematicians. He has co-authored several books, including on graphical models, sparse statistical models, and high-dimensional statistics.