Minqi Jiang Seminar

AIMS Seminar - Friday 3rd February

An Introduction to Unsupervised Environment Design

Minqi Jiang (UCL DARK & Meta AI)

Abstract

Deep reinforcement learning (RL) agents commonly overfit to their training environments, performing poorly when the environment is even mildly perturbed. Such overfitting can be mitigated by applying domain randomization (DR) over various aspects of the training environment in simulation. However, depending on its implementation, DR makes potentially arbitrary assumptions about the distribution over environment instances. Moreover, DR may seldom sample rare environment instances that could be useful for improving robustness. These issues make the benefits of DR difficult to anticipate.

Unsupervised Environment Design (UED) addresses these shortcomings by directly considering the problem of automatically generating a sequence, or curriculum, of environment instances on which to train the agent, so as to maximize its final robustness and generality. Through the lens of minimax-regret decision theory and game theory, UED methods have been shown, in both theory and practice, to produce emergent training curricula that yield deep RL agents with improved generalization, measured as zero-shot transfer to out-of-distribution environment instances. This talk provides a primer on the key concepts underlying UED and presents a tour of recent algorithmic developments leading to increasingly powerful methods.
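To make the minimax-regret idea a little more concrete, here is a minimal Python sketch under simplifying assumptions: each environment instance ("level") is assumed to come with a known optimal return and the agent's current return, and regret is taken as their difference. The names Level, regret, sample_dr, sample_min_return and sample_max_regret are hypothetical illustrations, not part of any UED library or of the methods covered in the talk. The sketch contrasts uniform DR sampling, a purely adversarial teacher that minimizes the agent's return, and a teacher that maximizes regret.

```python
import random
from dataclasses import dataclass


@dataclass
class Level:
    name: str
    optimal_return: float  # assumption: best achievable return is known here
    agent_return: float    # the current agent's return on this level


def regret(level: Level) -> float:
    # Regret: how far the agent falls short of the best achievable return.
    return level.optimal_return - level.agent_return


def sample_dr(levels: list[Level], rng: random.Random) -> Level:
    # Domain randomization: sample uniformly, ignoring the agent entirely.
    return rng.choice(levels)


def sample_min_return(levels: list[Level]) -> Level:
    # Purely adversarial teacher: pick the level where the agent does worst.
    return min(levels, key=lambda lvl: lvl.agent_return)


def sample_max_regret(levels: list[Level]) -> Level:
    # Simplified minimax-regret teacher: pick the level with the largest regret.
    return max(levels, key=regret)


levels = [
    Level("easy", optimal_return=1.0, agent_return=0.95),
    Level("medium", optimal_return=1.0, agent_return=0.6),
    Level("rare_hard", optimal_return=1.0, agent_return=0.1),
    Level("unsolvable", optimal_return=0.0, agent_return=0.0),
]

rng = random.Random(0)
print("DR sample:", sample_dr(levels, rng).name)
print("Adversarial pick:", sample_min_return(levels).name)  # unsolvable
print("Max-regret pick:", sample_max_regret(levels).name)   # rare_hard
```

The toy example illustrates why regret, rather than raw return, is used: the adversarial teacher selects the unsolvable level, from which nothing can be learned, whereas the regret-based teacher ignores it (its regret is zero) and targets the hard but solvable level that uniform DR would draw only occasionally. In practice the optimal return is unknown and has to be estimated.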

Bio

Minqi Jiang is a researcher at FAIR, Meta AI and a final-year PhD student at UCL DARK, where he is advised by Prof. Tim Rocktäschel and Prof. Edward Grefenstette. He is currently a visiting researcher at the Foerster Lab for AI Research (FLAIR), led by Prof. Jakob Foerster at the University of Oxford. He is primarily interested in the design of machine learning systems that continually self-improve by actively generating or collecting training data. His work has focused on such systems in the setting of deep reinforcement learning. Previously, he was a product manager at Google and the founder of a startup focused on human-in-the-loop automation, which was later acquired.