Abstract:
Robotic control policies learned from human demonstrations have achieved impressive results in many real-world applications. However, in scenarios where initial performance is not satisfactory, as is often the case in novel open-world settings, such behavioral cloning (BC)-learned policies typically require collecting additional human demonstrations to further improve their behavior, an expensive and time-consuming process. In contrast, reinforcement learning (RL) holds the promise of autonomous online policy improvement, but often falls short of this promise due to the large number of samples it typically requires. In this talk, we take steps towards enabling fast autonomous adaptation of BC-trained policies via efficient real-world RL. We consider two angles on this problem. First, focusing in particular on diffusion policies, we propose Diffusion Steering via Reinforcement Learning (DSRL): adapting the BC policy by running RL over its latent-noise space. We show that DSRL is highly sample-efficient, requires only black-box access to the BC policy, and enables effective real-world autonomous policy improvement. Second, we consider the role of the pretrained policy itself in RL-based improvement, and ask how we might pretrain policies that are amenable to downstream improvement. We show that standard BC pretraining can produce policies that fail to meet a minimal condition necessary for effective finetuning, namely coverage of the demonstrator’s actions, but that, if we instead fit a policy to the posterior over the demonstrator’s behaviors, we can achieve action coverage while ensuring the performance of the pretrained policy is no worse than that of the BC policy. We show experimentally that such posterior BC-pretrained policies enable much more efficient online improvement than standard BC-pretrained policies.
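As a rough illustration of the DSRL idea described above, the sketch below treats the initial noise fed into a frozen, pretrained diffusion policy as the action space of a new RL problem, so that online improvement steers which noise vectors get denoised rather than updating the policy's weights. All names here (FrozenDiffusionPolicy, NoiseSpaceEnv, the Gymnasium-style env interface) are hypothetical and chosen purely for illustration; this is not the actual DSRL implementation.

```python
# Hypothetical sketch of the DSRL idea: run RL over the latent-noise
# space of a frozen, pretrained diffusion policy. Names and interfaces
# are illustrative, not the speaker's actual implementation.
import numpy as np


class FrozenDiffusionPolicy:
    """Placeholder for a pretrained BC diffusion policy.

    Only black-box access is assumed: given an observation and an
    initial noise vector, it returns an environment action by running
    its (frozen) denoising process.
    """

    def __init__(self, noise_dim: int, action_dim: int):
        self.noise_dim = noise_dim
        self.action_dim = action_dim

    def act(self, obs: np.ndarray, noise: np.ndarray) -> np.ndarray:
        # Stand-in for the denoising chain; a real diffusion policy
        # maps (obs, noise) to an action via learned denoising steps.
        return np.tanh(noise[: self.action_dim] + 0.0 * obs.sum())


class NoiseSpaceEnv:
    """Wraps the true environment so that the RL agent's "action"
    is the latent noise handed to the frozen diffusion policy."""

    def __init__(self, env, diffusion_policy: FrozenDiffusionPolicy):
        self.env = env
        self.policy = diffusion_policy
        self._obs = None

    def reset(self):
        self._obs, info = self.env.reset()
        return self._obs, info

    def step(self, noise: np.ndarray):
        # The RL agent picks a noise vector; the frozen diffusion
        # policy turns it into a low-level action for the real env.
        action = self.policy.act(self._obs, noise)
        self._obs, reward, terminated, truncated, info = self.env.step(action)
        return self._obs, reward, terminated, truncated, info
```

Under these assumptions, any standard continuous-control RL algorithm could be run on NoiseSpaceEnv, treating the noise vector as its action; this is the sense in which RL "steers" the diffusion policy through its latent-noise space while requiring only black-box access to it.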
Biography:
Andrew Wagenmaker is a postdoctoral researcher in Electrical Engineering and Computer Science at UC Berkeley, working with Sergey Levine. Previously, he completed a PhD in Computer Science at the University of Washington, where he was advised by Kevin Jamieson. While in graduate school, he also spent time at Microsoft Research, mentored by Dylan Foster, as well as at the Simons Institute, and his work was supported by an NSF Graduate Research Fellowship. Before that, he completed master's and bachelor's degrees at the University of Michigan, both in Electrical Engineering. His research centers on developing learning-based algorithms for decision-making in sequential environments, both in theory and practice.