Autumn 2024 Colloquium

Organizers: Jerry Savage, Abhishek Gupta, Maya Cakmak, Josh Smith

Robot Learning with Minimal Human Feedback
Erdem Biyik (Thomas Lord Department of Computer Science at the University of Southern California) 10/11/2024

Abstract: The lack of large robotics datasets is arguably the most important obstacle to robot learning. While large pretrained models and algorithms like reinforcement learning from human feedback have led to breakthroughs in other domains such as language and vision, robotics has not experienced a comparable influence due to the excessive cost of collecting large datasets. In this talk, I will discuss techniques that enable us to train robots from very little human feedback: as little as a single demonstration, a single language instruction, or the user's natural eye gaze. I will dive into reinforcement learning from human feedback and propose an alternative type of human feedback based on language corrections to improve data efficiency. I will conclude by presenting how existing large pretrained vision-language models can be used to generate direct supervision for robot learning.
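The reinforcement learning from human feedback setting mentioned in the abstract typically starts by fitting a reward model to pairwise human comparisons of trajectories. Below is a minimal, illustrative sketch of preference-based reward learning under the Bradley-Terry model; it is not code from the talk, and all function and variable names are invented for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learn_reward_from_preferences(pref_pairs, dim, lr=0.5, steps=2000):
    """Fit a linear reward r(traj) = w . phi(traj) from pairwise human
    preferences using the Bradley-Terry model:
    P(traj_a preferred over traj_b) = sigmoid(w . (phi_a - phi_b))."""
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for phi_a, phi_b in pref_pairs:  # phi_a: features of the preferred trajectory
            diff = phi_a - phi_b
            p = sigmoid(w @ diff)
            grad += (1.0 - p) * diff  # gradient of the log-likelihood
        w += lr * grad / len(pref_pairs)
    return w

# Toy example: the simulated human prefers trajectories with a larger first feature.
rng = np.random.default_rng(0)
pairs = []
for _ in range(50):
    a, b = rng.normal(size=2), rng.normal(size=2)
    preferred, other = (a, b) if a[0] > b[0] else (b, a)
    pairs.append((preferred, other))

w = learn_reward_from_preferences(pairs, dim=2)
print(w)  # the learned reward should weight the first feature most heavily
```

In practice the reward model is a neural network and the comparisons come from real human annotators, but the likelihood being maximized has this same form.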

Biography: Erdem Bıyık is an assistant professor in the Thomas Lord Department of Computer Science at the University of Southern California, with a courtesy appointment in the Ming Hsieh Department of Electrical and Computer Engineering. He leads the Learning and Interactive Robot Autonomy Lab (Lira Lab). Prior to joining USC, he was a postdoctoral researcher at UC Berkeley's Center for Human-Compatible Artificial Intelligence. He received his Ph.D. and M.Sc. degrees in Electrical Engineering from Stanford University, working at the Stanford Artificial Intelligence Lab (SAIL), and his B.Sc. degree in Electrical and Electronics Engineering from Bilkent University in Ankara, Türkiye. During his studies, he worked at the research departments of Google and Aselsan. Erdem was an HRI 2022 Pioneer and received an honorable mention award for his work at HRI 2020. His work has been published in premier robotics and artificial intelligence journals and conferences such as IJRR, CoRL, RSS, and NeurIPS.

Beyond Rules and Rewards: From Pixels to Adaptive Policies through Imitation, Reinforcement, and Multimodal Learning
Kiana Ehsani (Senior Research Scientist at the Allen Institute for AI (PRIOR)) 10/18/2024

Abstract: As robotics progresses, the need for more adaptable and generalizable learning approaches becomes critical. In this talk, we explore the trajectory of robotic learning methodologies, starting from data generation through imitation to the integration of multimodal systems. We begin by examining approaches that leverage simulation to generate training data, wherein robots imitate optimal behaviors to acquire skills that generalize effectively to real-world environments. These methods form the basis for advancing reinforcement learning architectures, particularly transformer-based policies, which demonstrate significant improvements in robotic navigation and decision-making tasks. To address the inherent limitations of isolated learning paradigms, we explore the synergy between imitation learning and reinforcement learning. This hybrid approach enables the fine-tuning of learned behaviors, allowing for adaptation to new tasks and environments where sparse rewards challenge traditional reinforcement learning models. Moreover, we consider the challenge of embodiment, presenting techniques that allow a single policy to generalize across multiple robot configurations, pushing the boundaries of what can be achieved with unified policies. We conclude by addressing the limitations of templated task descriptions in robotics, and the role of vision-language models (VLMs) in advancing toward true open-world understanding. By training and leveraging VLMs that are designed with robotics tasks in mind, we enable robots to interpret and act upon flexible, open-ended instructions, moving beyond constrained environments and objects toward real-world applications involving novel objects and tasks. These contributions collectively represent a step forward in developing robots capable of more intuitive and versatile interaction within dynamic and unstructured environments.

Biography: Kiana Ehsani is a Senior Research Scientist at the Allen Institute for AI (PRIOR), specializing in Embodied AI and robotics. She earned her PhD in Computer Science from the University of Washington under the guidance of Ali Farhadi. Kiana’s research spans computer vision, machine learning, and AI, with a particular focus on enabling robots to perceive and manipulate their environments using visual and multimodal inputs. Her contributions have been presented at leading conferences such as CVPR, NeurIPS, CoRL, and ICRA. In addition to her technical achievements, Kiana is dedicated to pushing the boundaries of robotic manipulation and multimodal AI.

GT Sophy: Outracing champion Gran Turismo drivers with deep reinforcement learning
Ishan Durugkar (Research Scientist at Sony AI) 10/29/2024

Abstract: Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical maneuvers to pass or block opponents while operating their vehicles at their traction limits. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating complex multi-agent interactions. This talk describes how we trained agents for Gran Turismo that can compete with the world's best e-sports drivers. This agent, Gran Turismo Sophy, was evaluated in a head-to-head competition against four of the world's best Gran Turismo drivers and won. Thereafter, the agent was brought to production: GT Sophy is now available to all players on multiple tracks, across hundreds of cars, on their personal PlayStation 5. I will go over this journey and some of the challenges ahead.

Biography: Ishan Durugkar is a research scientist at Sony AI working with the Game AI team to bring reinforcement learning agents to games such as Gran Turismo. He completed his PhD at UT Austin, where he focused on reinforcement learning, robotics, and multi-agent systems. Most notably, Ishan has applied distribution matching techniques to goal-conditioned reinforcement learning, unsupervised skill discovery, sim-to-real policy transfer, and multi-agent coordination.

"Trace"-ing the Path to Self-adapting AI Agents
Ching-An Cheng (Senior Researcher at MSR AI Frontiers) 11/01/2024

Abstract: What is Trace? Trace is a new AutoDiff-like framework for training AI workflows end-to-end with general feedback (like numerical rewards or losses, natural language text, compiler errors, etc.). Trace generalizes the back-propagation algorithm by capturing and propagating an AI workflow's execution trace and applying LLM-based optimization to improve the workflow's performance. Trace is implemented as a PyTorch-like Python library and is compatible with any Python workflow. Users write Python code directly and can use Trace primitives to optimize certain parts (like code, prompts, etc.), just like training neural networks! In this talk, I will discuss insights behind designing Trace and showcase what Trace can do in training AI agents.
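The core idea the abstract describes, recording how a workflow's outputs were produced so that feedback on the final output can be propagated back to tunable parameters, can be sketched in a few lines. This is not the real Trace API; the class and names below are invented purely to illustrate the AutoDiff analogy.

```python
# Toy illustration of an AutoDiff-like feedback framework: wrap values in
# nodes that remember how they were produced, so textual feedback on the
# final output can be traced back to the tunable "parameters".

class Node:
    def __init__(self, value, parents=(), op=""):
        self.value, self.parents, self.op = value, parents, op

    def backward(self, feedback):
        """Propagate feedback from this node to every ancestor,
        collecting (node, feedback) pairs, akin to back-propagation."""
        received = []
        stack = [(self, feedback)]
        while stack:
            node, fb = stack.pop()
            received.append((node, fb))
            for p in node.parents:
                stack.append((p, f"{fb} (via {node.op})"))
        return received

# "Trainable" parameter: a prompt template we want to improve.
prompt = Node("Summarize: {text}", op="parameter")
answer = Node("too long...", parents=(prompt,), op="llm_call")
score = Node(0.2, parents=(answer,), op="evaluate")

# Feedback at the output flows back to the parameter, where an
# LLM-based optimizer (omitted here) would propose a better prompt.
trace = score.backward("summary exceeded the length limit")
param_feedback = [fb for node, fb in trace if node.op == "parameter"]
print(param_feedback[0])
```

In the real framework, the final step, turning the accumulated feedback at a parameter into an improved value, is delegated to an LLM-based optimizer rather than a numerical gradient update.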

Biography: Ching-An Cheng is a Senior Researcher at MSR AI Frontiers. He received his PhD in Robotics in 2020 from Georgia Tech, where he was advised by Byron Boots at the Institute for Robotics and Intelligent Machines. He's a practical theoretician who is interested in developing foundations for designing principled algorithms that can tackle real-world challenges. Ching-An's research studies structural properties in sequential decision making problems, especially in robotics, and aims to improve the learning efficiency of autonomous agents. His recent work focuses on developing agents that can learn from general feedback, which unifies Learning from Language Feedback (LLF), reinforcement learning (RL), and imitation learning (IL). Ching-An's research has received several awards, including an Outstanding Paper Award Runner-Up (ICML 2022) and a Best Paper Award (AISTATS 2018).

Concept Learning for Interpretable and Efficient Robotic Agents
Manav Kulshrestha (Ph.D. Candidate at Purdue University) 11/08/2024

Abstract: Real-world autonomous systems must often operate in dynamic and unstructured environments — conditions that closely reflect the complexity and unpredictability of the physical world. Unlike classical approaches employing hand-crafted models, deep learning has enabled robotic systems to relax their assumptions about the environment by allowing them to utilize large amounts of observed information to make adaptive decisions. However, the inherent black-box nature of deep learning methods poses significant challenges for interpretability — a critical concern, especially for systems that operate collaboratively with humans in safety-critical or time-sensitive domains. The ability for humans to comprehend the reasoning, underlying beliefs, and potential decision trajectories of their robotic counterparts becomes essential for establishing trust, ensuring effective human-robot interaction, and maintaining operational safety across diverse contexts.

Biography: Manav Kulshrestha is a Ph.D. Candidate at Purdue University, advised by Prof. Aniket Bera and affiliated with the Intelligent Design for Empathetic and Augmented Systems (IDEAS) Lab, where he works on novel techniques for scene understanding and representation to build interpretable models for robotics. Before Purdue, he received dual degrees in Computer Science and Mathematics from the University of Massachusetts Amherst.

Human-Centered AI for Accessible and Assistive Robotics: Towards a Disability-Centered HRI
Elaine Short (Assistant Professor of Computer Science at Tufts) 11/15/2024

Abstract: Powered by advances in AI, especially machine learning, robots are becoming smarter and more widely used. Robots can provide critical assistance to people in a variety of contexts, from improving the efficiency of workers to helping people with disabilities in their day-to-day lives. However, inadequate attention to the needs of users in developing these intelligent robots results in systems that are both less effective at their core tasks and more likely to do unintended harm. The Assistive Agent Behavior and Learning (AABL) Lab at Tufts University seeks to apply human-centered design thinking, especially disability ethics, to the design of state-of-the-art robot learning algorithms and interaction frameworks. This talk will explore how disability-community-centered thinking can be used to inspire new directions for intelligent interactive robotics and review recent work from the AABL lab at the intersection of assistive robotics, robot learning, and human-robot interaction.

Biography: Elaine Schaertl Short is an Assistant Professor of Computer Science at Tufts University. She holds a PhD and an MS in Computer Science from the University of Southern California (USC) and a BS in Computer Science from Yale University. Her research applies human-centered design and disability community values to the development, deployment, and evaluation of AI and machine learning for robotics, including: human-centered human-in-the-loop machine learning; disability-friendly assistive robotics; autonomous HRI in groups, public spaces, and other human-human contexts; and accessibility and disability inclusion in robotics education and the computing research community. She is as committed to human-centered research practices as she is to algorithm and robot design: her work spans from designing a low-cost, open-source, open-hardware robot platform, to understanding family group interactions with socially assistive robots, to designing new neural network architectures for improving human-in-the-loop robot learning. As a disabled faculty member, Elaine is particularly passionate about disability rights in her service work. She is a co-PI of AccessComputing and co-Chair of AccessSIGCHI, both organizations that work to increase the accessibility of the field of computing and the representation of people with disabilities in the computing field.

Boeing Advanced Research Collaboration
Shuonan Dong (Associate Director, BARC Lab) 11/22/2024

Abstract: The Boeing Advanced Research Collaboration (BARC) is a strategic partnership between UW and Boeing in aerospace manufacturing. The BARC brings together Boeing engineers and UW faculty and students to work on joint projects that address the challenges and opportunities of aircraft manufacturing and make airplane fabrication and assembly more efficient, intelligent, automated, and streamlined. Example research areas include composites manufacturing, confined-space robotics, scan-to-plan applications, non-destructive inspection, machine learning for flexible structures, laser ablation, ergonomics, and more. This talk will cover the history and available resources of the BARC and highlight a few select projects.

Biography: Shuonan Dong is an Associate Technical Fellow at The Boeing Company, leading advanced manufacturing automation and robotics technology concepts in Commercial Airplanes Product Development. She is also the Associate Director of the UW Boeing Advanced Research Collaboration lab and an Affiliate Assistant Professor in the ME Department. She holds BS, MS, and PhD degrees from MIT's Department of Aeronautics & Astronautics and the Computer Science & Artificial Intelligence Laboratory.

A Sim-to-Real Journey for Contact-Rich Manipulation
Yashraj Narang (Research Manager at NVIDIA) 12/06/2024

Abstract: In this talk, I will present our efforts at NVIDIA to enable sim-to-real transfer for contact-rich manipulation, with a particular focus on robotic assembly. I will begin by discussing Factory, a project to enable fast, accurate contact simulation for highly detailed assets, such as those found in industrial settings. I will then discuss IndustReal and AutoMate, our recent projects to train RL policies for robotic assembly in simulation and transfer them zero-shot to the real world. I will then progress to Forge and TacSL, projects that add force and visuotactile sensing to our sim-to-real framework to improve robustness and efficiency. Finally, I will briefly touch on our ongoing work and present a vision for our future research, which will focus on building a foundation skill for robotic assembly.

Biography: Yashraj “Yash” Narang is a research manager in the NVIDIA Seattle Robotics Lab (SRL), directed by Dieter Fox. Yash leads the Simulation and Behavior Generation (SBG) team, which focuses on improving and leveraging physics simulation for robotic reinforcement learning, imitation learning, and sim-to-real transfer, as well as systems that integrate learning and planning. Previously, Yash focused on leveraging simulation for robotic sensing, grasping, and manipulation. Prior to NVIDIA, Yash completed his PhD research in solid and structural mechanics for highly-deformable robots and sensors and also worked on precision machine design, prosthetic knee design, and cardiovascular mechanics.