Winter 2023 Colloquium

Organizers: Selest Nashef, Josh Smith, Byron Boots, Maya Cakmak, Dieter Fox, Abhishek Gupta, Siddhartha S. Srinivasa

Learning Meets Gravity: Robots that Embrace Dynamics from Pixels
Shuran Song (Columbia University) 01/13/2023

Abstract: Despite the incredible capabilities (speed, repeatability) of our hardware, most robot manipulators today are deliberately programmed to avoid dynamics – moving slowly enough that quasi-static assumptions about the world hold. In contrast, people frequently (and subconsciously) exploit dynamic phenomena to manipulate everyday objects – from unfurling blankets to tossing trash – to improve efficiency and extend physical reach. These abilities are made possible by an intuition for physics, a cornerstone of intelligence. How do we impart the same to robots? In this talk, I will discuss how we might enable robots to leverage dynamics for manipulation in unstructured environments. Modeling the complex dynamics of unseen objects from pixels is challenging. However, by tightly integrating perception and action, we show it is possible to relax the need for accurate dynamical models, allowing robots to (i) learn dynamic skills for complex objects, (ii) adapt to new scenarios using visual feedback, and (iii) use their dynamic interactions to improve their understanding of the world. By changing the way we think about dynamics – from avoiding it to embracing it – we can simplify a number of classically challenging problems, leading to new robot capabilities.

Biography: Shuran Song is an Assistant Professor in the Department of Computer Science at Columbia University. Before that, she received her Ph.D. in Computer Science from Princeton University and her B.Eng. from HKUST. Her research interests lie at the intersection of computer vision and robotics. Song’s research has been recognized through several awards including the Best Paper Awards at RSS’22 and T-RO’20, Best System Paper Awards at CoRL’21, RSS’19, and Amazon Robotics’18, and finalist awards at RSS’22, ICRA'20, CVPR'19, RSS’19, and IROS'18. She is also a recipient of the NSF CAREER Award, as well as research awards from Microsoft, Toyota Research, Google, Amazon, JP Morgan, and the Sloan Foundation. To learn more about Shuran’s work, please visit:

CANCELED 01/20/2023
Placing Items on Cluttered Shelves for E-Commerce Fulfillment
Aaron Parness (Amazon Robotics) 01/27/2023

Abstract: Stowing inventory is one of the most expensive tasks in e-commerce. We present robotic science and technology that targets this application. The system uses compliant control to operate through many physical contacts between the robot and items already on shelves. Perception algorithms infer available space from shelf images and the item manifest to plan and control compliant manipulation strategies. Custom end-of-arm tools (grippers) simplify the tasks.

Biography: Dr. Aaron Parness is a Senior Applied Science Manager in Robotics and Artificial Intelligence at Amazon, where he focuses on e-commerce fulfillment. His team builds robotic workcells that use force sensors to perform tasks in highly cluttered situations where contact with the environment is unavoidable. With expertise in robotic gripper design, motion planning, control, computer vision, and machine learning, Aaron leads a team of 75 staff in Seattle, Boston, Berlin, and remote locations. From 2010 to 2019, he worked at NASA’s Jet Propulsion Laboratory, where he founded and led the Robotic Rapid Prototyping Laboratory. He received two bachelor’s degrees from MIT (Mechanical Engineering & Creative Writing), and his MS and PhD from Stanford University.

Designing for the Human in Human-Robot Interaction
Matthew C Gombolay (Georgia Institute of Technology) 02/03/2023

Abstract: New advances in robotics and autonomy offer a promise of revitalizing final assembly manufacturing, assisting in personalized at-home healthcare, and even scaling the power of earth-bound scientists for robotic space exploration. Yet, in real-world applications, autonomy is often run in the O-F-F mode because researchers fail to understand the human in human-in-the-loop systems. In this talk, I will share exciting research we are conducting at the nexus of human factors engineering and cognitive robotics to inform the design of human-robot interaction. I will focus on our recent work on 1) enabling machines to learn skills from and model heterogeneous, suboptimal human decision-makers, 2) “white-boxing” that knowledge through explainable Artificial Intelligence (XAI) techniques, and 3) scaling to coordinated control of stochastic human-robot teams. The goal of this research is to inform the design of autonomous teammates so that users want to turn – and benefit from turning – to the O-N mode.

Biography: Dr. Matthew Gombolay is an Assistant Professor of Interactive Computing at the Georgia Institute of Technology. He was named the Anne and Alan Taetle Early-career Assistant Professor in 2018. He received a B.S. in Mechanical Engineering from the Johns Hopkins University in 2011, an S.M. in Aeronautics and Astronautics from MIT in 2013, and a Ph.D. in Autonomous Systems from MIT in 2017. Between defending his dissertation and joining the faculty at Georgia Tech, Dr. Gombolay served as technical staff at MIT Lincoln Laboratory, transitioning his research to the U.S. Navy and earning an R&D 100 Award. His publication record includes best paper awards from the American Institute of Aeronautics and Astronautics and the ACM/IEEE Conference on Human-Robot Interaction (HRI’22), as well as finalist awards for best paper at the Conference on Robot Learning (CoRL’20) and best student paper at the American Controls Conference (ACC’20). Dr. Gombolay was selected as a DARPA Riser in 2018, received the Early Career Award from the National Fire Control Symposium, and was awarded a NASA Early Career Fellowship. Dr. Gombolay is an Associate Editor for Autonomous Robots and the ACM Transactions on Human-Robot Interaction.

Modeling the 3D Physical World for Embodied Intelligence
Hao Su (UCSD) 02/10/2023

Abstract: Embodied AI is a rising paradigm of AI that aims to enable agents to interact with the physical world. Embodied agents can acquire a large amount of data through interaction with the physical world, which makes it possible for them to close the perception-cognition-action loop and learn continuously from the world to revise their internal models. In this talk, I will present my group's work to build an ecosystem for embodied AI. This series of work unifies efforts ranging from building a virtual space for interaction data collection to proposing effective closed-loop learning algorithms. We will also discuss the challenges and opportunities in this area.

Biography: Hao Su is an Assistant Professor of Computer Science at the University of California, San Diego. He is the Director of the Embodied AI Lab at UCSD, a founding member of the Data Science Institute, and a member of the Center for Visual Computing and the Contextual Robotics Institute. He works on algorithms to model, understand, and interact with the physical world. His interests span computer vision, machine learning, computer graphics, and robotics -- all areas in which he has published and lectured extensively. Hao Su obtained his Ph.D. from Stanford in 2018. At Stanford and UCSD he developed widely used datasets and software such as ImageNet, ShapeNet, PointNet, PartNet, SAPIEN, and more recently, ManiSkill. He also developed new courses to promote machine learning methods for 3D geometry and embodied AI. He has served as an Area Chair or Associate Editor for top conferences and journals in computer vision (ICCV/ECCV/CVPR), computer graphics (SIGGRAPH/ToG), robotics (IROS/ICRA), and machine learning (ICLR).

Hybrid RL: Using Both Offline and Online Data can Make RL Efficient
Wen Sun (Cornell University, Computer Science Department) 02/17/2023

Abstract: We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction. The framework mitigates the challenges that arise in both pure offline and online RL settings, allowing for the design of simple and highly effective algorithms, in both theory and practice. We demonstrate these advantages by adapting the classical Q-learning/iteration algorithm to the hybrid setting, which we call Hybrid Q-Learning or Hy-Q. In our theoretical results, we prove that the algorithm is both computationally and statistically efficient whenever the offline dataset supports a high-quality policy and the environment has bounded bilinear rank. Notably, we require no assumptions on the coverage provided by the initial distribution, in contrast with guarantees for policy gradient/iteration methods. In our experimental results, we show that Hy-Q with neural network function approximation outperforms state-of-the-art online, offline, and hybrid RL baselines on challenging benchmarks, including Montezuma’s Revenge. This is joint work with Yuda Song, Yifei Zhou, Ayush Sekhari, J. Andrew Bagnell, and Akshay Krishnamurthy.
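The core idea of the hybrid setting – updating a value function from both a fixed offline dataset and freshly collected online experience – can be illustrated with a minimal tabular sketch. This is not the authors' implementation (the paper's Hy-Q uses fitted Q-iteration with function approximation); the chain MDP, buffer sizes, and learning rates below are illustrative assumptions. Note how offline coverage of the whole state space lets value estimates propagate even before the online policy reaches the reward:

```python
import random

# Toy chain MDP: states 0..4, actions {0: left, 1: right}; reward 1 on reaching state 4.
N = 5

def step(s, a):
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

rng = random.Random(0)

# Offline dataset: transitions logged by a uniform-random behavior policy.
offline = []
for _ in range(200):
    s, a = rng.randrange(N - 1), rng.randrange(2)
    offline.append((s, a) + step(s, a))

# Hybrid Q-learning sketch: each step collects one online transition,
# then performs a TD update on a sample from EACH buffer (50/50 mix).
Q = [[0.0, 0.0] for _ in range(N)]
gamma, alpha = 0.9, 0.5
online, s = [], 0
for t in range(2000):
    # Online interaction: epsilon-greedy w.r.t. the current Q.
    a = rng.randrange(2) if rng.random() < 0.2 else max((0, 1), key=lambda x: Q[s][x])
    s2, r, done = step(s, a)
    online.append((s, a, s2, r, done))
    s = 0 if done else s2
    for buf in (offline, online):
        bs, ba, bs2, br, bdone = buf[rng.randrange(len(buf))]
        target = br + (0.0 if bdone else gamma * max(Q[bs2]))
        Q[bs][ba] += alpha * (target - Q[bs][ba])

# Greedy policy over non-terminal states; it should learn to move right.
greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N - 1)]
print(greedy)
```

The offline samples provide coverage of states the initial online policy never visits, which is the practical intuition behind the mitigation of pure-online exploration difficulty described above.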

Biography: Wen Sun is an assistant professor in the Computer Science Department at Cornell University. His group works on machine learning, especially reinforcement learning. His group's most recent research directions include representation learning in RL, learning in partially observable systems, sample-efficient learning from demonstrations, generalization in RL with rich function approximation, and deep RL. His group has published many influential papers at top machine learning conferences, including ICML, NeurIPS, ICLR, COLT, CVPR, AISTATS, and UAI. He has served as an Area Chair for ICML and NeurIPS, and is a recipient of the UAI Best Student Paper Award. He is a co-author of the first theoretical RL monograph.

Doing for our robots what nature did for us
Leslie Kaelbling (MIT Computer Science and Artificial Intelligence Lab (CSAIL)) 02/24/2023

Abstract: We, as robot engineers, have to think hard about our role in the design of robots and how it interacts with learning, both in "the factory" (that is, at engineering time) and in "the wild" (that is, when the robot is delivered to a customer). I will share some general thoughts about the strategies for robot design and then talk in detail about some work I have been involved in, both in the design of an overall architecture for an intelligent robot and in strategies for learning to integrate new skills into the repertoire of an already competent robot.

Biography: Leslie is a Professor at MIT. She has an undergraduate degree in Philosophy and a PhD in Computer Science from Stanford, and was previously on the faculty at Brown University. She was the founding editor-in-chief of the Journal of Machine Learning Research. Her research agenda is to make intelligent robots using methods including estimation, learning, planning, and reasoning. She is not a robot.

CANCELED 03/03/2023
Life-long and Robust Learning from Robotic Fleets
Sandeep Chinchali (The University of Texas at Austin (UT Austin)) 03/10/2023

Abstract: Today’s robotic fleets collect terabytes of rich video and LiDAR data that can be used to continually re-train machine learning (ML) models in the cloud. While these fleets should ideally upload all their data to train robust ML models, this is often infeasible due to prohibitive network bandwidth, data labeling, and cloud costs. In this talk, I will present my group’s papers at CoRL 2022 that aim to learn robust perception models from geo-distributed robotic fleets. First, I will present a cooperative data sampling strategy for autonomous vehicles (AVs) to collect a diverse ML training dataset in the cloud. Since the AVs have a shared objective but minimal information about each other's local data distributions, we can naturally cast cooperative data collection as a mathematical game. I will theoretically characterize the convergence and communication benefits of game-theoretic data sampling and show state-of-the-art performance on standard AV datasets. Then, I will transition to our work on synthesizing robust perception models tailored to robotic control tasks. The key insight is that today's methods to train robust perception models are largely task-agnostic – they augment a dataset using random image transformations or adversarial examples targeted at a vision model in isolation. However, I will show that accounting for the structure of an ultimate robotic task, such as differentiable model predictive control, can improve the generalization of perception models. Finally, I will conclude by tying these threads together into a broader vision of robust, continual learning from networked robotic fleets.

Biography: Sandeep Chinchali is an assistant professor in UT Austin’s ECE department. He completed his PhD in computer science at Stanford and his undergraduate degree at Caltech, where he researched at NASA JPL. Previously, he was the first principal data scientist at Uhana, a Stanford startup working on data-driven optimization of cellular networks, now acquired by VMware. Sandeep’s research on cloud robotics, edge computing, and 5G was recognized with the Outstanding Paper Award at MLSys 2022 and was a finalist for the Best Systems Paper Award at Robotics: Science and Systems 2019. His group is funded by companies such as Lockheed Martin, Honda, Viavi, Cisco, and Intel, and actively collaborates with local Austin startups.