Organizers: Tapomayukh Bhattacharjee, Maya Cakmak, Dieter Fox, Siddhartha S. Srinivasa
Abstract: Dieter Fox will provide a short introduction to the work going on at the NVIDIA robotics lab, followed by talks on specific projects. In this first part, we'll cover three projects.

Clemens Eppner: A Billion Ways to Grasp: An Evaluation of Grasp Sampling Schemes on a Dense, Physics-based Grasp Data Set
With the increasing speed and quality of physics simulations, generating large-scale grasping data sets that feed learning algorithms is becoming more and more popular. An often overlooked question is how to generate the grasps that make up these data sets. We review, classify, and compare different grasp sampling strategies based on a fine-grained discretization of SE(3) and physics-based simulation of the corresponding parallel-jaw grasps.

Yu Xiang: PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking
Tracking the 6D poses of objects in video provides rich information to a robot performing manipulation tasks. In this work, we formulate the 6D object pose tracking problem in the Rao-Blackwellized particle filtering framework, where the 3D rotation and the 3D translation of an object are decoupled. This factorization allows our approach, called PoseRBPF, to efficiently estimate the 3D translation of an object along with the full distribution over the 3D rotation. This is achieved by discretizing the rotation space in a fine-grained manner and training an auto-encoder network to construct a codebook of feature embeddings for the discretized rotations. As a result, PoseRBPF can track objects with arbitrary symmetries while still maintaining adequate posterior distributions. Our approach achieves state-of-the-art results on two 6D pose estimation benchmarks.

Ankur Handa and Karl Van Wyk: DexPilot: Depth-Based Teleoperation of Dexterous Robotic Hand-Arm System
Teleoperation imbues lifeless robotic systems with sophisticated reasoning skills, intuition, and creativity. However, current teleoperation solutions for high degree-of-actuation (DoA), multi-fingered robots are generally cost-prohibitive, while low-cost offerings typically provide reduced degrees of control. Herein, a low-cost, depth-based teleoperation system, DexPilot, is developed that allows complete control over the full 23-DoA robotic system by merely observing the bare human hand. DexPilot enabled operators to solve a variety of complex manipulation tasks that go beyond simple pick-and-place operations, and performance was measured through speed and reliability metrics. It cost-effectively enables the production of high-dimensional, multi-modality, state-action data that can be leveraged in the future to learn sensorimotor policies for challenging manipulation tasks.
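As a toy illustration of the Rao-Blackwellized factorization in PoseRBPF above, here is a minimal sketch (not the authors' implementation): each particle carries a 3D translation hypothesis plus a full discrete distribution over rotations, updated by comparing an observed feature embedding against a rotation codebook. All sizes, the similarity temperature, and the `observe_embedding` stand-in are illustrative assumptions; in the actual system the codebook and embeddings come from a trained auto-encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N translation particles, K discretized rotations,
# D-dimensional feature embeddings.
N, K, D = 100, 72, 16

# Assumed codebook of feature embeddings, one per discretized rotation.
codebook = rng.normal(size=(K, D))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def observe_embedding(translation):
    """Stand-in for cropping the image at `translation` and encoding it."""
    feat = rng.normal(size=D)
    return feat / np.linalg.norm(feat)

# Each particle carries a 3D translation plus a full discrete
# distribution over rotations (the Rao-Blackwellized part).
translations = rng.normal(size=(N, 3))
rot_dists = np.full((N, K), 1.0 / K)   # per-particle rotation posterior
weights = np.full(N, 1.0 / N)

def update(translations, rot_dists, weights):
    # Motion model: diffuse the translation hypotheses.
    translations = translations + 0.01 * rng.normal(size=translations.shape)
    for i in range(N):
        feat = observe_embedding(translations[i])
        # Cosine similarity to every codebook entry -> rotation likelihood.
        lik = np.exp(5.0 * (codebook @ feat))
        rot_dists[i] = rot_dists[i] * lik
        # Particle weight: total evidence; then renormalize the posterior.
        weights[i] *= rot_dists[i].sum()
        rot_dists[i] /= rot_dists[i].sum()
    weights /= weights.sum()
    return translations, rot_dists, weights

translations, rot_dists, weights = update(translations, rot_dists, weights)
```

Because each particle keeps an exact distribution over the discretized rotations rather than sampling one rotation, symmetric objects (whose rotation posterior is multimodal) can be tracked without particle depletion.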
Abstract: In this second part, we'll cover two projects.

Arsalan Mousavian: 6DoF GraspNet: Variational Grasp Generation for Object Manipulation
Generating grasp poses is a crucial component of any robot object manipulation task. In this talk, I will present our latest work on grasping unknown objects from point clouds. We formulate grasp generation as sampling a set of grasps with a variational autoencoder, then assessing and refining the sampled grasps with a grasp evaluator model. Both the Grasp Sampler and the Grasp Refinement networks take 3D point clouds observed by a depth camera as input. Our model is trained purely in simulation and works in the real world without any extra steps. Extensions of the work to other manipulation tasks will be briefly discussed.

Nathan Ratliff: Riemannian Motion Policies for Fast and Reactive Motion Generation
In the modern era of collaborative robots, fast reactions and adaptation to the uncertainties of human interaction are critical. I'll present our framework for quickly generating adaptive, collision-free behavior using what we call Riemannian Motion Policies (RMPs). Rather than relying on computationally intensive search processes to generate behaviors, policies are instead encoded compactly as part of the geometry of a curved space, so that they arise naturally as geodesics (generalized straight lines). For instance, RMPs encode obstacle avoidance by modeling how obstacles warp their surroundings, resulting in massive speedups over standard search- or optimization-based planning. I'll present the framework and demonstrate a number of real-world deployments on a variety of manipulation platforms. I'll also present recent work on incorporating the mathematical structure of RMPs into robot learning techniques to encourage generalization.
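The metric-weighted combination at the heart of RMPs can be sketched as follows. This is a deliberately simplified, hypothetical example in a single 2D task space with identity task maps (real RMPs pull policies back through the robot's kinematics); the particular attractor and obstacle policies, gains, and function names are my illustrative assumptions, not the talk's implementation.

```python
import numpy as np

# Each RMP is a pair (desired acceleration, Riemannian metric).

def attractor_rmp(x, goal):
    """Pull toward the goal with a soft, uniform metric."""
    a = 2.0 * (goal - x)
    M = np.eye(2)
    return a, M

def obstacle_rmp(x, obstacle):
    """Push away from the obstacle; the metric grows near it,
    i.e. the obstacle 'warps' the space around itself."""
    d = x - obstacle
    dist = np.linalg.norm(d)
    a = d / (dist**3 + 1e-6)           # repulsive acceleration
    M = (1.0 / (dist**2 + 1e-6)) * np.eye(2)  # importance near obstacle
    return a, M

def combine(rmps):
    """Metric-weighted resolution: a = (sum M_i)^+ sum M_i a_i."""
    M_total = sum(M for _, M in rmps)
    f_total = sum(M @ a for a, M in rmps)
    return np.linalg.pinv(M_total) @ f_total

x = np.array([0.0, 0.0])
goal = np.array([4.0, 0.0])
obstacle = np.array([2.0, 0.1])
a = combine([attractor_rmp(x, goal), obstacle_rmp(x, obstacle)])
```

The key design choice is that each policy contributes both a desired acceleration and a metric saying where it cares: far from the obstacle its metric is tiny and the attractor dominates, while close to it the obstacle's metric swamps the attractor, all without any search or optimization loop.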
Abstract: Everyday tasks combine discrete and geometric decision-making. The robotics, AI, and formal-methods communities have concurrently explored different planning approaches, producing techniques with different capabilities and trade-offs. We identify the combinatorial and geometric challenges of planning for everyday tasks, develop a hybrid planning algorithm, and implement an extensible planning framework. In ongoing work, we are extending this task-motion framework to planning under uncertainty and in open worlds.
Biography: Neil T. Dantam is an Assistant Professor of Computer Science at the Colorado School of Mines. His research focuses on robot planning and manipulation, covering task and motion planning, quaternion kinematics, discrete policies, and real-time software design. Previously, Neil was a Postdoctoral Research Associate in Computer Science at Rice University working with Prof. Lydia Kavraki and Prof. Swarat Chaudhuri. Neil received a Ph.D. in Robotics from Georgia Tech, advised by Prof. Mike Stilman, and B.S. degrees in Computer Science and Mechanical Engineering from Purdue University. He has worked at iRobot Research, MIT Lincoln Laboratory, and Raytheon. Neil received the Georgia Tech President's Fellowship, the Georgia Tech/SAIC paper award, an American Control Conference '12 presentation award, and was a Best Paper and Mike Stilman Award finalist at HUMANOIDS '14.
Abstract: There is an essential tension between model-based and model-free approaches to robot system design. Over the years, robotics research has produced many powerful models and algorithms for robot perception, state estimation, planning, control, etc. At the same time, model-free deep learning has recently brought unprecedented success in domains such as visual perception and object manipulation, where model-based approaches struggle despite decades of research. In this talk, we will look at several ideas aimed at unifying model-based and model-free approaches to robot system construction. We embed well-known robot models and algorithms -- filters, planners, controllers -- in neural networks and train the networks end-to-end from data; as a result, we (i) improve the robustness of a model-based algorithm by learning a model optimized specifically for that algorithm and (ii) improve the data efficiency of learning by incorporating an algorithm as a structural prior. Further, the uniform network representation enables us to compose multiple system modules in a convenient and scalable manner through learning.
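One way to picture embedding a filter in a network, sketched below under purely illustrative assumptions (the sizes, parameter values, and the name `filter_step` are mine, not the speaker's): a discrete Bayes (histogram) filter written as plain tensor operations. Because every step is differentiable with respect to the transition and observation tensors, those models could in principle be treated as network weights and trained end-to-end.

```python
import numpy as np

S = 5  # number of discrete states (illustrative)

# Learnable-in-principle parameters: transition and observation models.
transition = np.full((S, S), 0.1) + 0.5 * np.eye(S)   # P(next | current)
transition /= transition.sum(axis=1, keepdims=True)
observation = np.full((S, S), 0.05) + 0.8 * np.eye(S)  # P(obs | state)
observation /= observation.sum(axis=1, keepdims=True)

def filter_step(belief, obs):
    """Predict with the transition model, correct with the observation
    likelihood -- every op is differentiable w.r.t. the model tensors."""
    predicted = transition.T @ belief            # prediction step
    corrected = observation[:, obs] * predicted  # correction step
    return corrected / corrected.sum()           # normalization

belief = np.full(S, 1.0 / S)   # start with a uniform belief
for obs in [2, 2, 3]:
    belief = filter_step(belief, obs)
```

Written this way, the filter is just a recurrent computation, so it composes naturally with learned perception modules feeding in the observation likelihoods.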
Biography: David Hsu is a professor of computer science at the National University of Singapore (NUS) and a member of the NUS Graduate School for Integrative Sciences & Engineering. He is an IEEE Fellow. His research spans robotics and AI. In recent years, he has been working on robot planning and learning under uncertainty for human-centered robots. He received a BSc in Computer Science & Mathematics from the University of British Columbia and a PhD in Computer Science from Stanford University. At NUS, he co-founded the NUS Advanced Robotics Center and has since served as its Deputy Director. He has held visiting positions at the MIT Aeronautics & Astronautics Department and the CMU Robotics Institute. He has chaired or co-chaired several major international robotics conferences, including the International Workshop on the Algorithmic Foundations of Robotics (WAFR) in 2004 and 2010, Robotics: Science & Systems (RSS) 2015, and the IEEE International Conference on Robotics & Automation (ICRA) 2016.
Abstract: Robotic agents that can accomplish manipulation tasks with the competence of humans have been the holy grail for AI and robotics research for more than 50 years. However, while these fields have made huge progress over the years, this ultimate goal is still out of reach. I believe that this is because the knowledge representation and reasoning methods proposed in AI so far are necessary but still too abstract. In this talk, I propose to endow robots with the capability to mentally “reason with their eyes and hands”, that is, to internally emulate and simulate their perception-action loops based on photo-realistic images and faithful physics simulations, which are made machine-understandable by casting them as virtual symbolic knowledge bases. These capabilities allow robots to generate huge collections of machine-understandable manipulation experiences, which they can then generalize into commonsense and intuitive physics knowledge applicable to open manipulation task domains. The combination of learning, representation, and reasoning will equip robots with an understanding of the relation between their motions and the physical effects they cause at an unprecedented level of realism, depth, and breadth, and enable them to master human-scale manipulation tasks. This breakthrough will be achievable by combining simulation and visual rendering technologies with mechanisms to semantically interpret internal simulation data structures and processes.
Biography: Michael Beetz is a professor of Computer Science in the Faculty of Mathematics & Informatics at the University of Bremen and head of the Institute for Artificial Intelligence (IAI). He received his diploma degree in Computer Science with distinction from the University of Kaiserslautern. His MSc, MPhil, and PhD degrees were awarded by Yale University in 1993, 1994, and 1996, and his Venia Legendi by the University of Bonn in 2000. In February 2019 he received an Honorary Doctorate from Örebro University. He was vice-coordinator of the German cluster of excellence CoTeSys (Cognition for Technical Systems, 2006-2011), coordinator of the European FP7 integrating project RoboHow (web-enabled and experience-based cognitive robots that learn complex everyday manipulation tasks, 2012-2016), and is the coordinator of the German collaborative research centre EASE (Everyday Activity Science and Engineering, since 2017). His research interests include plan-based control of robotic agents, knowledge processing and representation for robots, integrated robot learning, and cognition-enabled perception.
Abstract: Barrett has been in the robotics business for 30 years, aimed at creating high-performance robot drive technologies – the building blocks of articulated robotic arms, hands, and legs. While Barrett is known for its haptically-enabled WAM® arm and 3-fingered BarrettHand™, Barrett also licensed its WAM-arm technology to MAKO Surgical for its RIO haptically-active robotic arm and to SensAble Technologies for its PHANTOM haptic device. Barrett has recently introduced two new products. The 3rd-generation Puck® (P3™), invented by Gill Pratt and Bill Townsend, is the world’s smallest brushless-servomotor drive, smaller than its nearest competitor by a factor of six. It also incorporates a 12-bit absolute encoder, eliminating all controller-to-encoder wires and connectors. Barrett’s other new product is Burt®, a haptically-enabled robotic arm that builds on the Puck and the WAM. Burt is designed to leverage brain plasticity to reduce impairment from stroke, traumatic brain injury, and other neurological damage.
Biography: Bill Townsend is President & CEO of Barrett Technology, which he founded in 1988 to advance the state of human-machine interaction. He has a dozen issued patents in the US, Europe, and Japan and won the prestigious Robotic Industries Association Joseph Engelberger Award in Technology for his 1987 design of the first haptics-capable robot. This device (the WAM® arm) was also chosen as the world’s most advanced robot in the Millennium Edition of the Guinness Book of Records. He earned his PhD and MS degrees in engineering at the Massachusetts Institute of Technology, Artificial Intelligence Laboratory (now CSAIL) and a BS in mechanical engineering at Northeastern University.
Abstract: Robots capable of collaborating with humans will bring transformative changes to the way we live and work. In domains ranging from healthcare to domestic tasks to manufacturing, particularly under conditions where modern automation is ineffective or inapplicable, robots have the potential to increase humans' efficiency, capability, and safety. Despite this, the deployment of collaborative robots into human-dominated environments remains largely infeasible due to the challenges involved in ensuring that our autonomous teammates are helpful and safe. In this talk, I will present an overview of my group's work toward overcoming these challenges and realizing adaptive, communicative robot collaborators that both learn and dynamically assist in the completion of complex tasks through novel learning and control algorithms. In particular, I will focus on the importance of explainability and the human-interpretable models that underpin these methods, and discuss new research on establishing shared expectations between humans and robots to ensure safe and efficient operation in learning from demonstration and collaborative task execution.
Biography: Dr. Bradley Hayes is an Assistant Professor of Computer Science at the University of Colorado Boulder, where he directs the Collaborative AI and Robotics (CAIRO) Lab. He received his Ph.D. in Computer Science from Yale University advised by Brian Scassellati, and performed research as a postdoctoral associate at the MIT Interactive Robotics Group working with Julie Shah. His work focuses on developing novel explainable AI and interpretable machine learning techniques for safe task and motion planning with a focus on human-robot collaboration, enabling the creation of trustworthy autonomous teammates that efficiently learn from, capably work with, and demonstrably improve the performance of human partners.
Abstract: We present Vision-based Navigation with Language-based Assistance (VNLA), a grounded vision-language task in which an agent with visual perception is guided via language to find objects in photorealistic indoor environments. The task emulates a real-world scenario in that (a) the requester may not know how to navigate to the target objects and thus makes requests by specifying only high-level end goals, and (b) the agent is capable of sensing when it is lost and querying an advisor, who is more qualified at the task, to obtain language subgoals to make progress. To model language-based assistance, we develop a general framework termed Imitation Learning with Indirect Intervention (I3L) and propose a solution that is effective on the VNLA task. Empirical results show that this approach significantly improves the success rate of the learning agent over other baselines in both seen and unseen environments. Our code and data are publicly available at https://github.com/debadeepta/vnla. Time permitting, I will present very recent results on stabilizing imitation learning (with and without action labels) using the IL-as-f-divergence framework of Ke et al. via suitable reward reparameterizations.
Biography: Debadeepta Dey is a researcher at MSR Redmond, working closely with the Reinforcement Learning groups at MSR Redmond and MSR NYC. He obtained his PhD at the Robotics Institute, Carnegie Mellon University, and conducts fundamental as well as applied research in machine learning, control, and computer vision, with applications to autonomous agents in general and robotics in particular. His interests include decision-making under uncertainty, reinforcement learning, and machine learning.
Abstract: Traditional convolutional networks exhibit unprecedented robustness to intraclass nuisances when trained on big data. However, such data have to be augmented to cover geometric transformations. Several approaches have recently shown that data augmentation can be avoided if networks are structured such that feature representations are transformed the same way as the input, a desirable property called equivariance. In this talk, we show that global equivariance can be achieved for 2D scaling, rotation, and translation, as well as for 3D rotations. We show state-of-the-art results using an order of magnitude lower capacity than competing approaches. Moreover, we show how such geometric embeddings can recover the 3D pose of objects without keypoints or ground-truth pose regression. We finish by showing how graph convolutions enable the recovery of human pose and shape without any 2D annotations.
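A minimal numerical illustration of the equivariance property described above (my own toy example, not material from the talk): for circular convolution, translating the input and then convolving gives the same result as convolving and then translating the output, i.e. the feature map transforms the same way as the input.

```python
import numpy as np

def circ_conv(x, k):
    """Circular (wrap-around) convolution of signal x with kernel k."""
    n = len(x)
    return np.array([sum(x[(i - j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

def shift(x, s):
    """Cyclic translation of the signal by s positions."""
    return np.roll(x, s)

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5])
k = np.array([0.5, 1.0, -0.5])

lhs = circ_conv(shift(x, 2), k)   # transform the input, then convolve
rhs = shift(circ_conv(x, k), 2)   # convolve, then transform the features
# lhs == rhs: convolution is equivariant to circular translation.
```

Equivariant architectures for scaling and 2D/3D rotation generalize exactly this commutation property to other transformation groups, which is why they can forgo augmenting the training data with those transformations.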
Biography: Kostas Daniilidis is the Ruth Yalom Stone Professor of Computer and Information Science at the University of Pennsylvania, where he has been on the faculty since 1998. He is an IEEE Fellow. He was the director of the GRASP laboratory from 2008 to 2013, Associate Dean for Graduate Education from 2012 to 2016, and Faculty Director of Online Learning from 2012 to 2017. He obtained his undergraduate degree in Electrical Engineering from the National Technical University of Athens in 1986 and his PhD in Computer Science from the University of Karlsruhe in 1992. He is a co-recipient of the Best Conference Paper Award at ICRA 2017 and a Best Paper Finalist at IEEE CASE 2015, RSS 2018, and CVPR 2019. Kostas’ main interests today are geometric deep learning, event-based cameras, and action representations as applied to vision-based manipulation and navigation.
Abstract: At 27, Dr. Ayanna Howard was hired by NASA to lead a team designing a robot for future Mars exploration missions that could "think like a human and adapt to change." Her accomplishments since then include being named as one of 2015's most powerful women engineers in the world and as one of Forbes' 2018 U.S. Top 50 Women in Tech. From creating robots to studying the impact of global warming on the Antarctic ice shelves to founding a company that develops STEM education and therapy products for children and those with varying needs, Professor Howard focuses on our role in being responsible global citizens. In this talk, Professor Howard will delve into the implications of recent advances in robotics and AI and explain the critical importance of ensuring diversity and inclusion at all stages to reduce the risk of unconscious bias and ensuring robots are designed to be accessible to all. Throughout the talk, Professor Howard will weave in her own experience on developing new AI technologies through her technical leadership roles at NASA, Georgia Tech, and in technology startups.
Biography: Ayanna Howard, Ph.D. is the Linda J. and Mark C. Smith Professor and Chair of the School of Interactive Computing at the Georgia Institute of Technology. She also holds a faculty appointment in the School of Electrical and Computer Engineering and functions as the Chief Technology Officer of Zyrobotics. Dr. Howard's career focus is on intelligent technologies that must adapt to and function within a human-centered world. Her work, which encompasses advancements in artificial intelligence (AI), assistive technologies, and robotics, has resulted in over 250 peer-reviewed publications in a number of projects -- from healthcare robots in the home to AI-powered STEM apps for children with diverse learning needs. To date, her unique accomplishments have been highlighted through a number of awards and articles, including highlights in USA Today, Upscale, and TIME Magazine, as well as being recognized as one of the 23 most powerful women engineers in the world by Business Insider and one of the Top 50 U.S. Women in Tech by Forbes. In 2013, she also founded Zyrobotics, which is currently licensing technology derived from her research and has released their first suite of STEM educational products to engage children of all abilities. Prior to Georgia Tech, Dr. Howard was a Senior Robotics Researcher and Deputy Manager in the Office of the Chief Scientist at NASA's Jet Propulsion Laboratory. She has also served as the Associate Director of Research for the Institute for Robotics and Intelligent Machines, Chair of the Robotics Ph.D. program, and the Associate Chair for Faculty Development in the School of Electrical and Computer Engineering at Georgia Tech.
Abstract: Human-robot collaboration has the potential to transform the way people work and live. To be effective assistants, these robots must be able to recognize aspects of their human partners such as what their goals are, what their next action will be, and when they need help---in short, their task-relevant mental states. A large part of communication about mental states occurs nonverbally, through eye gaze, gestures, and other behaviors that provide implicit information. Therefore, to be effective collaborators, robots must understand and respond to nonverbal human communication. This requires a multidisciplinary approach that involves robotics, psychology, machine learning, and computer vision. In this talk, I will describe my work on robots that collaborate with and assist humans on complex tasks, such as eating a meal. I will show how natural, intuitive human behaviors can reveal human mental states that robots must respond to. Throughout the talk, I will describe how techniques and knowledge from cognitive science help us develop robot algorithms that lead to more effective interactions between people and their robot partners.
Biography: Henny Admoni is an Assistant Professor in the Robotics Institute at Carnegie Mellon University, where she leads the Human And Robot Partners (HARP) Lab. Henny studies how to develop intelligent robots that can assist and collaborate with humans on complex tasks like preparing a meal. She is most interested in how natural human communication, like where someone is looking, can reveal underlying human intentions and can be used to improve human-robot interactions. Henny's research has been supported by the US National Science Foundation, the US Office of Naval Research, the Paralyzed Veterans of America Foundation, and Sony Corporation. Her work has been featured by media outlets such as NPR's Science Friday, Voice of America News, and WESA radio. Previously, Henny was a postdoctoral fellow at CMU with Siddhartha Srinivasa in the Personal Robotics Lab. Henny completed her PhD in Computer Science at Yale University and a joint BA/MA degree in Computer Science at Wesleyan University.