Winter 2016 Colloquium

Organizers: Kendall Lowrey, Patrick Lancaster, Dieter Fox

Combining the benefits of function approximation and trajectory optimization
Kendall Lowrey (UW CSE) 01/15/2016

Abstract: Neural networks have recently solved many hard problems in Machine Learning, but their impact in control remains limited. Trajectory optimization has recently solved many hard problems in robotic control, but using it online remains challenging. Here we leverage the high-fidelity solutions obtained by trajectory optimization to speed up the training of neural network controllers. The two learning problems are coupled using the Alternating Direction Method of Multipliers (ADMM). This coupling enables the trajectory optimizer to act as a teacher, gradually guiding the network towards better solutions. We develop a new trajectory optimizer based on inverse contact dynamics, and provide not only the trajectories but also the feedback gains as training data to the network. The method is illustrated on rolling, reaching, swimming and walking tasks.
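
A minimal sketch of the ADMM coupling the abstract describes, with a toy quadratic task cost, a linear policy standing in for the neural network, and a one-line stand-in trajectory optimizer; everything here is illustrative, not the paper's implementation:

```python
# ADMM consensus between a trajectory optimizer and a function approximator:
# alternate (1) trajectory update, (2) supervised policy fit, (3) dual ascent.
import numpy as np

rng = np.random.default_rng(0)
T, xdim, udim = 50, 4, 2
X = rng.normal(size=(T, xdim))          # states visited (fixed here for brevity)
U_traj = rng.normal(size=(T, udim))     # trajectory optimizer's controls
W = np.zeros((xdim, udim))              # linear "policy": u = x @ W
lam = np.zeros((T, udim))               # scaled dual variable
rho = 1.0                               # ADMM penalty weight

def trajopt_step(X, W, lam):
    """Stand-in for trajectory optimization: minimize a toy quadratic task
    cost plus the ADMM penalty rho * ||u - policy(x) + lam||^2."""
    u_target = np.zeros((T, udim))      # pretend the task wants zero controls
    return (u_target + rho * (X @ W - lam)) / (1.0 + rho)

for it in range(100):
    U_traj = trajopt_step(X, W, lam)                     # 1) trajectory update
    W = np.linalg.lstsq(X, U_traj + lam, rcond=None)[0]  # 2) supervised fit
    lam += U_traj - X @ W                                # 3) dual ascent

print("consensus gap:", float(np.abs(U_traj - X @ W).max()))
```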

Information-Theoretic Planning with Trajectory Optimization for Dense 3D Mapping
Vladimir Korukov (UW CSE) 01/15/2016

Abstract: We propose an information-theoretic planning approach that enables mobile robots to autonomously construct dense 3D maps in a computationally efficient manner. Inspired by prior work, we accomplish this task by formulating an information-theoretic objective function based on Cauchy-Schwarz quadratic mutual information (CSQMI) that guides robots to obtain measurements in uncertain regions of the map. We then contribute a two-stage approach for active mapping. First, we generate a candidate set of trajectories using a combination of global planning and generation of local motion primitives. From this set, we choose a trajectory that maximizes the information-theoretic objective. Second, we employ a gradient-based trajectory optimization technique to locally refine the chosen trajectory such that the CSQMI objective is maximized while satisfying the robot’s motion constraints. We evaluated our approach through a series of simulations and experiments on a ground robot and an aerial robot mapping unknown 3D environments. Real-world experiments suggest our approach reduces the time to explore an environment by 70% compared to a closest-frontier exploration strategy and 57% compared to an information-based strategy that uses global planning, while simulations demonstrate the approach extends to aerial robots with higher-dimensional state.
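
A minimal sketch of the two-stage scheme: a smooth toy objective stands in for CSQMI, candidate "trajectories" are random waypoint sets, and refinement is finite-difference gradient ascent with motion constraints omitted:

```python
# Stage 1: score candidates and keep the best; Stage 2: locally refine it.
import numpy as np

rng = np.random.default_rng(1)

def info_gain(traj):
    """Smooth toy objective standing in for CSQMI: 'uncertainty' mass
    concentrated near an unexplored frontier, summed along the path."""
    frontier = np.array([15.0, 12.0])
    d2 = ((traj - frontier) ** 2).sum(axis=1)
    return float(np.exp(-d2 / 50.0).sum())

# Stage 1: pick the best candidate among global plans / motion primitives.
candidates = [rng.uniform(0, 20, size=(15, 2)) for _ in range(30)]
best = max(candidates, key=info_gain)

# Stage 2: refine waypoints by gradient ascent on the objective
# (finite differences here; the robot's motion constraints are omitted).
eps, step = 1e-3, 0.5
for _ in range(50):
    grad = np.zeros_like(best)
    for i in range(best.shape[0]):
        for j in range(2):
            bumped = best.copy()
            bumped[i, j] += eps
            grad[i, j] = (info_gain(bumped) - info_gain(best)) / eps
    best = best + step * grad

print("refined information objective:", round(info_gain(best), 3))
```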

Pose Estimation of Kinematic Chain Instances via Object Coordinate Regression
Tanner Schmidt (UW CSE) 01/22/2016

Abstract: In this paper, we address the problem of one-shot pose estimation of articulated objects from an RGB-D image. In particular, we consider object instances with the topology of a kinematic chain, i.e. assemblies of rigid parts connected by prismatic or revolute joints. This object type occurs often in daily life, for instance in the form of furniture or electronic devices. Instead of treating each object part separately, we use the relationship between parts of the kinematic chain and propose a new minimal pose sampling approach. This enables us to create a pose hypothesis for a kinematic chain consisting of K parts by sampling K 3D-3D point correspondences. To assess the quality of our method, we gathered a large dataset containing four objects and 7,000+ annotated RGB-D frames. On this dataset we achieve considerably better results than a modified state-of-the-art pose estimation system for rigid objects. (Frank Michel, Alexander Krull, Eric Brachmann, Michael Ying Yang, Stefan Gumhold, Carsten Rother)
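
The 3D-3D building block behind such pose hypotheses is rigid alignment from point correspondences. Below is a minimal Kabsch/Procrustes sketch for a single rigid part; the paper's chain-aware minimal sampling, in which joint constraints let one correspondence per part suffice, is not reproduced:

```python
# Fit a rigid transform from predicted object coordinates to camera points.
import numpy as np

def rigid_from_correspondences(P_obj, P_cam):
    """Least-squares rotation R and translation t with P_cam ~ R @ P_obj + t."""
    mu_o, mu_c = P_obj.mean(axis=0), P_cam.mean(axis=0)
    H = (P_obj - mu_o).T @ (P_cam - mu_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
    R = Vt.T @ D @ U.T
    return R, mu_c - R @ mu_o

# Usage: three noise-free correspondences, the minimum for one rigid part.
rng = np.random.default_rng(2)
P_obj = rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q = -Q                                    # ensure a proper rotation
P_cam = P_obj @ Q.T + np.array([0.1, -0.2, 0.5])
R, t = rigid_from_correspondences(P_obj, P_cam)
print("exact recovery:", np.allclose(P_obj @ R.T + t, P_cam))
```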

Real-Time Trajectory Generation for Quadrocopters
Zachary Nehrenberg (UW CSE) 01/29/2016

Abstract: This paper presents a trajectory generation algorithm that efficiently computes high-performance flight trajectories capable of moving a quadrocopter from a large class of initial states to a given target point, which is reached at rest. The approach plans separate trajectories in each of the three translational degrees of freedom and ensures feasibility by deriving decoupled constraints for each degree of freedom through conservative approximations. The presented algorithm can compute a feasible trajectory within tens of microseconds on a laptop computer; remaining computation time can be used to iteratively improve the trajectory. By replanning the trajectory at a high rate, the trajectory generator can be used as an implicit feedback law similar to model predictive control. The solutions generated by the algorithm are analyzed by comparing them with time-optimal motions, and experimental results validate the approach. (Markus Hehn and Raffaello D’Andrea)
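
A minimal sketch of per-axis decoupled planning in the spirit of the abstract: each translational axis gets an independent rest-to-rest polynomial, and feasibility is checked against an assumed per-axis acceleration budget. The paper's time-optimal profiles and its principled allocation of thrust constraints across axes are replaced by simple stand-ins:

```python
# Plan each axis independently; grow the duration until all axes are feasible.
import numpy as np

def axis_trajectory(p0, pf, T, n=200):
    """Fifth-order rest-to-rest polynomial for one axis over duration T."""
    tau = np.linspace(0.0, 1.0, n)
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5       # position shape function
    s_dd = 60 * tau - 180 * tau**2 + 120 * tau**3    # its second derivative
    pos = p0 + (pf - p0) * s
    acc = (pf - p0) * s_dd / T**2
    return pos, acc

start = np.array([0.0, 0.0, 1.0])
goal = np.array([3.0, -2.0, 2.0])
a_max_axis = 3.0     # assumed per-axis budget carved out of the thrust limit
T = 2.0
while True:          # increase duration until each axis is feasible on its own
    accs = [axis_trajectory(start[k], goal[k], T)[1] for k in range(3)]
    if all(np.abs(a).max() <= a_max_axis for a in accs):
        break
    T *= 1.2
print(f"feasible duration: {T:.2f} s")
```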

Towards Learning Hierarchical Skills for Multi-Phase Manipulation Tasks
Patrick Lancaster (UW CSE) 01/29/2016

Abstract: Most manipulation tasks can be decomposed into a sequence of phases, where the robot’s actions have different effects in each phase. The robot can perform actions to transition between phases and, thus, alter the effects of its actions, e.g. grasp an object in order to then lift it. The robot can thus reach a phase that affords the desired manipulation. In this paper, we present an approach for exploiting the phase structure of tasks in order to learn manipulation skills more efficiently. Starting with human demonstrations, the robot learns a probabilistic model of the phases and the phase transitions. The robot then employs model-based reinforcement learning to create a library of motor primitives for transitioning between phases. The learned motor primitives generalize to new situations and tasks. Given this library, the robot uses a value function approach to learn a high-level policy for sequencing the motor primitives. The proposed method was successfully evaluated on a real robot performing a bimanual grasping task. (Oliver Kroemer, Christian Daniel, Gerhard Neumann, Herke van Hoof, and Jan Peters)
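
A toy sketch of the sequencing idea: phases form a graph, motor primitives are transitions with success probabilities, and a value function selects which primitive to run next. The phases, primitives, and probabilities below are invented placeholders; the paper learns all of these from demonstrations and model-based RL:

```python
# Value iteration over phases, then a greedy high-level primitive policy.
import numpy as np

phases = ["free", "grasped", "lifted", "placed"]          # goal: "placed"
# primitive: (name, from_phase, to_phase, success_prob)
primitives = [
    ("reach_and_grasp", 0, 1, 0.9),
    ("lift",            1, 2, 0.95),
    ("place",           2, 3, 0.85),
    ("regrasp",         0, 1, 0.6),
]

V = np.zeros(len(phases))
V[3] = 1.0                                    # reward only at the goal phase
for _ in range(50):                           # value iteration (failure = 0 gain)
    for name, s, s2, p in primitives:
        V[s] = max(V[s], p * V[s2])

def policy(phase):
    """Greedy high-level policy: run the primitive with best expected value."""
    options = [(p * V[s2], name) for name, s, s2, p in primitives if s == phase]
    return max(options)[1] if options else None

for k in range(3):
    print(phases[k], "->", policy(k))
```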

Mastering the game of Go with deep neural networks and tree search
Aaron Walsman (UW CSE) 02/05/2016

Abstract: The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away. (David Silver, Aja Huang et al.)
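
A minimal sketch of the selection rule at the heart of such a search: children are scored by their mean value plus a prior-weighted exploration bonus (a PUCT-style formula). The priors and value numbers below are dummies; AlphaGo's networks, rollouts, and tree expansion are not reproduced:

```python
# PUCT-style child selection: argmax over Q + U, with the policy net as prior.
import math

class Node:
    def __init__(self, prior):
        self.prior = prior      # policy-network probability for this move
        self.visits = 0
        self.value_sum = 0.0    # accumulated value-network / rollout evaluations

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select(children, c_puct=1.5):
    """Pick the child maximizing Q + U, U = c * prior * sqrt(N) / (1 + n)."""
    total = sum(ch.visits for ch in children.values())
    def score(item):
        _, ch = item
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return ch.mean_value() + u
    return max(children.items(), key=score)[0]

# Usage: three candidate moves with priors from a (hypothetical) policy net.
children = {"D4": Node(0.5), "Q16": Node(0.3), "K10": Node(0.2)}
children["Q16"].visits, children["Q16"].value_sum = 10, 6.0
print("selected move:", select(children))
```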

End-to-End Training of Deep Visuomotor Policies
Harley Montgomery (UW CSE) 02/12/2016

Abstract: Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end provide better performance than training each component separately? To this end, we develop a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors. The policies are represented by deep convolutional neural networks (CNNs) with 92,000 parameters, and are trained using a partially observed guided policy search method, which transforms policy search into supervised learning, with supervision provided by a simple trajectory-centric reinforcement learning method. We evaluate our method on a range of real-world manipulation tasks that require close coordination between vision and control, such as screwing a cap onto a bottle, and present simulated comparisons to a range of prior policy search methods. (Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel)
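
A toy sketch of the guided policy search alternation the abstract refers to: local trajectory-centric controllers supply (observation, action) supervision, a global policy is fit by regression, and the controllers are pulled back toward the policy. The linear policy, random data, and penalty form are all stand-ins for the paper's CNN policy and constrained optimization:

```python
# Alternate supervised policy fitting with controller re-optimization.
import numpy as np

rng = np.random.default_rng(3)
n_conditions, T, obs_dim, act_dim = 4, 25, 8, 2
obs = [rng.normal(size=(T, obs_dim)) for _ in range(n_conditions)]
ctrl = [rng.normal(size=(T, act_dim)) for _ in range(n_conditions)]  # local ctrls
W = np.zeros((obs_dim, act_dim))                                     # global policy
nu = 0.5                                                             # agreement weight

for outer in range(20):
    # 1) supervised step: regress the policy onto all controllers' actions
    O, A = np.vstack(obs), np.vstack(ctrl)
    W = np.linalg.lstsq(O, A, rcond=None)[0]
    # 2) "trajectory" step: each controller trades its own actions off
    #    against agreement with the policy (stand-in for re-optimization)
    ctrl = [(u + nu * (o @ W)) / (1 + nu) for o, u in zip(obs, ctrl)]

gap = max(float(np.abs(u - o @ W).max()) for o, u in zip(obs, ctrl))
print("max policy/controller disagreement:", round(gap, 4))
```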

Deep Neural Decision Forests
Daniel Gordon (UW CSE) 02/19/2016

Abstract: We present Deep Neural Decision Forests – a novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner. To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. Our model differs from conventional deep networks because a decision forest provides the final predictions and it differs from conventional decision forests since we propose a principled, joint and global optimization of split and leaf node parameters. We show experimental results on benchmark machine learning datasets like MNIST and ImageNet and find on-par or superior results when compared to state-of-the-art deep models. Most remarkably, we obtain top-5 errors of only 7.84%/6.38% on ImageNet validation data when integrating our forests in a single-crop, single/seven-model GoogLeNet architecture, respectively. Thus, even without any form of training data set augmentation we are improving on the 6.67% error obtained by the best GoogLeNet architecture (7 models, 144 crops). (Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, Samuel Rota Bulo)
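
A minimal sketch of the stochastic, differentiable decision tree described above: each split routes a sample left with probability sigmoid(f(x)), and the prediction mixes leaf class distributions by the probability of reaching each leaf. Random linear split functions stand in for the deep network features used in the paper:

```python
# Soft routing through a complete binary tree of sigmoid splits.
import numpy as np

rng = np.random.default_rng(4)
depth, dim, n_classes = 3, 10, 4
n_splits, n_leaves = 2**depth - 1, 2**depth
split_w = rng.normal(size=(n_splits, dim))    # one linear split per internal node
leaf_logits = rng.normal(size=(n_leaves, n_classes))

def predict(x):
    """Class distribution: leaf distributions mixed by soft routing probs."""
    d = 1.0 / (1.0 + np.exp(-(split_w @ x)))  # P(route left) at each split node
    mu = np.ones(n_leaves)                    # probability of reaching each leaf
    for leaf in range(n_leaves):
        node = 0
        for level in reversed(range(depth)):  # follow the leaf's binary code
            go_left = ((leaf >> level) & 1) == 0
            mu[leaf] *= d[node] if go_left else 1.0 - d[node]
            node = 2 * node + (1 if go_left else 2)
    pi = np.exp(leaf_logits)
    pi /= pi.sum(axis=1, keepdims=True)       # softmax leaf class distributions
    return mu @ pi

p = predict(rng.normal(size=dim))
print("class distribution:", np.round(p, 3), "sum =", round(float(p.sum()), 6))
```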

Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free
Justin Huang (UW CSE) 02/26/2016

Abstract: Place recognition has long been an incompletely solved problem in that all approaches involve significant compromises. Current methods address many but never all of the critical challenges of place recognition – viewpoint-invariance, condition-invariance and minimizing training requirements. Here we present an approach that adapts state-of-the-art object proposal techniques to identify potential landmarks within an image for place recognition. We use the astonishing power of convolutional neural network features to identify matching landmark proposals between images to perform place recognition over extreme appearance and viewpoint variations. Our system does not require any form of training; all components are generic enough to be used off-the-shelf. We present a range of challenging experiments in varied viewpoint and environmental conditions. We demonstrate superior performance to current state-of-the-art techniques. Furthermore, by building on existing and widely used recognition frameworks, this approach provides a highly compatible place recognition system with the potential for easy integration of other techniques such as object detection and semantic scene interpretation. (Niko Sunderhauf, Sareh Shirazi, Adam Jacobson, Feras Dayoub, Edward Pepperell, Ben Upcroft, and Michael Milford)
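
A minimal sketch of the matching stage: landmark proposals from two images are compared by cosine similarity of their ConvNet features, and mutual best matches vote for a place match. Random vectors stand in for the proposal features (the paper uses off-the-shelf object proposals and mid-level CNN activations):

```python
# Cosine-similarity landmark matching between two images of the same place.
import numpy as np

rng = np.random.default_rng(5)
n, d = 50, 256
feats_a = rng.normal(size=(n, d))                  # landmark features, image A
feats_b = feats_a + 0.3 * rng.normal(size=(n, d))  # image B: same place, perturbed

def normalize(F):
    return F / np.linalg.norm(F, axis=1, keepdims=True)

S = normalize(feats_a) @ normalize(feats_b).T      # cosine similarity matrix
best_ab = S.argmax(axis=1)                         # A -> B best matches
best_ba = S.argmax(axis=0)                         # B -> A best matches
mutual = [i for i in range(n) if best_ba[best_ab[i]] == i]
score = float(np.mean([S[i, best_ab[i]] for i in mutual])) if mutual else 0.0
print(f"{len(mutual)} mutual landmark matches, similarity score {score:.3f}")
```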

DeepMPC: Learning Deep Latent Features for Model Predictive Control
James Youngquist (UW CSE) 03/04/2016

Abstract: Designing controllers for tasks with complex nonlinear dynamics is extremely challenging, time-consuming, and in many cases, infeasible. This difficulty is exacerbated in tasks such as robotic food-cutting, in which dynamics might vary both with environmental properties, such as material and tool class, and with time while acting. In this work, we present DeepMPC, an online real-time model-predictive control approach designed to handle such difficult tasks. Rather than hand-design a dynamics model for the task, our approach uses a novel deep architecture and learning algorithm, learning controllers for complex tasks directly from data. We validate our method in experiments on a large-scale dataset of 1488 material cuts for 20 diverse classes, and in 450 real-world robotic experiments, demonstrating significant improvement over several other approaches. (Ian Lenz, Ross Knepper, and Ashutosh Saxena)
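
A minimal sketch of the pattern DeepMPC instantiates, model-predictive control over a learned dynamics model: fit next-state prediction from interaction data, then choose actions by optimizing rollouts of that model. A linear least-squares model and a random-shooting optimizer stand in for the paper's deep latent-feature architecture and real-time optimizer:

```python
# Learn x' = f(x, u) from data, then pick actions by shooting through the model.
import numpy as np

rng = np.random.default_rng(6)
xdim, udim = 3, 1
A_true = np.array([[0.9, 0.1, 0.0], [0.0, 0.95, 0.05], [0.0, 0.0, 0.9]])
B_true = np.array([[0.0], [0.1], [0.5]])

# 1) fit the dynamics from random interaction data: x' ~ [x, u] @ theta
X = rng.normal(size=(500, xdim)); U = rng.normal(size=(500, udim))
Xn = X @ A_true.T + U @ B_true.T + 0.01 * rng.normal(size=(500, xdim))
theta = np.linalg.lstsq(np.hstack([X, U]), Xn, rcond=None)[0]

def rollout_cost(x0, u_seq, target):
    x = x0.copy()
    for u in u_seq:                          # roll the learned model forward
        x = np.hstack([x, u]) @ theta
    return float(((x - target) ** 2).sum())  # terminal tracking cost

# 2) one MPC step: random shooting over short action sequences
x0, target, H = np.ones(xdim), np.zeros(xdim), 8
seqs = rng.uniform(-1, 1, size=(200, H, udim))
best = min(seqs, key=lambda s: rollout_cost(x0, s, target))
print("first MPC action:", best[0])
```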

Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration
Daniel Butler (UW CSE) 03/11/2016

Abstract: In this paper, we present a robotic model-based reinforcement learning method that combines ideas from model identification and model predictive control. We use a feature-based representation of the dynamics that allows the dynamics model to be fitted with a simple least squares procedure, and the features are identified from a high-level specification of the robot’s morphology, consisting of the number and connectivity structure of its links. Model predictive control is then used to choose the actions under an optimistic model of the dynamics, which produces an efficient and goal-directed exploration strategy. We present real-time experimental results on standard benchmark problems involving the pendulum, cartpole, and double pendulum systems. Experiments indicate that our method is able to learn a range of benchmark tasks substantially faster than the previous best methods. To evaluate our approach on a realistic robotic control task, we also demonstrate real-time control of a simulated 7-degree-of-freedom arm. (Chris Xie, Sachin Patil, Teodor Moldovan, Sergey Levine, Pieter Abbeel)
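
A minimal sketch of the model-identification step described above: when the morphology fixes a physical feature basis, the dynamics are linear in the unknown parameters and can be fit by ordinary least squares. The pendulum features and the simple optimism bonus below are illustrative stand-ins; the paper couples such a model with MPC for goal-directed exploration:

```python
# Least-squares identification of pendulum dynamics in a known feature basis.
import numpy as np

rng = np.random.default_rng(7)

def features(q, qdot, u):
    """Physical feature basis: qddot = theta . [sin(q), qdot, u]."""
    return np.array([np.sin(q), qdot, u])

theta_true = np.array([-9.8, -0.2, 2.0])          # gravity, damping, torque gain
# collect transitions under random torques
Q = rng.uniform(-np.pi, np.pi, 200); Qd = rng.normal(size=200)
U = rng.uniform(-1, 1, 200)
Phi = np.stack([features(q, qd, u) for q, qd, u in zip(Q, Qd, U)])
Qdd = Phi @ theta_true + 0.01 * rng.normal(size=200)

theta_hat, *_ = np.linalg.lstsq(Phi, Qdd, rcond=None)
cov = np.linalg.inv(Phi.T @ Phi)                  # parameter uncertainty (to scale)

def optimistic_qddot(q, qdot, u, beta=2.0):
    """Optimism in the face of uncertainty: best-case model prediction."""
    phi = features(q, qdot, u)
    bonus = beta * np.sqrt(phi @ cov @ phi)       # confidence-interval slack
    return float(phi @ theta_hat + bonus)         # a planner would exploit this

print("estimated parameters:", np.round(theta_hat, 2))
print("optimistic qddot at (0.5, 0, 1):", round(optimistic_qddot(0.5, 0.0, 1.0), 3))
```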