Advances in machine learning, computer vision and robotics have opened up avenues of building intelligent robots which can navigate in the physical world and perform complex tasks in our homes and offices. Exploration is a key challenge in building intelligent navigation agents. When an autonomous agent is dropped in an unseen environment, it needs to explore as much of the environment as fast as possible.
Exploring efficiently in a large environment requires the agent to know 3 things:
How do we go about training autonomous exploration agents? One popular approach is using end-to-end deep Reinforcement Learning (RL). However, learning about mapping, pose estimation and planning implicitly in an end-to-end fashion is expensive and sample inefficient. This makes prior methods based on end-to-end RL ineffective at exploration in large environments.
In order to overcome these limitations, we present a modular system called Active Neural SLAM. It builds on ideas from Simultaneous Localization and Mapping literature in classical robotics and leverages learning for better performance and robustness. It uses structured spatial representations, hierarchical policies and analytical planners for learning to explore effectively in large scenes.
Active Neural SLAM (ANS) model consists of three components:
We use the Habitat simulator with the Gibson and Matterport3D (MP3D) datasets for our experiments. In our exploration task setup, the objective to maximize the coverage in a fixed time budget of 1000 steps. Coverage is the total area in the map known to be traversable. We use two evaluation metrics, the absolute coverage area in (Cov) and the percentage of area explored in the scene (% Cov).
We use 4 baselines based on end-to-end RL. All models are trained on Gibson training scenes and tested on (1) Gibson val set, and (2) MP3D test set (domain generalization). ANS outperforms baselines by a large margin on both test sets in terms of final coverage (see table above), and in terms of exploration efficiency (see plot below).
In the PointGoal task, the objective is to navigate to a goal location whose relative coordinates are given as input. The Active Neural SLAM model can be directly transferred to the PointGoal task without any additional training by just changing the Global policy to always output the PointGoal coordinates as the long-term goal. The ANS model trained for Exploration, when transferred to the PointGoal task can outperform all the baselines trained for the PointGoal Task by a large margin. It was the winner of the CVPR 2019 Habitat PointGoal Navigation Challenge.
Our goal is to get these models to work not just in simulation but in the real-world. It is difficult to transfer navigation models to the real-world due visual and physical domain gap. While simulation environments based on real-world reconstructions close the visual domain gap, the perfect agent motion and pose sensor in the simulation were unrealistic.
In order to bridge this physical domain gap, we collect motion and sensor data in the real-world to create realistic noise models. We then train our exploration policy with noisy motion and pose sensors in the simulation using these noise models. Furthermore, due to the modularity, the pose estimation and global policy work directly on the map space, which is domain invariant. This allowed us to successfully transfer the trained Active Neural SLAM policy on the Locobot hardware platform using the PyRobot API.
We presented a modular navigation system which leverages the strengths of both classical and learning-based methods. The use of learning provides flexibility with respect to input modalities (in the SLAM module), leverages structural regularities of the world (in global policies), and provides robustness to errors in state estimation (in local policies). It leads to significantly higher performance and up to 75 times better sample efficiency as compared to end-to-end reinforcement learning. Domain invariance of pose estimation & global policy, and realistic motion & sensor noise models allowed us to successfully transfer the model to the real-world. The Active Neural SLAM model can be further improved by incorporating explicit semantics and adding relocalization capabilities.
Incorporating Semantics: The current model only builds an obstacle map, without any explicit semantics. Incorporating semantics in the map and learning task-specific exploration are some challenges to tackle tasks such as Image Goal or Object Goal Navigation.
Relocalization: The current model can be coupled with prior relocalization techniques to add the ability to relocalize in a previously constructed map for more efficient navigation in known environments. Relocalization can also help in better pose estimation by mitigating pose drift.
Check out the links to the paper, complete codebase with pre-trained models, talk, slides and project webpage below.
This blog post was based on the following paper :
Devendra Singh Chaplot, Dhiraj Gandhi, Saurabh Gupta, Abhinav Gupta, and Ruslan Salakhutdinov. Learning To Explore Using Active Neural SLAM. In International Conference on Learning Representations (ICLR), 2020.
This article was initially published on the ML@CMU blog and appears here with the authors’ permission.