AIhub.org
 

Learning to explore using active neural SLAM


24 June 2020




By Devendra Singh Chaplot

Advances in machine learning, computer vision and robotics have opened up avenues for building intelligent robots which can navigate in the physical world and perform complex tasks in our homes and offices. Exploration is a key challenge in building intelligent navigation agents. When an autonomous agent is dropped in an unseen environment, it needs to explore as much of the environment as possible, as quickly as possible.

Efficient exploration in unknown environments

Exploring efficiently in a large environment requires the agent to know 3 things:

  1. where it has been before (i.e. Mapping)
  2. where it is now (i.e. Pose Estimation)
  3. where it needs to go (i.e. Planning)

How do we go about training autonomous exploration agents? One popular approach is using end-to-end deep Reinforcement Learning (RL). However, learning about mapping, pose estimation and planning implicitly in an end-to-end fashion is expensive and sample inefficient. This makes prior methods based on end-to-end RL ineffective at exploration in large environments.

In order to overcome these limitations, we present a modular system called Active Neural SLAM. It builds on ideas from Simultaneous Localization and Mapping literature in classical robotics and leverages learning for better performance and robustness. It uses structured spatial representations, hierarchical policies and analytical planners for learning to explore effectively in large scenes.


Active Neural SLAM

Overview of Active Neural SLAM model.

The Active Neural SLAM (ANS) model consists of three components:

  1. Neural SLAM Module predicts the map of the environment and the agent pose based on the current observation and previous prediction:
    • It uses convolutional operations to encode the visual observation followed by deconvolution operations to decode the map.
    • It learns structured map and pose representations.
    • It is trained with supervised learning with binary cross-entropy loss for map prediction and mean-squared loss for pose prediction.
  2. Global Policy uses the predicted map and agent pose to produce a long-term goal which is converted to a short-term goal using analytical path-planning (like Dijkstra or A*) on the current map.
    • It uses a CNN to produce a long-term goal.
    • It operates at a coarse time-scale, once every 25 steps.
    • It is trained using reinforcement learning with the increase in the explored area as the reward.
  3. Local Policy outputs navigational actions based on the current observation to reach the short-term goal.
    • It uses a CNN + GRU.
    • It operates at a fine time-scale, predicting a low-level navigation action (move-forward, turn-left or turn-right) at every step.
    • It is trained using imitation learning with binary cross-entropy loss.
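The supervised objective described for the Neural SLAM Module can be sketched in a few lines. This is only an illustration, not the authors' implementation: it assumes the map is flattened into a list of per-cell probabilities and the pose into a short list of numbers, and the function names and the `pose_weight` factor are hypothetical.

```python
import math

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between per-cell probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)

def mean_squared_error(pred, target):
    """Mean squared error between predicted and ground-truth pose values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def neural_slam_loss(map_pred, map_gt, pose_pred, pose_gt, pose_weight=1.0):
    """Map loss (BCE) plus pose loss (MSE); pose_weight is an assumed knob."""
    map_loss = binary_cross_entropy(map_pred, map_gt)
    pose_loss = mean_squared_error(pose_pred, pose_gt)
    return map_loss + pose_weight * pose_loss

# Toy example: four map cells and an (x, y, heading) pose.
loss = neural_slam_loss(
    map_pred=[0.9, 0.2, 0.7, 0.1],
    map_gt=[1, 0, 1, 0],
    pose_pred=[1.0, 2.0, 0.1],
    pose_gt=[1.1, 2.0, 0.0],
)
print(f"combined loss: {loss:.4f}")
```

In practice the two terms would be computed on tensors by a deep learning framework; the plain-Python version only makes the structure of the objective explicit.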

Results

We use the Habitat simulator with the Gibson and Matterport3D (MP3D) datasets for our experiments. In our exploration task setup, the objective is to maximize the coverage in a fixed time budget of 1000 steps. Coverage is the total area in the map known to be traversable. We use two evaluation metrics: the absolute coverage area in m^2 (Cov) and the percentage of area explored in the scene (% Cov).
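As a rough illustration of the two metrics, the sketch below computes Cov and % Cov from grid maps. It assumes the map is a 2D grid of boolean cells, that % Cov is measured against the total traversable area of the scene, and that each cell covers `cell_size_m` metres on a side; the function name and these conventions are assumptions, not details taken from the post.

```python
def coverage_metrics(explored, traversable, cell_size_m=0.05):
    """Return (Cov in m^2, % Cov) for two same-shaped 2D boolean grids.

    explored[r][c]    -- True if the agent has mapped this cell
    traversable[r][c] -- True if the cell is traversable in the ground truth
    """
    cell_area = cell_size_m ** 2
    covered = 0
    total = 0
    for row_e, row_t in zip(explored, traversable):
        for e, t in zip(row_e, row_t):
            if t:
                total += 1
                if e:
                    covered += 1  # only traversable cells count as coverage
    cov_m2 = covered * cell_area
    pct_cov = 100.0 * covered / total if total else 0.0
    return cov_m2, pct_cov

# Toy 2x2 scene with three traversable cells, two of them explored.
explored = [[True, True], [False, False]]
traversable = [[True, True], [True, False]]
cov, pct = coverage_metrics(explored, traversable, cell_size_m=0.5)
print(f"Cov = {cov:.2f} m^2, % Cov = {pct:.1f}")
```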

Exploration performance of the proposed model, Active Neural SLAM (ANS) and baselines.
Baselines adapted from [1] Lample & Chaplot. AAAI-17, [2] Mirowski et al. ICLR-17, [3] Chen et al. ICLR-19.

We use 4 baselines based on end-to-end RL. All models are trained on Gibson training scenes and tested on (1) Gibson val set, and (2) MP3D test set (domain generalization). ANS outperforms baselines by a large margin on both test sets in terms of final coverage (see table above), and in terms of exploration efficiency (see plot below).

Plot showing the % Coverage as the episode progresses in large and small scenes.

What makes Neural SLAM better?

  • Inductive bias using Neural SLAM: Instead of letting the model figure out what is useful in the RGB observation for exploration, we tell it explicitly what to predict (i.e. map and pose). This leads to better generalization and sample efficiency.
  • Reducing exploration search-space: The Global Policy operates at a coarse time-scale, picking a long-term goal every 25 steps. This shortens the time horizon for exploration: what previously took 100 low-level navigation actions to explore can now be explored in 4 high-level actions.
  • Path-Planning comes for free: End-to-end RL needs to learn path-planning implicitly, which is super data-intensive to learn. The Active Neural SLAM model does not need to learn planning. Because we use an explicit map representation, we can simply use classical planning algorithms.
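To make the last point concrete, here is a minimal sketch of classical planning on an explicit obstacle map. It uses Dijkstra's algorithm on a 4-connected grid with unit step costs and then picks a short-term goal a few cells along the path. The grid encoding, the function names and the `lookahead` parameter are assumptions made for illustration; the planner used in the actual system may differ.

```python
import heapq

def dijkstra_path(obstacles, start, goal):
    """Shortest 4-connected path on a grid; obstacles[r][c] is truthy if blocked.

    Returns a list of (row, col) cells from start to goal, or None if unreachable.
    """
    rows, cols = len(obstacles), len(obstacles[0])
    dist = {start: 0}
    parent = {}
    frontier = [(0, start)]
    while frontier:
        d, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not obstacles[nr][nc]:
                nd = d + 1
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (nd, (nr, nc)))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path

def short_term_goal(path, lookahead=3):
    """Pick the cell `lookahead` steps along the path (or the goal if closer)."""
    return path[min(lookahead, len(path) - 1)]

# A wall in the middle column forces a detour through the bottom row.
grid = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
path = dijkstra_path(grid, start=(0, 0), goal=(0, 2))
print(path)
print(short_term_goal(path, lookahead=3))
```

Because the map is updated as the agent moves, a planner like this would be re-run on the latest map whenever a new short-term goal is needed; no planning behaviour has to be learned.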

PointGoal Task Transfer

In the PointGoal task, the objective is to navigate to a goal location whose relative coordinates are given as input. The Active Neural SLAM model can be directly transferred to the PointGoal task without any additional training by just changing the Global Policy to always output the PointGoal coordinates as the long-term goal. The ANS model trained for Exploration, when transferred to the PointGoal task, can outperform all the baselines trained for the PointGoal task by a large margin. It was the winner of the CVPR 2019 Habitat PointGoal Navigation Challenge.
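The transfer described above amounts to swapping the learned Global Policy for one that returns a fixed goal. The sketch below shows one way this could look. It assumes the relative goal is given in polar form (distance and angle relative to the agent's heading), that the agent pose is expressed in metres and radians in the map frame, and that map cells are indexed as (row, col); the names `pointgoal_to_map_cell` and `FixedGoalGlobalPolicy` are hypothetical.

```python
import math

def pointgoal_to_map_cell(agent_pose, rel_goal, cell_size_m=0.05):
    """Convert a relative goal into a (row, col) map cell.

    agent_pose -- (x_m, y_m, heading_rad) in the map frame (assumed convention)
    rel_goal   -- (distance_m, angle_rad) relative to the agent's heading
    """
    x, y, heading = agent_pose
    distance, angle = rel_goal
    goal_x = x + distance * math.cos(heading + angle)
    goal_y = y + distance * math.sin(heading + angle)
    return (int(round(goal_y / cell_size_m)), int(round(goal_x / cell_size_m)))

class FixedGoalGlobalPolicy:
    """Stand-in for the learned Global Policy: always returns the same goal."""

    def __init__(self, goal_cell):
        self.goal_cell = goal_cell

    def __call__(self, predicted_map, agent_pose):
        # The map and pose are ignored; the long-term goal never changes.
        return self.goal_cell

# Goal 2 m to the agent's left, computed once at the start of the episode.
start_pose = (1.0, 1.0, 0.0)
goal_cell = pointgoal_to_map_cell(start_pose, (2.0, math.pi / 2), cell_size_m=0.5)
policy = FixedGoalGlobalPolicy(goal_cell)
print(policy(predicted_map=None, agent_pose=start_pose))
```

The rest of the pipeline (mapping, pose estimation, planning and the Local Policy) would stay unchanged, which is why no additional training is needed.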

PointGoal Task
Top 5 entries on CVPR 2019 Habitat PointGoal Navigation Challenge Leaderboard

Real-world Transfer

Our goal is to get these models to work not just in simulation but in the real world. It is difficult to transfer navigation models to the real world due to the visual and physical domain gaps. While simulation environments based on real-world reconstructions close the visual domain gap, the perfect agent motion and pose sensors in simulation are unrealistic.

In order to bridge this physical domain gap, we collect motion and sensor data in the real world to create realistic noise models. We then train our exploration policy with noisy motion and pose sensors in the simulation using these noise models. Furthermore, due to the modularity, the pose estimation and global policy work directly on the map space, which is domain invariant. This allowed us to successfully transfer the trained Active Neural SLAM policy to the Locobot hardware platform using the PyRobot API.
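The idea of training with noisy motion can be illustrated with a toy actuation model. The sketch below perturbs each action with zero-mean Gaussian noise. The nominal step sizes, the noise standard deviations and the Gaussian form are all placeholder assumptions chosen for illustration; the noise models described above are fitted to data collected on the real robot and are not reproduced here.

```python
import math
import random

# Assumed nominal effect of each action: (forward distance in m, turn in rad).
NOMINAL = {
    "move-forward": (0.25, 0.0),
    "turn-left": (0.0, math.radians(10)),
    "turn-right": (0.0, -math.radians(10)),
}

def noisy_step(pose, action, rng, trans_std=0.02, rot_std=math.radians(1.0)):
    """Apply an action to (x, y, heading) with additive Gaussian actuation noise."""
    x, y, heading = pose
    dist, turn = NOMINAL[action]
    dist += rng.gauss(0.0, trans_std)  # hypothetical translation noise
    turn += rng.gauss(0.0, rot_std)    # hypothetical rotation noise
    heading += turn
    x += dist * math.cos(heading)
    y += dist * math.sin(heading)
    return (x, y, heading)

# Roll out a short action sequence with a seeded generator for repeatability.
rng = random.Random(0)
pose = (0.0, 0.0, 0.0)
for action in ["move-forward", "turn-left", "move-forward"]:
    pose = noisy_step(pose, action, rng)
print(pose)
```

A pose sensor can be made noisy in the same way, by perturbing the reported pose change rather than the executed motion.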

Exploration using Active Neural SLAM in the real world. The observations seen by the agent are shown on the left and the predicted map and pose are shown on the right. The obstacles are shown in green, the explored area is shown in light blue, the long-term goal is shown in dark blue and the agent pose and trajectory are shown in red. The Active Neural SLAM model is able to explore the whole apartment efficiently.

Discussion and Next Steps

We presented a modular navigation system which leverages the strengths of both classical and learning-based methods. The use of learning provides flexibility with respect to input modalities (in the SLAM module), leverages structural regularities of the world (in global policies), and provides robustness to errors in state estimation (in local policies). It leads to significantly higher performance and up to 75 times better sample efficiency as compared to end-to-end reinforcement learning. Domain invariance of pose estimation and global policy, together with realistic motion and sensor noise models, allowed us to successfully transfer the model to the real world. The Active Neural SLAM model can be further improved by incorporating explicit semantics and adding relocalization capabilities.

Incorporating Semantics: The current model only builds an obstacle map, without any explicit semantics. Incorporating semantics in the map and learning task-specific exploration are some challenges to tackle tasks such as Image Goal or Object Goal Navigation.

Relocalization: The current model can be coupled with prior relocalization techniques to add the ability to relocalize in a previously constructed map for more efficient navigation in known environments. Relocalization can also help in better pose estimation by mitigating pose drift.

Interested in more details?

Check out the links to the paper, complete codebase with pre-trained models, talk, slides and project webpage below.

Paper

Code

Talk

Slides

Webpage

This blog post was based on the following paper:
Devendra Singh Chaplot, Dhiraj Gandhi, Saurabh Gupta, Abhinav Gupta, and Ruslan Salakhutdinov. Learning To Explore Using Active Neural SLAM. In International Conference on Learning Representations (ICLR), 2020.

This article was initially published on the ML@CMU blog and appears here with the authors’ permission.



