AIhub.org
 

#ECAI2023 outstanding paper: Interview with Xuan Liu – multi-agent sparse reward tasks


19 October 2023



Exploiting high-value experience to provide guidance.

Xuan Liu and colleagues Xinning Chen, Yanwen Ba, Shigeng Zhang, Bo Ding and Kenli Li won an outstanding paper award at the 26th European Conference on Artificial Intelligence (ECAI 2023). In this interview, Xuan tells us more about their work.

What is the topic of the research in your paper?

Our paper Selective Learning for Sample-Efficient Training in Multi-Agent Sparse Reward Tasks focuses on improving sample efficiency for multi-agent cooperative tasks with sparse reward. Learning effective policies in multi-agent environments with sparse reward is a great challenge for agents, as the concurrent learning of multiple agents induces the non-stationarity problem and sharply increases the joint state space. In this paper, we present an effective multi-agent selective learning framework, which selects only the valuable experiences of other agents, instead of the whole, huge trajectory space, to accelerate the learning process.

Could you tell us about the implications of your research and why it is an interesting area for study?

The sparse reward problem is one of the core problems of reinforcement learning in solving practical tasks. Recently, successful approaches in reinforcement learning have relied on reward functions that give clear feedback to agents at every step. However, for some complex tasks with sparse reward, such as autonomous driving and robotic control, learning an optimal policy becomes extremely difficult for agents due to the lack of feedback signals. Sparse rewards are delayed and provide feedback to agents in only a few states, leading to poor sample efficiency. In multi-agent tasks, the sparse reward challenge is aggravated by the need for policy coupling and the non-stationarity of environments.

Previous multi-agent reinforcement learning methods have attempted to promote sample efficiency through experience sharing. However, learning from a large collection of shared experiences is inefficient, since sparse reward tasks contain only a few high-value states, and it may instead lead to the curse of dimensionality in large-scale multi-agent systems. In our work, we propose a novel method that improves exploration efficiency while balancing learning efficiency with stability, which greatly promotes the application of multi-agent reinforcement learning in complex sparse reward tasks.

Could you explain your methodology?

The key idea of our method is to selectively reuse valuable experiences to accelerate the learning process. Our method adopts a centralized training and decentralized execution paradigm. First, to improve sample efficiency in sparse reward tasks, we introduce a centralized backtracking model, which consists of an observation generator and an action generator. The backtracking model generates recall traces from high-value samples. Each agent not only speeds up learning by imitating its own recall traces, but also shares the traces with other agents to aid effective exploration. Second, we introduce a selector to select K agents’ information based on the correlation between agent observations, which effectively mitigates the non-stationarity of the multi-agent environments and enhances the scalability of our approach in scenarios with more agents. Moreover, because it is difficult to identify high-value experiences when all agents obtain a shared team reward, we specifically consider the fully cooperative setting and design a retrogression-based selection method to overcome the difficulty of recognizing contributors from the shared reward.

Multi-agent selective learning framework.
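
To make the selector step described above a little more concrete, here is a minimal Python sketch of choosing K agents by observation correlation. It is an illustration under stated assumptions, not the authors' implementation: the interview does not say which correlation measure is used, so Pearson correlation over flattened observation vectors is assumed here, and the names `pearson`, `select_top_k_agents` and the agent ids `a0` to `a3` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors.

    Returns 0.0 if either vector is constant (assumed convention).
    """
    if len(x) != len(y):
        raise ValueError("observation vectors must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k_agents(
    observations: Dict[str, Sequence[float]], agent: str, k: int
) -> List[str]:
    """Return the k other agents whose observations correlate most with `agent`'s."""
    own = observations[agent]
    scores = {
        other: pearson(own, obs)
        for other, obs in observations.items()
        if other != agent
    }
    # Highest correlation first; ties are broken by agent id for determinism.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]


# Toy observations (hypothetical values, one flattened vector per agent).
observations = {
    "a0": [1.0, 2.0, 3.0, 4.0],
    "a1": [2.0, 4.0, 6.0, 8.0],  # moves exactly with a0
    "a2": [4.0, 3.0, 2.0, 1.0],  # moves opposite to a0
    "a3": [1.0, 3.0, 2.0, 4.0],  # partially aligned with a0
}

selected = select_top_k_agents(observations, "a0", k=2)
print(selected)  # ['a1', 'a3']
```

With K = 2, agent `a0` would draw on `a1` and `a3`, whose observations move with its own, and ignore `a2`. How the selected agents' information is then used during centralized training is not sketched here.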

What were your main findings?

By evaluating our method in several multi-agent cooperative tasks with sparse reward, we found that it significantly improves sample efficiency for sparse reward tasks, especially in tasks with a large number of agents. We also found that our method has broad applicability to multi-agent tasks with different reward settings.

What further work are you planning in this area?

Sparse rewards remain an ongoing challenge. In the future, we will explore the issue of sparse rewards in more intricate multi-agent environments. Our attention will be directed towards optimizing the utilization of valuable experiences and identifying the interaction relationship between agents in diverse environments.

About Xuan Liu

Xuan Liu (Member, IEEE) is currently a Professor in the College of Computer Science and Electronic Engineering at Hunan University, China. She received an MSc from National University of Defense Technology, and a PhD from Hong Kong Polytechnic University. Her research interests include multi-agent reinforcement learning and its applications, and the Internet of Things.

Read the work in full

Selective Learning for Sample-Efficient Training in Multi-Agent Sparse Reward Tasks, Xinning Chen, Xuan Liu, Yanwen Ba, Shigeng Zhang, Bo Ding, Kenli Li.



Lucy Smith is Senior Managing Editor for AIhub.





©2025.05 - Association for the Understanding of Artificial Intelligence


 











