 

Benchmarking quality-diversity algorithms on neuroevolution for reinforcement learning


by Manon Flageat and Bryan Lim
14 December 2022




Members of the AIRL lab at Imperial College, and authors of the work reported in this blog post. From left to right: Bryan Lim, Dr Antoine Cully (director of the AIRL lab), Manon Flageat, Luca Grillotti, Dr Simón C Smith, and Maxime Allard.

Learning and finding different solutions to the same problem is commonly associated with creativity and adaptation, which are important characteristics of intelligence. In the AIRL lab at Imperial College, we believe in the importance of diversity in learning algorithms. With this focus in mind, we develop learning algorithms known as Quality-Diversity algorithms.

Quality-Diversity (QD) algorithms

QD algorithms [1] are a relatively new and growing family of optimization algorithms. In contrast to conventional optimization algorithms, which aim to find a single high-performing/optimal solution, QD algorithms search for a large diversity of high-performing solutions. For example, instead of a single way to walk, QD algorithms learn a diverse set of good gaits; instead of a single way to grasp a cup, they learn many different ways of grasping it.
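To make this concrete, below is a minimal sketch of the archive-insertion step used by MAP-Elites-style QD algorithms [1]: solutions are binned into a grid of cells according to a behaviour descriptor (e.g. features of a gait), and each cell keeps only the best-performing solution found so far. The function and variable names are illustrative, not taken from any specific library.

```python
import numpy as np

def try_insert(archive, solution, fitness, descriptor, n_bins=10):
    """Insert `solution` into a MAP-Elites-style grid archive.

    `descriptor` is a behaviour descriptor in [0, 1]^d (e.g. features of
    a walking gait); `archive` maps a grid cell to its current best
    (fitness, solution) pair, so each cell only keeps the fittest solution.
    """
    cell = tuple(np.clip((np.asarray(descriptor) * n_bins).astype(int), 0, n_bins - 1))
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, solution)
    return archive

# Example: two solutions with different gaits land in different cells,
# so both are kept even though one has a lower fitness.
archive = {}
archive = try_insert(archive, solution="gait_A", fitness=0.7, descriptor=[0.1, 0.9])
archive = try_insert(archive, solution="gait_B", fitness=0.5, descriptor=[0.8, 0.2])
```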

As QD is a growing field, promising new applications and algorithms are released every month. In our work, we propose to apply QD to deep reinforcement learning tasks. We introduce a set of benchmark tasks and metrics to facilitate the comparison of QD approaches in this setting.

The reinforcement learning (RL) and deep reinforcement learning (deep RL) settings

As QD algorithms are general optimization algorithms, they can be applied to any optimization problem, and RL is one example of such a setting. In RL [2], the agent receives positive or negative rewards based on the outcome of its actions. These rewards act as reinforcement to consolidate beneficial behaviors. For example, to get a robot to learn how to walk forward without falling, we give it positive rewards each time it manages to walk and negative rewards each time it falls. Deep RL [2] is the setting in which the agent uses neural networks for sequential decision making to solve RL tasks.
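As an illustration, the RL interaction loop can be sketched as follows. Here `env` is assumed to expose a gym-style reset()/step() interface and `policy` is a placeholder mapping observations to actions; both are assumptions for the sake of the example.

```python
def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and accumulate the reward signal.

    For a walking robot, the reward is positive while it walks forward
    and negative when it falls; the sum over the episode is the signal
    that reinforces beneficial behaviours.
    """
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)                        # agent chooses an action
        obs, reward, done, info = env.step(action)  # environment returns a reward
        total_reward += reward
        if done:  # e.g. the robot fell over
            break
    return total_reward
```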

Neuroevolution in deep RL setting

Conventionally, gradient descent is used to optimize the parameters of neural networks, and this is also the case for the neural networks used in deep RL. Neuroevolution [3], on the other hand, is, as the name suggests, inspired by evolutionary computation: the parameters of neural networks are instead evolved via variation/perturbation operators such as mutation. QD algorithms are evolutionary algorithms and can therefore be used to perform neuroevolution. Here, we are interested in QD algorithms applied to neuroevolution for deep RL.
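A minimal sketch of such a variation operator is shown below, assuming the policy parameters are stored as a JAX pytree; the function name and the tiny two-layer policy are illustrative only.

```python
import jax
import jax.numpy as jnp

def gaussian_mutation(params, key, sigma=0.01):
    """Create an offspring by perturbing every weight with Gaussian noise.

    This replaces the gradient step of conventional deep RL: parameters
    are varied, the resulting policies are evaluated in the environment,
    and the best (and most novel) ones are kept.
    """
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    mutated = [leaf + sigma * jax.random.normal(k, leaf.shape)
               for leaf, k in zip(leaves, keys)]
    return jax.tree_util.tree_unflatten(treedef, mutated)

# Example: mutate a tiny two-layer policy.
params = {"w1": jnp.zeros((8, 32)), "w2": jnp.zeros((32, 2))}
offspring = gaussian_mutation(params, jax.random.PRNGKey(0))
```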

A challenge in RL, compared to the more common and mature sub-field of supervised learning, is that there is no pre-existing dataset: the data the agent learns from is collected by the agent itself through interaction with the environment. This introduces challenges in exploration and generalization. As QD algorithms aim to find a population of diverse agents, they are a promising approach to address these challenges.

A new framework for QD applied to neuroevolution in deep RL

We propose a new benchmark that formalizes several tasks, both new ones and ones that have already been used extensively in the QD literature, in one common framework. We consider these tasks across a range of robots, from simple systems with a low number of degrees of freedom to higher-dimensional systems similar to real-world robots (see the figure below displaying our environments).

Figure: the different simulated environments and robots included in the benchmark.

Among other desiderata for benchmarks, we also heavily considered the time required to evaluate algorithms. We leverage recent advances in hardware acceleration and parallel simulators to enable algorithms to be evaluated quickly. We use QDax [4], a recent library developed in JAX in which these tasks are implemented on top of the Brax simulator, which has been shown to accelerate the evaluation of QD algorithms [4].
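The key idea behind this speed-up can be sketched in a few lines of JAX: when the simulator is itself written in JAX (as Brax is), evaluating a whole batch of policies becomes a single vectorised, jit-compiled call. The `evaluate_policy` function below is a hypothetical stand-in for a full rollout, not QDax's actual API.

```python
import jax
import jax.numpy as jnp

def evaluate_policy(params):
    # Hypothetical stand-in for a full rollout in a JAX-compatible
    # simulator: returns a fitness and a behaviour descriptor.
    fitness = -jnp.sum(params ** 2)
    descriptor = jnp.tanh(params[:2])
    return fitness, descriptor

# Evaluating thousands of policies in parallel on an accelerator:
batched_evaluate = jax.jit(jax.vmap(evaluate_policy))
batch_of_params = jax.random.normal(jax.random.PRNGKey(0), (1024, 8))
fitnesses, descriptors = batched_evaluate(batch_of_params)
```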

Beyond the tasks themselves, we also took a deeper look at the metrics used to analyze the algorithms. As we aim to learn a population of diverse and high-performing agents, the quality and diversity of the population have commonly been quantified in the literature using a single scalar value, the QD-score. However, as it is a single value, it fails to capture the full state of the resulting population of agents. For example, the same QD-score can be obtained by (1) a large population of very diverse agents that perform poorly and (2) a small population of high-performing agents. Hence, we formalize a more general metric called the “archive profile”, where the shape of the curve gives us this additional information while still allowing us to recover the QD-score as the area under the curve.
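One way to realise this idea is sketched below (the precise definition used in the paper may differ slightly): for each fitness threshold, count how many archived solutions exceed it; with non-negative fitnesses, the area under the resulting curve recovers the QD-score, i.e. the sum of the archive's fitnesses. The names and example values are illustrative.

```python
import numpy as np

def archive_profile(fitnesses, thresholds):
    """Number of archive solutions whose fitness exceeds each threshold."""
    fitnesses = np.asarray(fitnesses)
    return np.array([(fitnesses > t).sum() for t in thresholds])

fitnesses = np.array([0.2, 0.8, 0.5, 0.9])           # fitness of each filled cell
thresholds = np.linspace(0.0, 1.0, 1001)
profile = archive_profile(fitnesses, thresholds)
qd_score_from_area = np.trapz(profile, thresholds)   # ~= fitnesses.sum() = 2.4
```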

Additionally, noisy problems have been shown to be a challenge for QD algorithms [5]: noise impairs the ability of the algorithm to quantify the true performance and novelty of solutions. Common deep RL tasks are usually noisy, and so are our benchmark tasks. One of our findings is that the inherent stochasticity present in deep reinforcement learning problems further amplifies this issue. Moreover, because of this noise, most of the common QD metrics become less meaningful when interpreting results. We thus also introduce a new evaluation procedure to quantify this impact more meaningfully.
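As a rough illustration of this kind of procedure (the specific protocol in the paper may differ), each solution kept in the archive can be re-evaluated over many independent rollouts, and metrics computed from the averaged scores rather than from the single, possibly lucky, evaluation that got it into the archive. Here `evaluate` is a hypothetical stochastic fitness function.

```python
import numpy as np

def reevaluated_fitnesses(evaluate, solutions, n_reevals=30, seed=0):
    """Average fitness of each archived solution over many noisy rollouts.

    Averaging over `n_reevals` independent rollouts gives a less biased
    estimate of each solution's true performance under environment noise.
    """
    rng = np.random.default_rng(seed)
    return np.array([
        np.mean([evaluate(s, rng) for _ in range(n_reevals)])
        for s in solutions
    ])
```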

Our most promising finding is that some tasks in the proposed benchmark remain unsolved by the QD algorithms we have today. This leaves room for improvement for QD algorithms on this set of benchmarks and provides a good goal for researchers in the community.

What is next?

Setting up this task suite opens up many research directions. We now have a fast, replicable, easy-to-set-up benchmark, as well as a set of baselines and metrics. This constitutes a powerful framework to create, test and develop new QD approaches with the aim of tackling common issues such as the ones we raised in our work. Using this new framework, our lab will continue focusing on improving QD algorithms and extending the range of their applications!

References

[1] Chatzilygeroudis, K., Cully, A., Vassiliades, V., & Mouret, J. B. (2021). Quality-Diversity Optimization: a novel branch of stochastic optimization. In Black Box Optimization, Machine Learning, and No-Free Lunch Theorems (pp. 109-135). Springer, Cham.

[2] Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6), 26-38.

[3] Lehman, J., & Miikkulainen, R. (2013). Neuroevolution. Scholarpedia, 8(6), 30977.

[4] Lim, B., Allard, M., Grillotti, L., & Cully, A. (2022). Accelerated Quality-Diversity for Robotics through Massive Parallelism. arXiv preprint arXiv:2202.01258.

[5] Flageat, M., & Cully, A. (2020). Fast and stable MAP-Elites in noisy domains using deep grids. arXiv preprint arXiv:2006.14253.

Read the research in full

Benchmarking Quality-Diversity Algorithms on Neuroevolution for Reinforcement Learning, Manon Flageat, Bryan Lim, Luca Grillotti, Maxime Allard, Simón C. Smith, Antoine Cully.




Manon Flageat is a PhD student at Imperial College London.

Bryan Lim is a PhD student at Imperial College London.



