AIhub.org
 

Theoretical remarks on feudal hierarchies and reinforcement learning


by Diogo Carvalho
16 January 2024




Reinforcement learning is a paradigm in which an agent interacts with its environment by trying out different actions in different states and observing the outcomes. Each interaction can change the state of the environment and can also provide a reward to the agent. The goal of the agent is to learn the value of performing each action in each state. By value, we mean the largest total reward that the agent can obtain after performing that action in that state. If the agent achieves this goal, it can then act optimally in its environment by choosing, in every state, the action with the highest value.

Figure (one agent and a planet): In reinforcement learning, an agent interacts with its environment by taking actions and observing their outcomes, with the goal of selecting the actions with the highest value.
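The notion of value described above can be written down compactly. The following is the standard textbook formulation, not notation taken from the article: it assumes a discount factor $\gamma \in [0, 1)$, a reward $r_t$ received at step $t$, and a policy $\pi$ that maps states to actions.

```latex
Q^*(s, a) = \max_{\pi} \, \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s,\ a_0 = a,\ \pi \right],
\qquad
\pi^*(s) \in \arg\max_{a} Q^*(s, a)
```

Here $Q^*(s, a)$ is the value of performing action $a$ in state $s$, and $\pi^*$ is the policy that picks, in every state, an action with the highest value.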

In 1989, Watkins proposed an algorithm to solve the reinforcement learning problem: Q-learning [1]. Not long after, Watkins and Dayan showed that the algorithm does indeed converge to the correct solution [2].
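To make the algorithm concrete, here is a minimal sketch of tabular Q-learning in Python. The corridor environment `chain_step`, the hyperparameters, and the random tie-breaking are illustrative assumptions and do not come from the article or from [1, 2].

```python
import random

def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning; `step(state, action)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)          # explore
            else:
                best = max(q[state])                       # exploit, ties broken randomly
                action = rng.choice([a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward the reward plus the discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

def chain_step(state, action, n_states=5):
    """Hypothetical corridor: action 1 moves right, action 0 moves left; reward 1 at the right end."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done

q_table = q_learning(5, 2, chain_step)
# Greedy action in each non-terminal state (1 means "move right").
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this toy corridor the only reward sits at the right end, so the learned greedy policy moves right in every state.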

But we humans don’t just map states to the next best action. If we want to get from where we are sitting right now to a friend’s house, we don’t just look at our environment, decide to stand up, then observe the environment again, decide to take a step forward, and so on. Before making each of these small decisions, we need to make bigger ones. To achieve the goal of reaching our friend’s house, we need to define other goals, which may include putting on our shoes or getting on the bus down the street, and only then make the small decisions that lead us to achieve them. We have a way of abstracting our decision making at different levels, in what resembles a hierarchy.

In 1992, Dayan and Hinton were thinking about this and introduced the first idea for hierarchical reinforcement learning: feudal reinforcement learning [3]. In feudal reinforcement learning, the agent sets goals for an instance of itself, and that instance takes small steps to achieve them.

Figure (two agents and a planet; one agent is joined by dotted lines, the other by full lines): In hierarchical reinforcement learning, the agent sets goals for an instance of itself, which interacts directly with the environment in order to achieve those goals.
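The two-level arrangement described above can be sketched with two Q-tables. This is a simplified illustration under stated assumptions, not the construction of [3] or of the paper: the corridor environment, the names `manager_q` and `worker_q`, the intrinsic reward of 1 for reaching a subgoal, and all hyperparameters are hypothetical.

```python
import random

N_STATES, GOAL_STATE = 5, 4      # corridor 0..4, environment reward at state 4
ACTIONS = (0, 1)                 # 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

def env_step(state, action):
    """Deterministic corridor; reward 1.0 only when the last state is reached."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == GOAL_STATE
    return next_state, (1.0 if done else 0.0), done

def epsilon_greedy(values, rng):
    """Random index with probability EPSILON, else a best index (ties broken randomly)."""
    if rng.random() < EPSILON:
        return rng.randrange(len(values))
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])

def train(episodes=2000, worker_horizon=30, max_subgoals=10, seed=0):
    rng = random.Random(seed)
    # manager_q[state][subgoal]: value of asking the worker to reach `subgoal`.
    manager_q = [[0.0] * N_STATES for _ in range(N_STATES)]
    # worker_q[subgoal][state][action]: value of a primitive action under a subgoal.
    worker_q = [[[0.0] * len(ACTIONS) for _ in range(N_STATES)] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        for _ in range(max_subgoals):
            start, subgoal = state, epsilon_greedy(manager_q[state], rng)
            env_return = 0.0
            for _ in range(worker_horizon):
                action = epsilon_greedy(worker_q[subgoal][state], rng)
                next_state, reward, done = env_step(state, action)
                env_return += reward
                reached = next_state == subgoal
                # Worker update: intrinsic reward for reaching the subgoal only.
                q = worker_q[subgoal]
                target = 1.0 if reached else GAMMA * max(q[next_state])
                q[state][action] += ALPHA * (target - q[state][action])
                state = next_state
                if reached or done:
                    break
            # Manager update: environment reward collected while the worker acted.
            target = env_return if done else env_return + GAMMA * max(manager_q[state])
            manager_q[start][subgoal] += ALPHA * (target - manager_q[start][subgoal])
            if done:
                break
    return manager_q, worker_q

manager_q, worker_q = train()
```

Both levels learn at the same time with the same Q-learning update. The manager is rewarded by the environment, while the worker is rewarded only for reaching the subgoal it was given, so what the manager observes after choosing a subgoal depends on how competent the worker currently is.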

When learning occurs at different levels of the hierarchy at the same time, the environment is changing from the point of view of the agent: as the instance of the agent becomes more competent at achieving the goals it is set, the outcomes of those goals also change. In our work [4], we show that, despite this fundamental difference, Q-learning also solves the hierarchical decision-making problem.

To establish our result, we first show that the dependencies within the hierarchy are smooth. Therefore, as the policy of the instance of the agent changes, the environment that the agent views changes with it in a smooth way, and if that policy converges, so does the environment viewed by the agent. Then, we show that Q-learning converges in changing, but converging, environments. Putting our two results together, we conclude that Q-learning converges with probability 1 to the correct solution at all levels of the hierarchy.
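The second step of this argument can be illustrated with a toy example. It is a deterministic sketch under strong simplifying assumptions (a single state, synchronous updates, a constant step size), not the stochastic setting of the paper's result. The function `competence` is a hypothetical stand-in for how well the lower level achieves a goal after `t` updates.

```python
def competence(t):
    """Hypothetical success rate of the lower level after t updates; tends to 1."""
    return 1.0 - 1.0 / (t + 1)

def q_learning_in_drifting_env(sweeps=5000, alpha=0.1, gamma=0.5):
    # q[0]: a safe subgoal with fixed reward; q[1]: a subgoal that relies on the lower level.
    q = [0.0, 0.0]
    for t in range(sweeps):
        rewards = [0.5, competence(t)]   # the second reward drifts toward 1.0
        bootstrap = gamma * max(q)       # single state, so the next state is the same
        q = [q_a + alpha * (r + bootstrap - q_a) for q_a, r in zip(q, rewards)]
    return q

q_final = q_learning_in_drifting_env()
# Optimal values of the limiting environment, where the second reward equals 1.0:
# V* = 1 / (1 - 0.5) = 2, so Q*(1) = 2.0 and Q*(0) = 0.5 + 0.5 * 2 = 1.5.
q_limit = [1.5, 2.0]
```

Early on, the second subgoal looks worse than the first because the lower level rarely succeeds. As `competence(t)` converges, the estimates settle near the optimal values of the limiting environment.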

Hierarchies in reinforcement learning are not only human-inspired: they also set the state of the art in complex decision-making tasks, such as the game of Minecraft [5]. Furthermore, a well-constructed hierarchy allows for flexibility across its layers, as well as interpretability. With our results, we provide theoretical support for the use, and success, of hierarchies in reinforcement learning, and we hope to motivate future research.

References

[1] Watkins, Christopher J. C. H. “Learning from delayed rewards”. PhD Thesis. 1989.
[2] Watkins, Christopher J. C. H., Peter Dayan. “Q-learning”. Machine Learning, 1992. 279-292.
[3] Dayan, Peter, Geoffrey Hinton. “Feudal reinforcement learning”. NeurIPS 1992. 271-278.
[4] Carvalho, Diogo S., Francisco S. Melo, Pedro A. Santos. “Theoretical remarks on feudal hierarchies and reinforcement learning”. ECAI 2023. IOS Press, 2023. 351-356.
[5] Lin, Zichuan, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang. “Playing Minecraft with sample-efficient hierarchical reinforcement learning”. IJCAI 2022. 3257-3263.


This work won an outstanding paper award at ECAI 2023.





Diogo Carvalho is a PhD student at Instituto Superior Técnico of the University of Lisbon, and GAIPS of INESC-ID