#IJCAI2023 distinguished paper – Safe reinforcement learning via probabilistic logic shields


by Wen-Chi Yang
11 October 2023




Image created by the author using DALL·E. Prompt: “A woman getting into a self driving car. Cartoon style”

Are you excited about self-driving cars? Would you trust autonomous driving technology if you were invited to step into a self-driving vehicle? Most importantly, how do you know whether you would be safe during the journey?

Safety is difficult to measure, unlike accuracy, which we are more familiar with. It often feels awkward to describe safety using accuracy-related terms, such as “99 out of 100 journeys are safe” or “99.5% of pedestrians on the roads are correctly identified.” Hearing this immediately makes us think – what happened on that one trip? Did somebody get injured? What happened to the 0.5% of pedestrians that did not get identified? How confident is the vehicle when it identifies a pedestrian? Does the behavior of the vehicle change when it is only 50% sure that there is a pedestrian compared to when it is 90% sure? The vehicle should brake harder for a 90% certainty than for a 10% certainty, right?

Also, how does the model find a trade-off between safety and the other criteria? Driving at 5 km/hr is safe most of the time, but the passenger most likely wants to reach their destination as soon as possible.

These questions are at the core of safety in AI. Understanding how a machine learning model learns and reasons is crucial to making technologies more trustworthy.

Our paper provides a framework to represent, quantify, and evaluate safety. We define safety using a logic-based approach rather than a numerical one, thus enabling efficient training of safe-by-construction deep reinforcement learning policies. Our approach is a variant of a “shield” but has several advantages compared to a traditional one, including:

  • Our approach better integrates with continuous, end-to-end deep RL methods.
  • Our approach directly accounts for the inherent noise and uncertainty in unstructured inputs (e.g. images).
  • Our approach can be seamlessly applied to any policy gradient algorithm.
  • Our approach finds a better trade-off between the traditional RL objective and the safety objective.

workflow schematic
Shielding is a popular Safe Reinforcement Learning (Safe RL) technique that aims to find an optimal policy while ensuring that the agent stays safe. To do so, it relies on a shield, a logical component that monitors the agent’s actions and rejects those that violate a given safety constraint. Before the agent performs an action, it consults the shield to check whether the action it wants to perform is safe.
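As a rough illustration, a traditional rejection-based shield can be thought of as a wrapper around the policy: the agent proposes an action, and the shield vetoes it when a safety check fails. The Python sketch below is only a schematic of this idea; the is_safe check and the fallback action are hypothetical placeholders, not part of our implementation.

import random

def shielded_step(policy, state, is_safe, fallback_action):
    # policy: a dict mapping each action to its probability in the current state.
    # Sample a candidate action from the agent's policy.
    action = random.choices(list(policy), weights=list(policy.values()))[0]
    # The shield rejects the candidate if it violates the safety constraint.
    if not is_safe(state, action):
        return fallback_action
    return action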

We will now introduce a running example, illustrate our approach, and then compare it to the more traditional methods.

Running example

Image created by the author using DALL·E. Prompt: “Car driving in a city through an intersection. There is a car in front and another on the right. A wide view from the front window.”

Consider the self-driving agent in the figure above, which encounters a red light and another vehicle to its right. The agent is equipped with a standard neural network that takes the image as input and produces a policy, i.e. a probability distribution over five predefined actions {do-nothing, accelerate, brake, turn-left, turn-right}. An example policy is:

π(do-nothing)=0.1, π(accelerate)=0.5, π(brake)=0.1, π(turn-left)=0.1, π(turn-right)=0.2

The shield has a set of sensors to detect from the image whether there is an obstacle (i.e., the red light and the other car) in front, on the left or on the right of the agent.

P(obs_front)=0.8, P(obs_left)=0.2, P(obs_right)=0.5

The shield’s safety-related knowledge is represented by a set of probabilistic if-then rules. The first rule below states: “if there is an obstacle in front and the agent accelerates, then a crash will occur with probability 0.9.”

0.9::crash :- obs_front, accelerate.
0.4::crash :- obs_left, turn-left.
0.4::crash :- obs_right, turn-right.

The shield defines safety as the probability of not having a crash, i.e. P(safe) = 1-P(crash).

Now, our goal is to modify the given policy π so that it becomes safer.

Our Approach

Given the above information, our approach consists of three steps.

  1. Action safety P(safe|a): how safe will the agent be if it executes an action a?
  2. Policy safety Pπ(safe): how safe will the agent be if it randomly selects an action using a policy π?
  3. By combining the two quantities above, we can make the policy π safer. The improved policy, denoted π+, is guaranteed to be safer than π. This is key to improving the safety of the entire learning process.

The following table illustrates the three steps. The last column shows the improved policy that the agent will use to sample an action to execute in the environment. We will now explain how these values are calculated.
Table illustrating the three steps for the running example: the original policy π(a), the action safety P(safe|a), the policy safety Pπ(safe), and the improved policy π+(a).

Action safety

The first step is to quantify how safe it is to take an action. This can be computed directly from the if-then rules. For example, 0.9::crash :- obs_front, accelerate and P(obs_front)=0.8 together imply that the probability of a crash if the agent decides to accelerate is P(crash|accelerate) = P(obs_front)*0.9 = 0.72. The probability of staying safe if the agent accelerates is then simply P(safe|accelerate) = 1 - P(crash|accelerate) = 0.28.
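As a sanity check, the action safeties of the running example can be reproduced with a few lines of Python. This is only an illustrative sketch with the example numbers hard-coded, and it assumes (as in the program above) that each action appears in at most one crash rule.

# Sensor readings: probability of an obstacle in each direction.
sensors = {"obs_front": 0.8, "obs_left": 0.2, "obs_right": 0.5}

# Crash rules from the shield: action -> (sensor condition, crash probability).
rules = {"accelerate": ("obs_front", 0.9),
         "turn-left": ("obs_left", 0.4),
         "turn-right": ("obs_right", 0.4)}

actions = ["do-nothing", "accelerate", "brake", "turn-left", "turn-right"]

def action_safety(a):
    # Actions covered by no crash rule are safe with probability 1.
    if a not in rules:
        return 1.0
    condition, p_crash = rules[a]
    return 1.0 - sensors[condition] * p_crash

for a in actions:
    print(a, round(action_safety(a), 2))
# do-nothing 1.0, accelerate 0.28, brake 1.0, turn-left 0.92, turn-right 0.8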

Policy safety

The second step is to compute how safe it is to follow a given policy. This is defined as the expected action safety under the policy, i.e. the average of the action safeties weighted by their probabilities, denoted by Pπ(safe). Actions that appear in no crash rule (do-nothing and brake) are safe with probability 1, and similarly P(safe|turn-left) = 1 - 0.2*0.4 = 0.92 and P(safe|turn-right) = 1 - 0.5*0.4 = 0.8. In our running example, Pπ(safe) = 1*0.1 + 0.28*0.5 + 1*0.1 + 0.92*0.1 + 0.8*0.2 = 0.592.
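Continuing the sketch, the policy safety is just the expected action safety under π. The numbers below are the ones derived above, hard-coded for the running example.

# Original policy and the action safeties computed in the previous step.
policy = {"do-nothing": 0.1, "accelerate": 0.5, "brake": 0.1,
          "turn-left": 0.1, "turn-right": 0.2}
safety = {"do-nothing": 1.0, "accelerate": 0.28, "brake": 1.0,
          "turn-left": 0.92, "turn-right": 0.8}

# Policy safety: average action safety, weighted by the policy probabilities.
policy_safety = sum(policy[a] * safety[a] for a in policy)
print(round(policy_safety, 3))  # 0.592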

Making the policy safer

Finally, we would like to increase the probability of selecting safe actions and decrease the probability of selecting unsafe ones. To do this, we compute a coefficient P(safe|a)/Pπ(safe) indicating how safe an action a is compared to the policy as a whole. If the action a is relatively safe, the coefficient is larger than one; otherwise, it is smaller than one. For example, P(safe|accelerate)/Pπ(safe) = 0.28/0.592 = 0.47, indicating that accelerate is relatively unsafe.

Then, we multiply the coefficient by the original π(a) to obtain a safer policy, i.e. π+(a) = π(a) * P(safe|a) / Pπ(safe). In our running example, π+(accelerate) = 0.5 * 0.47 ≈ 0.24, which is lower than the original value of 0.5. As a result of this mechanism, the new policy is guaranteed to be safer than the original policy. By using the new policy instead of the old one, the learning process becomes safer.
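The reweighting step of the running example can be sketched in the same way. Note that π+ is automatically a proper probability distribution: the coefficients P(safe|a)/Pπ(safe) average to one under π, so no extra normalization is needed.

policy = {"do-nothing": 0.1, "accelerate": 0.5, "brake": 0.1,
          "turn-left": 0.1, "turn-right": 0.2}
safety = {"do-nothing": 1.0, "accelerate": 0.28, "brake": 1.0,
          "turn-left": 0.92, "turn-right": 0.8}
policy_safety = sum(policy[a] * safety[a] for a in policy)  # 0.592

# Safer policy: scale each action probability by its relative safety.
safer_policy = {a: policy[a] * safety[a] / policy_safety for a in policy}

print({a: round(p, 2) for a, p in safer_policy.items()})
# {'do-nothing': 0.17, 'accelerate': 0.24, 'brake': 0.17, 'turn-left': 0.16, 'turn-right': 0.27}
print(round(sum(safer_policy.values()), 6))  # 1.0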

Comparison with other approaches

We evaluate our approach using Atari games where the agent must select one of several discrete actions (e.g. accelerate, turn-left) to finish an episodic task, such as driving along a road as fast as possible. The agent also has a safety objective, such as not driving off the road.

We compare our approach to two baselines and analyze their behavior.

  • Standard deep RL is the baseline that does not consider the safety objective. This agent generally takes considerable risk, e.g. making sharp turns to reach the destination sooner.
  • Traditional shields are similar to our approach but assume that every action is either completely safe or completely unsafe, i.e. P(safe|a) = 0 or 1. This agent is safer than the standard RL agent but tends to collect less reward, as it takes no risks at all.
  • Our approach finds the best trade-off between the reward and the safety objectives. This agent is both the safest and the most rewarding of the three.

We have introduced a novel class of shields that enable efficient training of safe-by-construction neural policies. It is a probabilistic generalization of traditional shields. Future work will be dedicated to extending the current approach to be compatible with a larger class of RL algorithms, including those that employ continuous policies.


Wen-Chi Yang and colleagues Giuseppe Marra, Gavin Rens and Luc De Raedt won an IJCAI 2023 distinguished paper award for their work: Safe Reinforcement Learning via Probabilistic Logic Shields. You can read the paper in full here.



Wen-Chi Yang is an AI researcher specializing in mathematical guarantees for AI safety



