ΑΙhub.org
 

#IJCAI2023 distinguished paper – Safe reinforcement learning via probabilistic logic shields


by Wen-Chi Yang
11 October 2023




Image created by author using DALL.E. Prompt: “A woman getting into a self driving car. Cartoon style”

Are you excited about self-driving cars? Would you trust autonomous driving technology if you were invited to step into a self-driving vehicle? Most importantly, how do you know whether you would be safe during the journey?

Safety is difficult to measure, unlike accuracy, which we are more familiar with. It often feels awkward to describe safety using accuracy-related terms, such as “99 out of 100 journeys are safe” or “99.5% of pedestrians on the roads are correctly identified.” Hearing this immediately makes us think: what happened on that one trip? Did somebody get injured? What happened to the 0.5% of pedestrians who were not identified? How confident is the vehicle when it identifies a pedestrian? Does the behavior of the vehicle change when it is only 50% sure that there is a pedestrian compared to when it is 90% sure? The vehicle should brake harder for a 90% certainty than for a 10% certainty, right?

Also, how does the model find a trade-off between safety and other criteria? Driving at 5 km/h is safe most of the time, but the passenger most likely wants to reach their destination as soon as possible.

These questions are at the core of safety in AI. Understanding how a machine learning model learns and reasons is crucial to making technologies more trustworthy.

Our paper provides a framework to represent, quantify, and evaluate safety. We define safety using a logic-based approach rather than a numerical one, thus enabling efficient training of safe-by-construction deep reinforcement learning policies. Our approach is a variant of a “shield” but has several advantages compared to a traditional one, including:

  • Our approach better integrates with continuous, end-to-end deep RL methods.
  • Our approach efficiently utilizes the inherent noise and uncertainty in the unstructured input (e.g. images).
  • Our approach can be seamlessly applied to any policy gradient algorithm.
  • Our approach finds a better trade-off between the traditional RL objective and the safety objective.

workflow schematic
Shielding is a popular Safe Reinforcement Learning (Safe RL) technique that aims to find an optimal policy whilst making sure that the agent stays safe. To do so, it relies on a shield, a logical component that monitors the agent’s actions and rejects those that violate the given safety constraint. Before the agent performs an action, it asks the shield whether that action is safe.

We will now introduce a running example, illustrate our approach, and then compare it to the more traditional methods.

Running example

Image created by author using DALL.E. Prompt: “Car driving in a city through an intersection. There is a car in front and another on the right. A wide view from the front window.”

Consider the self-driving agent in the figure above, which encounters a red light and has another vehicle to its right. The agent is equipped with a standard neural policy that takes the image as input and outputs a probability distribution over five predefined actions {do-nothing, accelerate, brake, turn-left, turn-right}. An example policy is:

π(do-nothing)=0.1, π(accelerate)=0.5, π(brake)=0.1, π(turn-left)=0.1, π(turn-right)=0.2

The shield has a set of sensors that detect from the image whether there is an obstacle (e.g. the red light or the other car) in front of, to the left of, or to the right of the agent. For example:

P(obs_front)=0.8, P(obs_left)=0.2, P(obs_right)=0.5

The shield has safety-related knowledge represented by a set of probabilistic if-else statements. The first rule below states “if there is an obstacle in the front, and the agent accelerates, then a crash will occur with a probability of 0.9.”

0.9::crash :- obs_front, accelerate.
0.4::crash :- obs_left, turn-left.
0.4::crash :- obs_right, turn-right.

The shield defines safety as the probability of not having a crash, i.e. P(safe) = 1-P(crash).

Now, our goal is to modify the given policy π so that it becomes safer.

Our Approach

Given the above information, our approach consists of three steps.

  1. Action safety P(safe|a): how safe will the agent be if it executes an action a?
  2. Policy safety Pπ(safe): how safe will the agent be if it randomly selects an action using a policy π?
  3. A safer policy π+: by combining the two components above, we can make the policy π safer. The improved policy is denoted by π+ and is guaranteed to be safer than π. This is key to improving the safety of the entire learning process.
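In symbols, the three steps can be summarized as below. This is only a compact restatement of the quantities named in the list, using the same notation as the text; the dependence on the current state (the input image) is left implicit.

```latex
\begin{align*}
P(\mathrm{safe} \mid a) &= 1 - P(\mathrm{crash} \mid a) \\
P_{\pi}(\mathrm{safe}) &= \sum_{a} \pi(a)\, P(\mathrm{safe} \mid a) \\
\pi^{+}(a) &= \pi(a)\, \frac{P(\mathrm{safe} \mid a)}{P_{\pi}(\mathrm{safe})}
\end{align*}
```

Summing the last line over all actions gives 1, so π+ is again a valid probability distribution whenever Pπ(safe) is greater than zero.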

The following table illustrates the three steps. The last column shows the improved policy that the agent will use to sample an action to execute in the environment. We will now explain how these values are calculated.
table showing different policy calculations

Action safety

The first step is to quantify how safe it is to take an action. This can be computed directly from the probabilistic if-else statements. For example, the rule 0.9::crash :- obs_front, accelerate together with P(obs_front)=0.8 implies that the probability of a crash if the agent decides to accelerate is P(crash|accelerate) = P(obs_front)*0.9 = 0.72. The probability of staying safe if the agent accelerates is then simply P(safe|accelerate) = 1 - P(crash|accelerate) = 0.28. Actions that appear in no rule, such as do-nothing and brake, have P(safe|a) = 1.
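The calculation above can be reproduced with a few lines of Python. This is a minimal sketch for the running example only, not the inference engine used in the paper: the names `P_OBSTACLE`, `CRASH_RULES` and `action_safety` are made up for illustration, and the code assumes that each action appears in at most one crash rule, which holds for the three rules listed earlier.

```python
# Sensor readings from the running example: probability of an obstacle per direction.
P_OBSTACLE = {"front": 0.8, "left": 0.2, "right": 0.5}

# One entry per rule "p::crash :- obs_<direction>, <action>."
# Assumption: each action is mentioned by at most one rule.
CRASH_RULES = {
    "accelerate": ("front", 0.9),
    "turn-left": ("left", 0.4),
    "turn-right": ("right", 0.4),
}

ACTIONS = ["do-nothing", "accelerate", "brake", "turn-left", "turn-right"]


def action_safety(action):
    """Return P(safe | action) = 1 - P(crash | action)."""
    if action not in CRASH_RULES:
        # No rule links this action to a crash, so it is considered fully safe.
        return 1.0
    direction, p_crash_if_obstacle = CRASH_RULES[action]
    p_crash = P_OBSTACLE[direction] * p_crash_if_obstacle
    return 1.0 - p_crash


for a in ACTIONS:
    print(f"P(safe | {a}) = {action_safety(a):.2f}")
```

If several rules could fire for the same action, their crash probabilities would have to be combined properly by a probabilistic logic engine rather than by this one-line product.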

Policy safety

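The second step is to compute how safe it is to follow a given policy. This policy safety, denoted by Pπ(safe), is the average of the action safeties, each weighted by the probability that the policy selects that action. In our running example, with the actions in the order do-nothing, accelerate, brake, turn-left, turn-right, Pπ(safe) = 1*0.1+0.28*0.5+1*0.1+0.92*0.1+0.8*0.2 = 0.592.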
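The weighted average can be checked numerically. The sketch below hard-codes the policy and the action safeties of the running example; the function name `policy_safety` is chosen here for illustration and does not come from the paper.

```python
# Example policy pi(a) from the running example.
policy = {
    "do-nothing": 0.1,
    "accelerate": 0.5,
    "brake": 0.1,
    "turn-left": 0.1,
    "turn-right": 0.2,
}

# Action safeties P(safe | a) for the same example.
action_safety = {
    "do-nothing": 1.0,
    "accelerate": 0.28,
    "brake": 1.0,
    "turn-left": 0.92,
    "turn-right": 0.80,
}


def policy_safety(policy, action_safety):
    """Return P_pi(safe): the action safeties averaged under the policy."""
    return sum(policy[a] * action_safety[a] for a in policy)


p_safe = policy_safety(policy, action_safety)
print(f"P_pi(safe) = {p_safe:.3f}")
```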

Making the policy safer

Finally, we would like to increase the probability of selecting safe actions and decrease the probability of selecting unsafe ones. To do this, we compute a coefficient P(safe|a)/Pπ(safe) that indicates how safe an action a is compared to the policy as a whole. If the action a is relatively safe, the coefficient is larger than one; otherwise, it is smaller than one. For example, P(safe|accelerate)/Pπ(safe)=0.28/0.592=0.47, indicating that accelerate is relatively unsafe.

Then, we multiply the original π(a) by this coefficient to obtain a safer policy, i.e. π+(a) = π(a)*P(safe|a)/Pπ(safe). In our running example, π+(accelerate)=0.5*0.47=0.24, which is lower than the original value 0.5. As a result of this mechanism, the new policy is guaranteed to be safer than the original policy. By using the new policy instead of the old policy, the learning process will be safer.
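Putting the pieces together, the reweighting can be sketched as follows. The code is an illustration on the numbers of the running example, not the authors' implementation; `shield_policy` and `policy_safety` are hypothetical helper names, and the fixed random seed is only there to make the sampling step repeatable.

```python
import random

policy = {"do-nothing": 0.1, "accelerate": 0.5, "brake": 0.1,
          "turn-left": 0.1, "turn-right": 0.2}
action_safety = {"do-nothing": 1.0, "accelerate": 0.28, "brake": 1.0,
                 "turn-left": 0.92, "turn-right": 0.80}


def policy_safety(policy, action_safety):
    """Return P_pi(safe), the policy-weighted average of the action safeties."""
    return sum(policy[a] * action_safety[a] for a in policy)


def shield_policy(policy, action_safety):
    """Return pi+(a) = pi(a) * P(safe | a) / P_pi(safe) for every action."""
    p_safe = policy_safety(policy, action_safety)
    if p_safe <= 0.0:
        raise ValueError("the policy has zero probability of being safe")
    return {a: policy[a] * action_safety[a] / p_safe for a in policy}


safer_policy = shield_policy(policy, action_safety)
for a, p in safer_policy.items():
    print(f"pi+({a}) = {p:.2f}")

# The reweighted values still form a distribution, and its safety went up.
print(f"sum of pi+ = {sum(safer_policy.values()):.3f}")
print(f"P_pi(safe)  = {policy_safety(policy, action_safety):.3f}")
print(f"P_pi+(safe) = {policy_safety(safer_policy, action_safety):.3f}")

# The agent samples the action to execute from pi+ instead of pi.
rng = random.Random(0)
actions = list(safer_policy)
chosen = rng.choices(actions, weights=[safer_policy[a] for a in actions], k=1)[0]
print("sampled action:", chosen)
```

Note that the fully safe actions (do-nothing and brake) gain probability mass, while accelerate loses about half of its mass but can still be selected.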

Comparison with other approaches

We evaluate our approach using Atari games in which the agent must select one of several discrete actions (e.g. accelerate, turn-left, etc.) to finish an episodic task such as driving along a road as fast as possible. The agent also has a safety objective, such as not driving off the road.

We compare our approach to two baselines and analyze their behavior.

  • Standard deep RL is the baseline that does not consider the safety objective. This agent generally takes many risks, e.g. making hard turns to reach the destination sooner.
  • Traditional Shields are similar to our approach but assume that any action is either safe or unsafe, i.e. P(safe|a)=0 or 1. This agent is safer than the standard RL agent but may not be more rewarding as it takes no risks.
  • Our approach finds the best trade-off between the reward and the safety objectives. This agent is the safest and most rewarding among all agents.
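To make the difference between a traditional shield and a probabilistic one concrete, the sketch below applies the same reweighting formula, π+(a) = π(a)*P(safe|a)/Pπ(safe), to the driving example from earlier with two kinds of safety values. The hard 0/1 labels are an assumption made up for this illustration (a shield that has decided accelerate is unsafe and everything else is safe); they are not taken from the paper's experiments, and `shield_policy` is a hypothetical helper name.

```python
policy = {"do-nothing": 0.1, "accelerate": 0.5, "brake": 0.1,
          "turn-left": 0.1, "turn-right": 0.2}

# Probabilistic action safeties from the running example.
soft_safety = {"do-nothing": 1.0, "accelerate": 0.28, "brake": 1.0,
               "turn-left": 0.92, "turn-right": 0.80}

# Hypothetical hard labels: each action is either safe (1) or unsafe (0).
hard_safety = {"do-nothing": 1.0, "accelerate": 0.0, "brake": 1.0,
               "turn-left": 1.0, "turn-right": 1.0}


def shield_policy(policy, action_safety):
    """Reweight pi(a) by P(safe | a) and renormalize by P_pi(safe)."""
    p_safe = sum(policy[a] * action_safety[a] for a in policy)
    if p_safe <= 0.0:
        raise ValueError("the policy has zero probability of being safe")
    return {a: policy[a] * action_safety[a] / p_safe for a in policy}


hard_policy = shield_policy(policy, hard_safety)
soft_policy = shield_policy(policy, soft_safety)

print(f"{'action':<12}{'pi':>6}{'hard':>8}{'soft':>8}")
for a in policy:
    print(f"{a:<12}{policy[a]:>6.2f}{hard_policy[a]:>8.2f}{soft_policy[a]:>8.2f}")
```

With 0/1 labels the formula reduces to rejecting the unsafe action and renormalizing over the rest, so accelerate is never chosen. With probabilistic values, accelerate is only down-weighted, which leaves the agent room to trade some risk for reward.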

We have introduced a novel class of shields that enable efficient training of safe-by-construction neural policies. It is a probabilistic generalization of traditional shields. Future work will be dedicated to extending the current approach to be compatible with a larger class of RL algorithms, including those that employ continuous policies.


Wen-Chi Yang and colleagues Giuseppe Marra, Gavin Rens and Luc De Raedt won an IJCAI2023 distinguished paper award for their work: Safe Reinforcement Learning via Probabilistic Logic Shields. You can read the paper in full here.





Wen-Chi Yang is an AI researcher specializing in mathematical guarantees for AI safety
©2026.05 - Association for the Understanding of Artificial Intelligence