#AAAI2023 workshops round-up 3: Reinforcement learning ready for production


13 April 2023




As part of the 37th AAAI Conference on Artificial Intelligence (AAAI2023), 32 different workshops were held, covering a wide range of AI topics. In this third and final post in our series of workshop round-ups, we hear from the organisers of the workshop on reinforcement learning for real-world applications, who tell us their key takeaways from their event. The workshop focused on understanding reinforcement learning trends and algorithmic developments that bridge the gap between theoretical reinforcement learning and production environments.


AAAI Reinforcement Learning Ready for Production

Organisers: Zheqing (Bill) Zhu, Yuandong Tian, Timothy Mann, Haque Ishfaq, Zhiwei (Tony) Qin, Doina Precup, Shie Mannor.

A call for computational efficiency

Over the past few years, researchers have developed various methods for reinforcement learning (RL) to improve decision-making quality. These methods include model-based learning, advanced exploration designs, and techniques for dealing with epistemic and aleatoric uncertainty, among others. However, some of these methods fail to address a crucial bottleneck in real-world environments: the limits of computation and response time.

In certain scenarios, such as social media recommendations or self-driving cars, the time allotted for making a decision is often very short, sometimes less than half a second, and in some cases responses must be produced in real time. Therefore, complex and computationally expensive methods, such as full neural network gradient descent, matrix inversion, or forward-looking model-based simulations, are not feasible for production-level environments.

Given these constraints, RL methods must be able to make intelligent decisions online without relying on computationally expensive operations. Addressing these challenges is critical for developing RL methods that can operate effectively in real-world applications.
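As a rough illustration of this constraint, the sketch below contrasts a policy amortized offline into a single matrix-vector product with a naive decision-time planner that simulates rollouts for every candidate action before responding. Everything here is hypothetical: the sizes, the toy dynamics, and the random stand-in for trained weights are made up, and the point is the gap in per-decision latency rather than the specific numbers.

```python
import time
import numpy as np

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, NUM_ACTIONS, PLAN_ROLLOUTS, HORIZON = 64, 10, 100, 30
rng = np.random.default_rng(0)

# Offline: expensive training/planning is assumed to have produced a cheap
# policy head; random weights stand in for trained ones here.
policy_weights = rng.normal(size=(STATE_DIM, NUM_ACTIONS))

def act_amortized(state):
    """Serving-time decision: a single matrix-vector product."""
    return int(np.argmax(state @ policy_weights))

def act_with_planning(state):
    """Decision-time planning: simulate rollouts for every action before acting."""
    returns = np.zeros(NUM_ACTIONS)
    for a in range(NUM_ACTIONS):
        for _ in range(PLAN_ROLLOUTS):
            s = state.copy()
            for _ in range(HORIZON):
                # Toy action-conditioned dynamics, invented for this example.
                s = np.tanh(s + 0.01 * a + rng.normal(scale=0.1, size=STATE_DIM))
            returns[a] += s.sum()
    return int(np.argmax(returns))

state = rng.normal(size=STATE_DIM)
for name, act in [("amortized policy", act_amortized), ("decision-time planning", act_with_planning)]:
    start = time.perf_counter()
    act(state)
    print(f"{name:22s}: {1000 * (time.perf_counter() - start):7.2f} ms per decision")
```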

Sample efficiency and generalization

To solve a wide range of tasks with limited interaction, an intelligent RL agent must make sequential decisions from limited feedback. However, current state-of-the-art RL algorithms require millions of data points to train and do not generalize well across tasks. Although supervised learning is even harder to generalize to sequential decision tasks, the concern that RL is still insufficient remains valid.

One way to improve sample efficiency for online RL agents is by using smarter exploration algorithms that seek informative feedback. Despite practitioners’ fear of exploration due to uncertainty and potential metric losses, relying solely on supervised learning and greedy algorithms can lead to the “echo chamber” phenomenon, where the agent fails to hear the true story from its environment.
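As a minimal sketch of how a greedy rule can lock itself into such an echo chamber while a posterior-sampling explorer keeps gathering informative feedback, consider the toy Bernoulli bandit below. The arm probabilities, horizon, and Beta(1, 1) priors are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = np.array([0.05, 0.045, 0.06])   # hypothetical click-through rates
STEPS = 5000

def run(select_arm):
    successes, failures = np.ones(3), np.ones(3)   # Beta(1, 1) priors per arm
    total_reward = 0.0
    for _ in range(STEPS):
        arm = select_arm(successes, failures)
        r = float(rng.random() < true_rates[arm])
        total_reward += r
        successes[arm] += r
        failures[arm] += 1.0 - r
    return total_reward / STEPS

# Greedy: always exploit the current posterior mean (prone to echo chambers).
greedy = lambda s, f: int(np.argmax(s / (s + f)))
# Thompson sampling: act on a posterior draw, so uncertain arms keep being tried.
thompson = lambda s, f: int(np.argmax(rng.beta(s, f)))

print("greedy   average reward:", run(greedy))
print("thompson average reward:", run(thompson))
```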

Another way to enable RL agents to solve a wide variety of tasks with less data is through generalized value functions (GVFs) and auxiliary tasks. By using GVFs and auxiliary tasks to gain an on-policy understanding of the environment through multiple lenses, the agent can grasp a multi-angle representation of the environment and generalize more quickly to different tasks with fewer interactions.
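A minimal sketch of the GVF idea, assuming a toy five-state chain and a uniformly random behaviour policy: several predictions are learned in parallel with TD(0), each estimating the discounted sum of a different cumulant signal. The chain, the cumulant definitions, and the learning settings are invented for illustration, not taken from the workshop.

```python
import numpy as np

# Toy 5-state chain walked by a random behaviour policy; the agent learns
# several generalized value functions (GVFs) in parallel, one per cumulant.
N_STATES, GAMMA, ALPHA, STEPS = 5, 0.9, 0.1, 20000
rng = np.random.default_rng(0)

# Hypothetical cumulants: the task reward plus two auxiliary signals. Each GVF
# predicts the discounted sum of its cumulant under the behaviour policy.
cumulants = {
    "reward":       lambda s: 1.0 if s == N_STATES - 1 else 0.0,   # goal signal
    "dist_to_goal": lambda s: float(N_STATES - 1 - s),             # auxiliary
    "in_middle":    lambda s: 1.0 if s == N_STATES // 2 else 0.0,  # auxiliary
}
values = {name: np.zeros(N_STATES) for name in cumulants}

state = 0
for _ in range(STEPS):
    next_state = int(np.clip(state + rng.choice([-1, 1]), 0, N_STATES - 1))
    for name, cumulant in cumulants.items():
        # TD(0) update towards the cumulant observed on this transition.
        td_target = cumulant(next_state) + GAMMA * values[name][next_state]
        values[name][state] += ALPHA * (td_target - values[name][state])
    state = next_state

for name, v in values.items():
    print(f"{name:12s}", np.round(v, 2))
```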

Counterfactual nature of reinforcement learning

Practitioners familiar with precision-recall metrics from supervised learning models are often apprehensive about deploying RL algorithms because, by their nature, these algorithms generate counterfactual trajectories. The fear stems from the difficulty of imagining the parallel universe an RL agent creates when deployed in production.

Conservative learning in RL agents is key to alleviating concerns about their deployment. Instead of aggressively optimizing the expectation of cumulative return, it is paramount to pay attention to the variance of the learning target to build confidence in a freshly trained RL model. This principle aligns well with the direction of safe RL and calls for a rigorous study of the trade-off between learning and risk aversion.
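A minimal sketch of that mean-variance flavour of conservatism, using made-up per-episode returns for two hypothetical candidate policies: the policy with the higher average return is not automatically the one a risk-aware rule would promote. The return distributions and the RISK_WEIGHT trade-off are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-episode returns from two freshly trained candidate policies.
returns = {
    "aggressive":   rng.normal(loc=1.2, scale=2.0, size=500),
    "conservative": rng.normal(loc=1.0, scale=0.3, size=500),
}

RISK_WEIGHT = 0.5   # assumed trade-off between average return and variability

for name, r in returns.items():
    mean, std = r.mean(), r.std(ddof=1)
    # Penalise the learning target's spread, not just its expectation.
    score = mean - RISK_WEIGHT * std
    print(f"{name:12s} mean={mean:5.2f}  std={std:4.2f}  risk-adjusted={score:5.2f}")
```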

Off-policy evaluation (OPE) is a field of study that aims to address our limited understanding of how an RL agent will behave once deployed in the environment. While the development of doubly robust and problem-specific OPE tools in recent years brings hope for estimating agent performance, such methods are often still too noisy to provide useful signals in highly stochastic environments.
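For readers new to the doubly robust idea, the sketch below estimates the value of a deterministic target policy from logged contextual-bandit data by combining a deliberately imperfect reward model with an importance-weighted correction. The synthetic data, uniform logging policy, and reward model are assumptions chosen only to keep the example self-contained; they are not tools discussed at the workshop.

```python
import numpy as np

rng = np.random.default_rng(0)
N, NUM_ACTIONS = 10000, 3

# Hypothetical logged data from a uniform-random logging policy.
contexts = rng.normal(size=(N, 4))
logged_actions = rng.integers(NUM_ACTIONS, size=N)
propensities = np.full(N, 1.0 / NUM_ACTIONS)

true_reward = lambda x, a: 0.1 * (a + 1) * np.tanh(x[:, 0])   # unknown in practice
rewards = true_reward(contexts, logged_actions) + rng.normal(scale=0.5, size=N)
q_hat = lambda x, a: 0.1 * (a + 1) * x[:, 0]                  # imperfect reward model

# Deterministic target policy we want to evaluate offline: always play action 2.
target_actions = np.full(N, 2)
matches = (logged_actions == target_actions).astype(float)

# Doubly robust estimate: direct-method term plus importance-weighted correction.
dm_term = q_hat(contexts, target_actions)
correction = matches / propensities * (rewards - q_hat(contexts, logged_actions))
dr_value = float(np.mean(dm_term + correction))

print("direct method :", round(float(np.mean(dm_term)), 4))
print("doubly robust :", round(dr_value, 4))
print("true value    :", round(float(np.mean(true_reward(contexts, target_actions))), 4))
```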

Nonstationarity

One aspect of RL productionization that is often overlooked by the research community is the nonstationarity of production environments. The shifting popularity of topics in recommender systems, the seasonality of commodity prices, economic cycles, and other real-world phenomena can all appear nonstationary from an RL agent’s perspective, given the limited history it can consider and the abrupt jumps in the environment’s behavior. Continual learning and exploration in the face of nonstationarity are promising directions for addressing these concerns, but as emerging fields they require extensive study to mature and become useful in production environments.
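One simple, continual-learning-flavoured response to nonstationarity is to forget old data, for instance by estimating values over a sliding window. The sketch below compares a full-history epsilon-greedy agent with a sliding-window variant on a toy two-armed bandit whose reward probabilities drift over time; the drift schedule, window size, and exploration rate are illustrative assumptions only.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
STEPS, WINDOW, EPSILON = 6000, 300, 0.1

def drifting_rates(t):
    """Hypothetical nonstationary environment: the better arm changes over time."""
    return np.array([0.5 + 0.3 * np.sin(t / 500), 0.5 + 0.3 * np.cos(t / 500)])

def run(window=None):
    # window=None keeps the full history; an integer keeps only recent feedback.
    history = [deque(maxlen=window) for _ in range(2)]
    total = 0.0
    for t in range(STEPS):
        means = [np.mean(h) if h else 0.5 for h in history]
        arm = rng.integers(2) if rng.random() < EPSILON else int(np.argmax(means))
        reward = float(rng.random() < drifting_rates(t)[arm])
        history[arm].append(reward)
        total += reward
    return total / STEPS

print("full-history agent  :", round(run(window=None), 3))
print("sliding-window agent:", round(run(window=WINDOW), 3))
```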



