AIhub.org
 

Enhancing AI robustness for more secure and reliable systems


16 October 2023




Volkan Cevher. Photo credit: EPFL/Titouan Veuillet, CC-BY-SA 4.0.

By Michael David Mitchell

By rethinking the way that most artificial intelligence (AI) systems protect against attacks, researchers at EPFL’s School of Engineering have developed a training approach to ensure that machine learning models, particularly deep neural networks, consistently perform as intended, significantly enhancing their reliability. Effectively replacing a long-standing approach to training based on a zero-sum game, the new approach employs a continuously adaptive attack strategy to create a more intelligent training scenario. The results are applicable across a wide range of activities that depend on artificial intelligence for classification, such as safeguarding video streaming content, self-driving vehicles, and surveillance. The research was a close collaboration between EPFL’s School of Engineering and the University of Pennsylvania (UPenn).

In a digital world where the volume of data surpasses human capacity for full oversight, AI systems wield substantial power in making critical decisions. However, these systems are not immune to subtle yet potent attacks. Someone wishing to trick a system can make minuscule changes to input data and cunningly deceive an AI model. Professor Volkan Cevher and co-authors have undertaken research with the aim of reinforcing security against these attacks.

The research received a Best Paper Award at the 2023 International Conference on Machine Learning’s New Frontiers and Adversarial Machine Learning Workshop for identifying and correcting an error in a well-established training method, thereby improving AI defences against adversarial manipulation. “The new framework shows that one of the core ideas of adversarial training as a two-player, zero-sum game is flawed and must be reworked to enhance robustness in a sustainable fashion,” says Cevher.

All AI systems are open to attack

Consider the context of video streaming platforms like YouTube, which have far too many videos to be scrutinized by the human eye. AI is relied upon to classify videos by analyzing their content to ensure it complies with certain standards. This automatic process is known as “classification.” But the classification system is open to attack and can be cunningly subverted. A malicious hacker, called an “adversary” in game theory, could add background noise to a video containing inappropriate content. While the background noise is completely imperceptible to the human eye, it confuses the AI system enough to circumvent YouTube’s content safety mechanisms. This could lead to children being exposed to violent or sexualized content, even with the parental controls activated.
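To make the idea of a small but damaging change concrete, here is a minimal sketch on a toy linear classifier. Everything in it is an assumption for illustration (the weights, the input, the budget `eps` and the helper names); it does not describe any real video classifier. Each input feature is shifted by at most `eps`, in the direction that most reduces the score of the correct label.

```python
# Toy linear classifier: score(x) = w . x + b, predicted label = sign of score.
# All weights and inputs below are made-up values for illustration only.

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if score(w, b, x) >= 0 else -1

def sign(v):
    return (v > 0) - (v < 0)

def perturb(w, x, label, eps):
    # Move every feature by at most eps in the direction that lowers the
    # score of the true label; for a linear model this is the worst case
    # under a per-feature (max-norm) budget.
    return [xi - eps * label * sign(wi) for xi, wi in zip(x, w)]

w = [0.5, -0.25, 1.0, -0.75]   # hypothetical trained weights
b = 0.0
x = [0.5, 0.25, 0.25, 0.25]    # hypothetical clean input, true label +1
label = 1
eps = 0.125                    # hypothetical perturbation budget

x_adv = perturb(w, x, label, eps)

print("clean prediction:    ", predict(w, b, x))
print("perturbed prediction:", predict(w, b, x_adv))
print("largest change to any feature:",
      max(abs(a - c) for a, c in zip(x, x_adv)))
```

In this sketch no feature moves by more than `eps`, yet the predicted label changes. Real attacks on deep networks follow the same logic but have to search for the damaging direction, since it cannot be read off the weights as it can here.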

The YouTube example is only one among many possible similar attacks, and points to a well-known weakness in AI classification systems. This weakness is troubling since these systems are increasingly employed in ways that impact our daily lives, from ensuring the safety of self-driving vehicles to enhancing security in airports and improving medical diagnoses in healthcare settings. To counter these attacks, engineers strengthen the system’s defense through what is called adversarial training. Traditionally, adversarial training is formulated as a two-player zero-sum game. A defender attempts to minimize classification error, while the adversary seeks to maximize it. If one wins, the other loses, hence the term zero-sum.
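In symbols, the zero-sum formulation is commonly written as a min-max problem. The notation below is an assumption for illustration and does not appear in the article: f_θ is the classifier with parameters θ, ℓ is a training loss such as cross-entropy, η is the perturbation added to the input x with true label y, and ε is the perturbation budget.

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)} \Big[ \max_{\|\eta\| \le \epsilon} \ell\big(f_\theta(x + \eta),\, y\big) \Big]
```

Both players act on the same quantity ℓ: the defender chooses θ to make it small, while the adversary chooses η to make it large.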

Going beyond the zero-sum game paradigm

However, this theoretical approach faces challenges when transitioning from concept to real-world application. To remedy this, the researchers proposed a solution that literally changes the paradigm: a non-zero-sum game strategy. The team (Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani and Volkan Cevher) developed a new adversarial training formulation and an algorithm that, unlike the traditional zero-sum approach, requires the defender and the adversary to optimize different objectives. This leads to a unique formulation, a continuous bilevel optimization that they’ve named BETA, which stands for BEst Targeted Attack. In technical terms, the defender minimizes an upper bound on classification error, while the adversary maximizes the classification error probability by using an objective for the error margins.
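One way to write down a bilevel problem of this kind is sketched below. The notation is an assumption for illustration rather than a quotation from the paper: f_θ(x)_j is the score the classifier with parameters θ assigns to class j, ℓ is a training loss that upper-bounds the classification error, η is the perturbation added to the input x with true label y, and ε is the perturbation budget. The paper should be consulted for the exact statement.

```latex
\begin{aligned}
\min_{\theta} \quad & \mathbb{E}_{(x,y)} \Big[ \ell\big(f_\theta(x + \eta^\star),\, y\big) \Big] \\
\text{subject to} \quad & \eta^\star \in \operatorname*{arg\,max}_{\|\eta\| \le \epsilon} \; \max_{j \ne y} \; \big( f_\theta(x + \eta)_j - f_\theta(x + \eta)_y \big)
\end{aligned}
```

The defender (upper level) and the adversary (lower level) no longer share an objective. The adversary's margin term is positive exactly when some wrong class j outscores the true class y, so maximizing it targets misclassification directly instead of inflating the defender's loss.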

By creating an adversarial model with a stronger adversary that more closely resembles real-world situations, the AI classification systems can be more effectively trained. Instead of merely optimizing against a direct threat, defenders adopt a comprehensive strategy, encompassing the worst possible threats. As Cevher emphasizes, “Fabian and his collaborators do not view adversarial machine learning in isolation but contextualize it within the broader tapestry of machine learning theory, reliability, and robustness. This larger vision of training classification allowed them to perceive an initial error and flaw in the formulation for what has been, up until now, the textbook way to train machine learning models. By correcting this error, we’ve improved how we can make AI systems more robust.”

Read the research in full

Adversarial Training Should Be Cast As a Non-Zero-Sum Game, Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher (2023).



