 

Advanced AI models are not always better than simple ones


09 September 2025




Image: Elise Racine / Toy Models I / Licenced by CC-BY 4.0

By Tanya Petersen

Genetic perturbations, in which scientists intentionally alter genes to see how cells respond, are key to understanding what our genes do and how they are controlled. This knowledge has important applications in cell engineering and in developing new treatments.

Today, scientists can test many different genetic perturbations in the lab. But there are so many possible combinations that it is impossible to test them all.

AI and machine learning make it possible to use large biological datasets to predict what will happen when a gene is changed, even if that change has never been tested in the laboratory. But how well do these models really work?

Evaluating different prediction models

To assess this, researchers in EPFL’s Machine Learning for Biomedicine Laboratory (MLBio), which is affiliated with both the School of Computer and Communication Sciences and the School of Life Sciences, worked with international colleagues to test leading AI models on data from ten different experiments and compare them with simple statistical approaches.

In a study recently published in Nature Biotechnology, the team found something surprising: simple approaches did just as well as, if not better than, advanced AI models on many datasets.

“The observation that simple approaches perform as well as advanced AI models made us wonder: are the advanced models actually understanding what gene changes do? Are the standard metrics suitable for evaluating these models?” said Assistant Professor Maria Brbic, head of the MLBio Lab.

Why did the simple methods do so well?

Advanced models may look better than they are. This is because of systematic differences between treated and untreated cells. In these cases, the models may not be learning the true effects of the genetic changes. Instead, they may just notice patterns caused by the design of the experiment or effects that happen for almost all genetic changes.

The researchers also found that common ways of checking model performance can be misleading. They often fail to account for these systematic differences.
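A toy simulation can make this concrete. The sketch below is not the study's data or code. It assumes, purely for illustration, that every perturbation shifts expression by the same systematic vector plus a small perturbation-specific effect. Under that assumption, a baseline that always predicts the average perturbed profile earns a high correlation on changes measured relative to control, even though it knows nothing about any individual perturbation. All function names and parameter values here are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate(n_perts=50, n_genes=200, seed=0):
    """Toy data: a control profile plus one shift shared by ALL perturbations."""
    rng = random.Random(seed)
    control = [rng.gauss(0, 1) for _ in range(n_genes)]
    # Assumption: a systematic treated-vs-untreated shift common to every perturbation.
    systematic = [rng.gauss(0, 1) for _ in range(n_genes)]
    perturbed = []
    for _ in range(n_perts):
        # Assumption: the perturbation-specific effect is small by comparison.
        specific = [rng.gauss(0, 0.3) for _ in range(n_genes)]
        perturbed.append([c + s + e for c, s, e in zip(control, systematic, specific)])
    return control, perturbed


def mean_profile(profiles):
    """Gene-wise average over a list of expression profiles."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def score_mean_baseline(control, train, test):
    """Average Pearson correlation of predicted vs. true change relative to control."""
    pred = mean_profile(train)  # same prediction for every held-out perturbation
    pred_delta = [p - c for p, c in zip(pred, control)]
    scores = []
    for true in test:
        true_delta = [t - c for t, c in zip(true, control)]
        scores.append(pearson(pred_delta, true_delta))
    return sum(scores) / len(scores)


control, perturbed = simulate()
train, test = perturbed[:40], perturbed[40:]  # hold out the last 10 perturbations
baseline_score = score_mean_baseline(control, train, test)
print(f"mean baseline, correlation of changes vs. control: {baseline_score:.2f}")
```

In this toy setting the score comes out close to 1, because the shared shift dominates both the prediction and the ground truth. The high number says little about whether perturbation-specific effects were captured.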

“To deal with this, we created a tool called Systema. It reduces the influence of systematic biases and focuses on the unique effects of each genetic perturbation. Systema also makes it easier to understand what genetic perturbations actually do,” explained Ramon Viñas Torné, a postdoctoral researcher in the MLBio Lab and the first author of the paper.

Prediction is harder than standard metrics suggest

With Systema, the researchers found that it’s still very hard for AI models to predict the effects of new genetic changes. Some models could make correct guesses when the genes were part of the same biological process, but overall the challenge remains.

Systema helps tell the difference between models that are just picking up biases and those that truly understand how genetic modifications affect cells.
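The idea can be illustrated with a small sketch. It is not the Systema implementation: the choice to measure changes relative to the average perturbed profile rather than to control is an assumption of this example, as are the synthetic data, the two hypothetical "models", and all names. The sketch compares a model that only reproduces the shared shift with one that also recovers part of each perturbation's unique effect.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no perturbation-specific signal
    return cov / (vx * vy) ** 0.5


def mean_profile(profiles):
    """Gene-wise average over a list of expression profiles."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def evaluate(preds, truths, reference):
    """Average correlation of predicted vs. true change relative to `reference`."""
    scores = []
    for pred, true in zip(preds, truths):
        pred_delta = [p - r for p, r in zip(pred, reference)]
        true_delta = [t - r for t, r in zip(true, reference)]
        scores.append(pearson(pred_delta, true_delta))
    return sum(scores) / len(scores)


rng = random.Random(1)
n_genes, n_train, n_test = 200, 40, 10
control = [rng.gauss(0, 1) for _ in range(n_genes)]
systematic = [rng.gauss(0, 1) for _ in range(n_genes)]  # assumed shared shift


def make_perturbation():
    """Return (true profile, perturbation-specific effect) for one toy perturbation."""
    specific = [rng.gauss(0, 0.3) for _ in range(n_genes)]
    profile = [c + s + e for c, s, e in zip(control, systematic, specific)]
    return profile, specific


train = [make_perturbation()[0] for _ in range(n_train)]
test_pairs = [make_perturbation() for _ in range(n_test)]
test_truth = [profile for profile, _ in test_pairs]
train_mean = mean_profile(train)

# Hypothetical model A: always predicts the average training perturbation.
pred_biased = [train_mean for _ in test_pairs]
# Hypothetical model B: recovers each specific effect up to some noise.
pred_informed = [
    [c + s + e + rng.gauss(0, 0.1) for c, s, e in zip(control, systematic, eff)]
    for _, eff in test_pairs
]

biased_vs_control = evaluate(pred_biased, test_truth, control)
informed_vs_control = evaluate(pred_informed, test_truth, control)
biased_centered = evaluate(pred_biased, test_truth, train_mean)
informed_centered = evaluate(pred_informed, test_truth, train_mean)

print(f"reference = control:        A={biased_vs_control:.2f}  B={informed_vs_control:.2f}")
print(f"reference = perturbed mean: A={biased_centered:.2f}  B={informed_centered:.2f}")
```

With control as the reference, both toy models look strong. Once the average perturbed profile is used as the reference, model A has nothing left to contribute and drops to zero, while model B keeps a high score.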

The researchers suggest that AI models should be evaluated based on their biological value. This means looking at how well predictions explain cellular traits.

“Looking ahead, having bigger and more diverse experiments will help make these predictions better. Also, new technologies that look at cells in more detail, like their shape or location, could help us to understand how gene changes affect cells and tissues better,” concluded Brbic.

References

Learn more about Systema.

Systema: a framework for evaluating genetic perturbation response prediction beyond systematic variation, Viñas Torné, R., Wiatrak, M., Piran, Z. et al., Nat Biotechnol (2025).




EPFL







 














