AIhub.org
 

Advanced AI models are not always better than simple ones


09 September 2025




Image: Elise Racine / Toy Models I / Licensed under CC-BY 4.0

By Tanya Petersen

Understanding genetic perturbations, experiments in which scientists intentionally alter genes to see how cells respond, is key to understanding what our genes do and how they are controlled. This knowledge has important applications in cell engineering and in developing new treatments.

Today, scientists can test many different genetic perturbations in the lab. But there are so many possible combinations that it is impossible to test them all.

AI and machine learning have created the opportunity to use information from large biological datasets to predict what will happen when a gene is changed — even if that change has never been tested in the laboratory. But how well do these models really work?

Evaluating different prediction models

To assess this, researchers in EPFL’s Machine Learning for Biomedicine Laboratory (MLBio), affiliated with both the School of Computer and Communication Sciences and the School of Life Sciences, worked with international colleagues to test the leading AI models. They evaluated the models on data from ten different experiments and compared their performance with that of simple statistical approaches.

In a study recently published in Nature Biotechnology, the team found something surprising. Simple approaches did just as well as, if not better than, advanced AI models on many datasets.
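To give a concrete picture of what a simple statistical approach of this kind can look like, here is a minimal sketch, assuming pseudobulk expression profiles stored as NumPy arrays; the function names and data shapes are illustrative assumptions, not the authors' code. The baseline makes the same prediction for every held-out perturbation, the control profile plus the average effect seen across training perturbations, and is scored with a standard metric such as Pearson correlation.

```python
import numpy as np

def mean_response_baseline(control_mean, train_effects):
    """Predict every held-out perturbation as control mean + average training effect.

    control_mean : (n_genes,) mean expression of unperturbed (control) cells
    train_effects: (n_train, n_genes) expression change per training perturbation
    """
    return control_mean + train_effects.mean(axis=0)

def pearson(prediction, truth):
    """Standard evaluation metric: Pearson correlation of predicted vs. measured profiles."""
    return float(np.corrcoef(prediction, truth)[0, 1])
```

The point is not that such a baseline captures interesting biology, but that it sets the bar any advanced model has to clear.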

“The observation that simple approaches perform as well as advanced AI models made us wonder: are the advanced models actually understanding what gene changes do? Are the standard metrics suitable for evaluating these models?” said Assistant Professor Maria Brbic, head of the MLBio Lab.

Why did the simple methods do so well?

Advanced models may look better than they really are because of systematic differences between treated and untreated cells. In these cases, the models may not be learning the true effects of the genetic changes. Instead, they may simply pick up patterns caused by the design of the experiment, or effects that occur for almost all genetic changes.

The researchers also found that common ways of checking model performance can be misleading. They often fail to account for these systematic differences.
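A toy calculation with simulated numbers, not real data, illustrates the problem: when the shared shift between treated and untreated cells is large compared with the effect unique to a given perturbation, a prediction that ignores the unique effect entirely still correlates almost perfectly with the measured profile.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 2000

control = rng.normal(0.0, 1.0, n_genes)          # expression profile of unperturbed cells
shared_shift = rng.normal(1.0, 0.2, n_genes)     # systematic effect common to nearly all perturbations
specific_effect = rng.normal(0.0, 0.1, n_genes)  # effect unique to this particular perturbation

true_perturbed = control + shared_shift + specific_effect
naive_prediction = control + shared_shift        # carries no perturbation-specific information

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

print(pearson(naive_prediction, true_perturbed))                      # ~0.99: looks excellent
print(pearson(naive_prediction - control, true_perturbed - control))  # ~0.9: still inflated by the shared shift
```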

“To deal with this, we created a tool called Systema. It reduces the influence of systematic biases and focuses on the unique effects of each genetic perturbation. Systema also makes it easier to understand what genetic perturbations actually do,” explained Ramon Viñas Torné, a postdoctoral researcher in the MLBio Lab and the first author of the paper.
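The sketch below illustrates the general idea behind such an evaluation, scoring predictions only on what they add beyond the shared systematic component; it reuses the simulated setup above and is an assumption about the concept, not the Systema implementation or its API.

```python
import numpy as np

def perturbation_specific_score(prediction, truth, mean_response):
    """Correlate prediction and truth after removing the shared mean response,
    so that only perturbation-specific effects contribute to the score."""
    return float(np.corrcoef(prediction - mean_response, truth - mean_response)[0, 1])

rng = np.random.default_rng(0)
n_genes = 2000
control = rng.normal(0.0, 1.0, n_genes)
shared_shift = rng.normal(1.0, 0.2, n_genes)
specific_effect = rng.normal(0.0, 0.1, n_genes)
noise = rng.normal(0.0, 0.05, n_genes)

mean_response = control + shared_shift                         # systematic component, estimated from training data
truth = mean_response + specific_effect                        # measured profile of the held-out perturbation
naive_prediction = mean_response + noise                       # no perturbation-specific signal
informed_prediction = mean_response + specific_effect + noise  # captures the specific effect

print(perturbation_specific_score(naive_prediction, truth, mean_response))     # ~0: bias-only prediction no longer scores well
print(perturbation_specific_score(informed_prediction, truth, mean_response))  # ~0.9: perturbation-specific signal is rewarded
```

On a raw-profile metric, both of these predictions would score close to 1; removing the shared component is what separates a bias-only prediction from one that captures a perturbation's specific effect.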

Prediction is harder than standard metrics suggest

With Systema, the researchers found that it’s still very hard for AI models to predict the effects of new genetic changes. Some models could make correct guesses when the genes were part of the same biological process, but overall the challenge remains.

Systema helps tell the difference between models that are just picking up biases and those that truly understand how genetic modifications affect cells.

The researchers suggest that AI models should be evaluated based on their biological value. This means looking at how well predictions explain cellular traits.

“Looking ahead, having bigger and more diverse experiments will help make these predictions better. Also, new technologies that look at cells in more detail, like their shape or location, could help us to understand how gene changes affect cells and tissues better,” concluded Brbic.

References

Learn more about Systema.

Viñas Torné, R., Wiatrak, M., Piran, Z. et al. Systema: a framework for evaluating genetic perturbation response prediction beyond systematic variation. Nat Biotechnol (2025).




