AIhub.org

A deep learning model for identifying disease and risk factor biomarkers

by Linköping University

30 October 2023

Smoking leaves traces in the DNA

To test their model, the LiU researchers compared it with existing models. Models of the effects of smoking on the body already exist; they build on the fact that specific epigenetic changes reflect the effect of smoking on lung function. These traces remain in the DNA long after a person has quit smoking, and this type of model can identify whether someone is a current smoker, a former smoker, or has never smoked. Other models can use epigenetic markers to estimate the chronological age of an individual, or to group individuals according to whether they have a disease or are healthy.

The LiU researchers trained their autoencoder and then used the result for three different tasks: determining age, determining smoking status, and diagnosing the disease systemic lupus erythematosus (SLE).

“Our models not only enable us to classify individuals based on their epigenetic data. We found that our models can identify previously known epigenetic markers used in other models, but also new markers associated with the condition we’re examining. One example of this is that our model for smoking identifies markers associated with respiratory diseases, such as lung cancer, and DNA damage,” says David Martínez, PhD student at Linköping University.

David Martínez, PhD student. Photo: Thor Balkhed.

The objective of the autoencoder models is to compress extremely complex biological data into a representation of the most relevant characteristics and patterns in the data.
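As a rough illustration of what compressing data into a smaller representation means, here is a minimal sketch of a linear autoencoder in Python with NumPy. It is not the NCAE architecture from the study: the data are synthetic, and the sizes (`n_sites`, `n_latent`), the learning rate and the plain gradient-descent training loop are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for methylation data (an assumption, not real data):
# 200 samples x 50 sites, generated from 3 hidden factors plus noise.
n_samples, n_sites, n_latent = 200, 50, 3
factors = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_sites))
X = factors @ mixing + 0.05 * rng.normal(size=(n_samples, n_sites))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each site

# Encoder maps sites -> latent space; decoder maps latent space -> sites.
W_enc = 0.1 * rng.normal(size=(n_sites, n_latent))
W_dec = 0.1 * rng.normal(size=(n_latent, n_sites))


def reconstruction_error(X, W_enc, W_dec):
    """Mean squared difference between the data and its reconstruction."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))


initial_error = reconstruction_error(X, W_enc, W_dec)

learning_rate = 0.01
for _ in range(2000):
    Z = X @ W_enc            # compressed representation
    err = Z @ W_dec - X      # reconstruction residual
    # Gradients of the squared error, up to a constant factor.
    grad_dec = Z.T @ err / n_samples
    grad_enc = X.T @ (err @ W_dec.T) / n_samples
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

final_error = reconstruction_error(X, W_enc, W_dec)
latent = X @ W_enc  # each sample is now described by 3 numbers instead of 50

print("error before training:", initial_error)
print("error after training:", final_error)
print("latent shape:", latent.shape)
```

The point of the sketch is only that 50 measured values per sample can be rebuilt from 3 learned ones when the data have low-dimensional structure. A deep autoencoder such as the one described in the article uses several nonlinear layers rather than a single linear map.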

“We didn’t steer the model and had no hypotheses based on existing biological knowledge, but let the data speak for itself. When subsequently looking at what was happening in the autoencoder, we saw that data self-organised in a way similar to how it works in the body,” says Mika Gustafsson, professor of translational bioinformatics at Linköping University, who led the study now published in Briefings in Bioinformatics.

In the next step, the researchers can use the most important characteristics found by the autoencoder to create models that classify a large number of environment-related, individual-specific factors for which there is not enough training data to train more complex AI models.
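The idea of reusing a learned representation when labelled data are scarce can be sketched as follows. This is an illustration under stated assumptions, not the researchers' method: the data are synthetic, the label `y` is a hypothetical binary status, a principal component projection stands in for a trained autoencoder's encoder, and the classifier is a small logistic regression fitted on only 20 labelled samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (an assumption): 300 samples x 40 sites driven by 2 factors.
n_samples, n_sites, n_latent, n_labelled = 300, 40, 2, 20
factors = rng.normal(size=(n_samples, n_latent))
X = factors @ rng.normal(size=(n_latent, n_sites))
X += 0.1 * rng.normal(size=X.shape)
y = (factors[:, 0] > 0).astype(float)  # hypothetical binary status

# Stand-in encoder: project onto the top principal directions, learned
# from all samples without using any labels.
X_centred = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
Z = X_centred @ Vt[:n_latent].T
Z = Z / Z.std(axis=0)

# Logistic regression on the compressed features, trained by gradient
# descent on the few labelled samples only.
Z_train, y_train = Z[:n_labelled], y[:n_labelled]
w = np.zeros(n_latent)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Z_train @ w + b)))
    w -= 0.5 * Z_train.T @ (p - y_train) / n_labelled
    b -= 0.5 * np.mean(p - y_train)

# Evaluate on the samples whose labels were not used for training.
predicted = (Z[n_labelled:] @ w + b) > 0
accuracy = float(np.mean(predicted == (y[n_labelled:] == 1)))
print("held-out accuracy:", accuracy)
```

Because the classifier only has to fit two latent features instead of 40 raw sites, a handful of labelled examples is enough in this toy setting. Real methylation data are far noisier and higher-dimensional.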

Interpretable AI models

Certain types of AI are sometimes likened to a black box that provides answers without humans being able to see how the AI arrived at them. Mika Gustafsson and his colleagues, however, strive to create interpretable AI models that, so to speak, let the researchers peek under the lid of the “black box” to understand what is going on inside.

“We want to be able to understand what the model shows us about the biology behind disease and other conditions. Then we’ll see not only whether someone is ill or not, but, by interpreting data, we’ll also have a chance to learn why,” says Mika Gustafsson.

Mika Gustafsson, professor. Photo: Thor Balkhed.

This research was funded by, among others, the Swedish Research Council, the Wallenberg AI, Autonomous Systems and Software Program (WASP) and the SciLifeLab & Wallenberg National Program for Data-Driven Life Science (DDLS).

Read the research in full

NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures, David Martínez-Enguita, Sanjiv K. Dwivedi, Rebecka Jörnsten and Mika Gustafsson, Briefings in Bioinformatics, (2023).
