Laurène Donati and Virginie Uhlmann © 2022 Alain Herzog
By Cécilia Carron
Deep learning models are becoming increasingly common in bioimage analysis. Yet a lack of standardization and the use of these algorithms by non-experts are potential sources of bias. Scientists from EPFL and the European Bioinformatics Institute (EMBL-EBI) offer practical tips and guidance in a paper recently published in the journal IEEE Signal Processing Magazine.
Scientists are constantly seeking imaging systems that are faster, more powerful and capable of supporting longer observation times. This is especially true in life sciences, where objects of interest are rarely visible to the naked eye. As technological progress allows us to study life on ever smaller scales of time and space, often below the nanoscale, researchers are also turning to increasingly powerful artificial intelligence programs to sort through and analyze the vast datasets that result. Deep learning models – a type of machine learning algorithm that uses multi-layer networks to extract insights from raw input – are growing in popularity among life sciences researchers on account of their speed and precision. Yet using these models without fully understanding their architecture and their limitations introduces the risk of bias and error, with potentially major consequences. Scientists from the EPFL Center for Imaging and EMBL-EBI (Cambridge, UK) explore these challenges one by one in a paper published in the journal IEEE Signal Processing Magazine. The team outlines good practices for employing deep learning technologies in life sciences and advocates for closer interdisciplinary collaboration between bioscience researchers and program developers.
An effective deep learning model needs to be able to detect patterns and contrasts, recognize the orientation of objects in images, and much more. In other words, it needs to be a subject-matter expert. It acquires this expertise through training, typically carried out by software developers. The model starts by extracting general, nonspecific features from a dataset, then builds progressively more detailed insights with each successive layer. This design means that, in order to apply a deep learning system to a specific discipline or area of interest, such as life sciences, only the higher layers need to be adjusted so that the model can accurately analyze images it has never seen before.
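As a rough illustration of what adjusting only the higher layers can look like, here is a minimal Python sketch. It is not taken from the paper: the frozen weights, the toy task and the names `extract_features`, `predict` and `fine_tune` are all hypothetical, and a real bioimage model would have many more layers and rely on a deep learning framework.

```python
import math

# Hypothetical "pre-trained" lower layer: these weights are frozen
# and are never updated during fine-tuning.
FROZEN_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]


def extract_features(x):
    """Frozen lower layer: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_WEIGHTS]


def predict(head, x):
    """Trainable top layer: logistic regression on the frozen features."""
    feats = extract_features(x)
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], feats))
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(head, data, lr=0.5, epochs=200):
    """Update only the head's parameters with stochastic gradient descent."""
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            error = predict(head, x) - y  # gradient of the log loss w.r.t. z
            head["weights"] = [w - lr * error * f
                               for w, f in zip(head["weights"], feats)]
            head["bias"] -= lr * error
    return head


# Toy task (an assumption for this sketch): label 1 if the first value
# is larger than the second, else 0.
data = [([0.9, 0.1], 1), ([0.8, 0.3], 1), ([0.6, 0.2], 1),
        ([0.1, 0.7], 0), ([0.2, 0.9], 0), ([0.3, 0.5], 0)]

head = fine_tune({"weights": [0.0, 0.0], "bias": 0.0}, data)
print(predict(head, [0.7, 0.2]))  # probability for an unseen "class 1" input
print(predict(head, [0.2, 0.7]))  # probability for an unseen "class 0" input
```

Only the small `head` dictionary changes during training; the lower layer is reused as-is, which is why far less data and computation are needed than when training a model from scratch.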
The first deep learning system to be widely used in life sciences appeared in 2015. Since then, models with a variety of architectures have emerged as researchers have sought to tackle common bioimage analysis problems, from eliminating noise and improving resolution, to localizing molecules and detecting objects. “A consensus on neural network architectures is starting to emerge,” says Laurène Donati, the executive director of the EPFL Center for Imaging. Meanwhile, Virginie Uhlmann, an EPFL graduate and a research group leader at EMBL-EBI, notes a shift in priorities: “The rush to develop new models has subsided. What really matters now is making sure life sciences researchers know how to use existing technologies properly. Part of that responsibility rests with developers, who need to come together to support their users.”
For scientists without a background in computing, deep learning models can appear impenetrable, especially given the lack of a standardized framework. To get around this problem, platforms known as “model zoos” have been created, hosting collections of pre-trained models along with supporting explanations. While some of these repositories provide only limited information, others offer fully documented examples of research applications, enabling users to judge whether a model can be adapted for a given purpose. But because scientific research by its nature involves exploring new frontiers, it can be hard to know which model is best suited to a given dataset and how to repurpose it accordingly. Researchers also need to understand the model’s limitations and the factors that could affect its performance, as well as how these factors can be mitigated. And it takes a well-trained eye to avoid bias in interpreting the results.
In their paper, the three authors set out a series of good practices for non-experts, explaining how to choose the right pre-trained model, how to adjust it for a given research application and how to check the validity of the results. In doing so, they hope to “reassure skeptics and provide them with a strategy that minimizes the risks when experimenting with deep learning, and to equip long-time deep learning enthusiasts with additional safeguards,” says Daniel Sage, a researcher in EPFL’s Biomedical Imaging Group. Sage calls for “a stronger sense of community, whereby people share experiences and create a culture of best practices, and closer collaboration between programmers and biologists.”