#AAAI2021 invited talk – Regina Barzilay on deploying machine learning methods in cancer diagnosis and drug design

by Lucy Smith

09 February 2021

Drug discovery

It is well known that the development of drugs is slow and expensive. Currently, drug discovery is primarily experimentally driven, with properties of molecules investigated empirically. The problem is that the number of molecules that have the potential to be used as drugs is huge, and only a tiny fraction of these will actually be good candidates. This is where machine learning comes in: it is an ideal tool for assisting in the search through this vast molecule space.

Regina talked about the research methodology that led to the discovery of Halicin for use as an antibiotic. She and her co-authors trained a graph neural network model on 2,500 molecules, which they had experimentally tested for effectiveness against E. coli. For the purposes of the model, each training example consisted of a 2D representation of the molecule and a number reflecting the inhibitory capacity of that molecule. They used the trained model to computationally screen 10⁷ molecules. From this huge number, they were left with 10² candidates to test empirically, then just one candidate to test on animals. You can read more about this work in the Cell paper “A Deep Learning Approach to Antibiotic Discovery”.

The team discovered that not only was the drug effective against E. coli, but it also worked on drug-resistant strains of C. difficile and A. baumannii. Halicin was so effective here because it has a mechanism of action distinct from those of known antibiotics.
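The screening funnel described in this section (score a large library with a trained model, then keep only a short list for the lab) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `toy_score` is a stand-in for the trained graph neural network, not the model from the paper, and the five-molecule `library` of SMILES strings is invented for illustration.

```python
from __future__ import annotations

import heapq
from typing import Callable, Iterable


def screen(
    candidates: Iterable[str],
    score: Callable[[str], float],
    keep: int,
) -> list[tuple[float, str]]:
    """Return the `keep` highest-scoring candidates as (score, molecule), best first."""
    # nlargest streams through the candidates, so the full library
    # never has to be sorted or held in memory at once.
    return heapq.nlargest(keep, ((score(m), m) for m in candidates))


def toy_score(smiles: str) -> float:
    """Placeholder scorer (an assumption, NOT a real predictor).

    It returns the fraction of 'N' characters in the string, purely so
    that the example is deterministic. A real pipeline would call the
    trained model's predicted inhibitory capacity here.
    """
    return smiles.count("N") / len(smiles)


# Hypothetical miniature library; the talk described screening 10^7 molecules.
library = ["CCO", "CCN", "NCCN", "c1ccccc1", "CC(=O)N"]

shortlist = screen(library, toy_score, keep=2)
print([molecule for _, molecule in shortlist])  # ['NCCN', 'CCN']
```

The point of the design is that the expensive step (laboratory testing) is applied only to the shortlist, while the cheap step (model scoring) is applied to everything.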

Figure: Drug discovery slides from Regina Barzilay's talk.

Cancer diagnosis

Regina talked about breast cancer and began by explaining how classical risk models work. They take in several variables, such as age, family history, and prior breast procedures, then apply a simple statistical algorithm. The AUC (area under the ROC curve) for these models is around 0.607, which is not much better than chance (where AUC is 0.5).
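To make the AUC figures concrete, here is a small sketch of one standard way to compute the metric: the AUC equals the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counted as half. The function name `auc` and the example scores are illustrative assumptions, not data from the talk.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented scores: one positive case (0.35) is ranked below a negative (0.4).
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75

# A model that gives everyone the same score cannot rank anyone: chance level.
print(auc([0.5, 0.5, 0.5, 0.5], [0, 0, 1, 1]))  # 0.5
```

Read this way, an AUC of about 0.6 means the model ranks a patient who goes on to develop cancer above one who does not in roughly six out of ten such pairs.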

One way to improve these models is to include images from mammograms. Regina and her team developed a machine learning model called “Mirai” to predict breast cancer risk based on traditional mammograms. To do this, they collected consecutive screening mammograms from 80,134 patients screened between 1 January 2009 and 31 December 2016. An examination was referred to as “positive” if it was followed by a pathology-confirmed cancer diagnosis within 5 years. The deep learning model was designed to predict risk at multiple timepoints.
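As a rough illustration of the labelling rule above, the sketch below marks an examination as positive at each yearly horizon from one to five years. The yearly horizons, the function names, and the example dates are assumptions made for illustration. The sketch also ignores patients whose follow-up ends before a horizon is reached, which a real study would need to handle.

```python
from __future__ import annotations

from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date forward by whole years, mapping 29 Feb to 28 Feb if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def horizon_labels(
    exam: date,
    diagnosis: Optional[date],
    max_years: int = 5,
) -> list[int]:
    """Return one 0/1 label per horizon: was cancer confirmed within k years?

    `diagnosis` is the date of a pathology-confirmed diagnosis, or None
    if no diagnosis was recorded for this patient.
    """
    labels = []
    for k in range(1, max_years + 1):
        within = diagnosis is not None and exam <= diagnosis <= add_years(exam, k)
        labels.append(int(within))
    return labels


# Hypothetical patient: screened in March 2010, diagnosed in June 2012.
print(horizon_labels(date(2010, 3, 1), date(2012, 6, 15)))  # [0, 0, 1, 1, 1]

# Hypothetical patient with no recorded diagnosis.
print(horizon_labels(date(2010, 3, 1), None))  # [0, 0, 0, 0, 0]
```

The last entry of each list corresponds to the five-year definition of a "positive" examination used in the study.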

You can read the research in full in the paper “Toward robust mammography-based models for breast cancer risk”.

Figure: Slide showing the performance of the team’s model (Mirai) versus the standard model used in the USA (Tyrer-Cuzick). Regina noted the poor performance of the Tyrer-Cuzick model for African American and Asian women.

Regina stressed the need to ensure that models are built and tested using data from all racial groups. She called for standards for researchers to follow when reporting their results. She gave an example of a published paper where there was no racial breakdown of the model performance; as a result we are left none the wiser as to how it performs for different populations. Reproducibility is an issue too; code needs to be made available so that research findings can be rigorously verified.

Personal journey

In the final part of her talk, Regina shed some light on her personal research journey. After being treated for breast cancer in 2014, she wanted to work on problems that could really make a difference to people’s lives, so she changed her focus from natural language processing to applying her machine learning expertise to the field of healthcare. This was not an easy transition, and she encountered many challenges along the way, including a struggle to obtain funding and a lack of access to the data required for her research. It was refreshing to hear a researcher talk openly about the challenges they have faced and, if the comments in the chat following the talk were representative, it is something that the audience greatly valued.

AAAI plan to make the recordings of the talks publicly available in a month or two. When the videos are released we will add a link to this article.

tags: AAAI, AAAI2021, Focus on good health and well-being, Focus on UN SDGs

Lucy Smith is Senior Managing Editor for AIhub.
