#AAAI2021 invited talk – Regina Barzilay on deploying machine learning methods in cancer diagnosis and drug design
In September 2020, Regina Barzilay was announced as the winner of the inaugural AAAI Squirrel AI award. Regina was formally presented with the prize during an award ceremony at the AAAI2021 conference, following which she delivered an invited talk. She spoke about two particular areas of medicine that she has been researching: drug discovery and cancer diagnosis.
It is well known that the development of drugs is slow and expensive. Currently, drug discovery is primarily experimentally driven, with properties of molecules investigated empirically. The problem is that the number of molecules that have the potential to be used as drugs is huge, and only a tiny fraction of these will actually be good candidates. This is where machine learning comes in: it is an ideal tool for assisting in the search through this vast molecule space.
Regina described the research methodology that led to the discovery of Halicin as an antibiotic. She and her co-authors trained a graph neural network model on 2,500 molecules, each of which they had tested experimentally for effectiveness against E. coli. For the purposes of the model, each training example consisted of a 2D representation of the molecule and a number reflecting its inhibitory capacity. They used the trained model to computationally screen on the order of 10^7 molecules. From this huge number, they were left with on the order of 10^2 candidates to test empirically, then just one candidate to test on animals. You can read more about this work in the Cell paper “A Deep Learning Approach to Antibiotic Discovery”.
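The screening funnel described above (score a very large library with a trained model, then keep a short list for the lab) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `screen` and `toy_predict` are hypothetical names, the toy scorer is a meaningless stand-in for a trained graph neural network, and keeping the top-scoring molecules is an assumed selection rule.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def screen(
    molecules: Iterable[str],
    predict: Callable[[str], float],
    keep: int,
) -> List[Tuple[float, str]]:
    """Return the `keep` highest-scoring molecules as (score, molecule) pairs."""
    # Score lazily so the whole library never has to sit in memory at once.
    scored = ((predict(m), m) for m in molecules)
    return heapq.nlargest(keep, scored)


def toy_predict(smiles: str) -> float:
    """Hypothetical stand-in scorer; a real one would be a trained model."""
    return smiles.count("N") / len(smiles)


if __name__ == "__main__":
    library = ["CCO", "CCN", "NN", "CCCC"]  # made-up SMILES strings
    for score, molecule in screen(library, toy_predict, keep=2):
        print(f"{molecule}: {score:.2f}")
```

Because `heapq.nlargest` consumes the scores as a stream, only the current short list is held in memory, which matters when the library has millions of entries.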
The team discovered that the drug was not only effective against E. coli, but also worked on drug-resistant strains of C. difficile and A. baumannii. Halicin was so effective here because its mechanism of action is distinct from that of known antibiotics.
Turning to cancer diagnosis, Regina talked about breast cancer and began by explaining how classical risk models work. They take in several variables, such as age, family history, and prior breast procedures, then apply a simple statistical algorithm. The AUC (area under the curve) metric for these models is around 0.607, which is only modestly better than chance (an AUC of 0.5).
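To make the AUC figure concrete: the metric can be read as the probability that a randomly chosen patient who developed cancer receives a higher risk score than a randomly chosen patient who did not. The sketch below computes it directly from that definition. The function name `auc` and the example data are illustrative only and do not come from the talk.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A correctly ordered pair counts as 1, a tie as 0.5, a wrong order as 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up scores: two of the four positive/negative pairs are ordered correctly.
    print(auc([0.2, 0.6, 0.4, 0.8], [0, 1, 1, 0]))
```

A model that ranks every positive above every negative scores 1.0, while one that assigns scores at random hovers around 0.5, which is why 0.5 is the chance baseline. The pairwise loop is quadratic and only suitable for small examples.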
One way to improve these models is to include images from mammograms. Regina and her team developed a machine learning model called “Mirai” to predict breast cancer risk based on traditional mammograms. To do this, they collected consecutive screening mammograms from 80,134 patients screened between 1 January 2009 and 31 December 2016. An examination was referred to as “positive” if it was followed by a pathology-confirmed cancer diagnosis within 5 years. The deep learning model was designed to predict risk at multiple timepoints.
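The labelling rule above, combined with prediction at multiple timepoints, can be illustrated with a small sketch that turns an examination date and an optional diagnosis date into one label per yearly horizon. This is an assumption-laden illustration rather than the paper's code: `horizon_labels` and `add_years` are hypothetical helpers, the horizon boundary is treated as inclusive, and patients whose follow-up is shorter than the horizon are not handled.

```python
from datetime import date
from typing import List, Optional


def add_years(d: date, years: int) -> date:
    """Shift a date forward by whole years, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def horizon_labels(
    exam_date: date,
    diagnosis_date: Optional[date],
    max_years: int = 5,
) -> List[int]:
    """Label k is 1 if a diagnosis occurred within k years of the exam, else 0."""
    labels = []
    for k in range(1, max_years + 1):
        positive = (
            diagnosis_date is not None
            and exam_date <= diagnosis_date <= add_years(exam_date, k)
        )
        labels.append(int(positive))
    return labels


if __name__ == "__main__":
    # Made-up dates: a diagnosis a little over two years after the exam.
    print(horizon_labels(date(2010, 3, 1), date(2012, 6, 1)))
```

Under this rule an examination counts as "positive" in the article's sense when the final, five-year label is 1.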
You can read the research in full in the paper “Toward robust mammography-based models for breast cancer risk”.
Regina stressed the need to ensure that models are built and tested using data from all racial groups. She called for standards for researchers to follow when reporting their results. She gave an example of a published paper where there was no racial breakdown of the model performance; as a result we are left none the wiser as to how it performs for different populations. Reproducibility is an issue too; code needs to be made available so that research findings can be rigorously verified.
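As an illustration of the kind of reporting called for here, the sketch below breaks a model's AUC down by group instead of reporting a single pooled number. It is a minimal example under stated assumptions: `auc_by_group` is a hypothetical helper, the group names "A", "B" and "C" are placeholders, and the records are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def auc(scores: List[float], labels: List[int]) -> Optional[float]:
    """Pairwise AUC; None when a group lacks positives or negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(
    records: Iterable[Tuple[str, float, int]],
) -> Dict[str, Optional[float]]:
    """Compute AUC separately for each group from (group, score, label) records."""
    grouped = defaultdict(lambda: ([], []))
    for group, score, label in records:
        grouped[group][0].append(score)
        grouped[group][1].append(label)
    return {g: auc(s, y) for g, (s, y) in grouped.items()}


if __name__ == "__main__":
    records = [  # invented (group, risk score, outcome) triples
        ("A", 0.9, 1), ("A", 0.1, 0),
        ("B", 0.2, 1), ("B", 0.8, 0),
        ("C", 0.5, 1),
    ]
    for group, value in auc_by_group(records).items():
        print(group, value)
```

Returning `None` for a group with no positive or no negative cases makes gaps in the evaluation data visible rather than hiding them behind a pooled score.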
In the final part of her talk, Regina shed some light on her personal research journey. After being treated for breast cancer in 2014, she wanted to work on problems that could really make a difference to people’s lives, so she changed her focus from natural language processing to applying her machine learning expertise to healthcare. This was not an easy transition and she encountered many challenges along the way, including a struggle to obtain funding and a lack of access to the data required for her research. It was refreshing to hear a researcher talk openly about the challenges they have faced and, if the comments in the chat following the talk were representative, it is something that the audience greatly valued.
AAAI plan to make the recordings of the talks publicly available in a month or two. When the videos are released we will add a link to this article.