The Machine Learning for Health workshop at NeurIPS 2019 brought together machine learning researchers, clinicians, and healthcare data experts. With the theme “what makes machine learning in medicine different?” the aim was to elucidate the obstacles that make the development of machine learning models for healthcare uniquely challenging.
There were six invited talks as part of the session and these are summarised below along with videos so you can watch the talks in full.
Daphne Koller, insitro
Machine learning: a new approach to drug discovery
We’ve all heard of Moore’s Law, but Eroom’s Law (“Moore” spelled backwards) is less well-known. It is the observation that the number of drugs approved per dollar spent on research and development has been decreasing exponentially over time. It currently costs around $2.5bn to get approval for a single drug. As Daphne explained in the introduction to her talk, a major reason for this expense is the number of possible pathways one can explore when investigating a new drug. Almost all of these pathways lead to a dead end, often after many years of research and millions of dollars of investment. One of the goals of Daphne’s research is to create a “compass” that could help avoid some of these dead ends and guide us to the correct drug pathway. This kind of problem lends itself very well to machine learning.
At insitro the team set about combining machine learning with a “biological data factory”. Thanks to rapid advances in iPSCs (induced pluripotent stem cells), CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and phenotyping, they can generate massive amounts of biologically relevant data. In the data factory, cells are cultured and differentiated, then perturbed genetically or chemically, before being phenotyped.
This results in a huge amount of cell data (in the form of images) to analyse and the team turn to machine learning, using deep neural networks. These algorithms reduce the dimensionality of the original set of cell images and as a result it is possible to observe an organisation of the data – a low-dimensional cellular phenotypic manifold in a high-dimensional space. With diseases where there is a large amount of data available it is possible to identify clusters in the data. To test their method the team perturbed cells, then used the cellular manifold to see if they could predict the perturbation that was made. Early results were promising with the team able to predict small molecule perturbations with high accuracy. One of the first diseases they are looking at with this method is the fatty liver disease known as nonalcoholic steatohepatitis (NASH).
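To make the idea of predicting a perturbation from a low-dimensional embedding concrete, here is a minimal sketch. It is not insitro's method: it assumes the image embeddings have already been computed by some model, uses made-up 2-D points, and swaps in a plain nearest-centroid classifier. The labels `control` and `compound_A` and both function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(embeddings, labels):
    """Average the embedding vectors of each perturbation label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest (Euclidean) to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Made-up 2-D embeddings standing in for the output of an image model.
train_embeddings = [[0.1, 0.2], [0.0, 0.1], [2.0, 2.1], [2.2, 1.9]]
train_labels = ["control", "control", "compound_A", "compound_A"]

centroids = fit_centroids(train_embeddings, train_labels)
print(predict(centroids, [1.9, 2.0]))
```

If cells exposed to the same perturbation really do cluster together in the embedding, even a classifier this simple should recover the perturbation for held-out cells; if they do not cluster, it will not.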
See the talk from 10:35 in video below (Machine learning for health 1):
Cian Hughes, Google Health UK & Nenad Tomasev, Google Health
A clinically-applicable approach to the continuous prediction of acute kidney injury
Acute kidney injury (AKI) is a sudden drop in kidney function and contributes to 20% of hospital admissions in the UK. An inquiry in 2009 found that AKI was poorly managed, which led to delays in recognition of AKI, delays in access to care and poor management of patients with AKI. In 2014 NHS England took steps to improve matters by introducing a nationally mandated rules-based algorithm which identified patients who had already developed AKI and flagged their clinical record. There was also a national registry set up to help clinicians understand and study the outcomes for patients with AKI. An application called “Streams” was produced which delivered tailored alerts to clinicians. The digital nature of this system sped the process up and meant that there was the potential to save kidney function.
As Cian and Nenad explained, earlier warnings were needed. They wanted to be able to identify patients who were at risk of developing AKI rather than waiting until they had the condition. They turned to machine learning, building a recurrent neural network (RNN) model. To train the model they used data provided as part of a partnership with the US Department of Veterans Affairs, one of the largest integrated health providers in the US. They were able to use anonymised data from around 700,000 patients collected over a period of 15 years.
The results of the work were published in Nature in July 2019. The team report that their model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injuries that required subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert. They hope that their methodology will be useful to other researchers investigating other conditions.
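The reported figures can be turned into more familiar quantities with a little arithmetic. The sketch below is only an illustration of that arithmetic; the function names are hypothetical and the inputs are the numbers quoted above.

```python
def precision_from_alert_ratio(false_per_true: float) -> float:
    """Fraction of alerts that are true, given the number of false alerts per true alert."""
    return 1.0 / (1.0 + false_per_true)


def missed_fraction(sensitivity: float) -> float:
    """Fraction of episodes that the model does not flag."""
    return 1.0 - sensitivity


# Two false alerts for every true alert means one alert in three is correct.
print(f"precision: {precision_from_alert_ratio(2.0):.3f}")  # precision: 0.333

# Predicting 55.8% of inpatient AKI episodes leaves 44.2% unflagged.
print(f"missed: {missed_fraction(0.558):.3f}")  # missed: 0.442
```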
See the talk from 4:20 in video below (Machine learning for health 2):
Emily Fox, Apple & University of Washington
Models of cognition: from predicting cognitive impairment to the brain networks underlying complex cognitive processes
Traditionally, health monitoring has taken place during sporadic visits to the doctor. Now, thanks to the number of devices we use (such as smartphones, smart speakers and wearables), there is potential to transform how we monitor our health. These devices can, for example, monitor our heart rates and provide ECGs. Using machine learning to analyse data from these devices could bring about real benefits to our health and well-being.
In her talk Emily focussed on two particular studies. The first was predicting cognitive impairment using consumer smart devices. The motivation for this study was the fact that early diagnosis of Alzheimer’s disease remains a challenge. Patient-provider interactions are brief and rely on self-reporting, and testing is costly. The question the team asked was: could they remotely, unobtrusively and passively recognise and monitor early signs of cognitive impairment? They used a variety of machine learning techniques to develop a model of app session usage and then classify users into those with cognitive impairment and those without.
The second study investigated how brains produce complex cognitive behaviours (CCBs). CCBs arise from interactions between many brain areas. It is thought that different interaction patterns could underlie neurological disorders (such as autism). The team used deep neural networks to study neuroimaging data. Their experiments looked at adult auditory attention. The subjects were played two audio streams concurrently and asked to initially focus on one of the streams. Halfway through, they were asked either to continue listening to the same stream or to switch to the other stream. The team found that the cognitive load was higher for those adults asked to switch.
See the talk from 0:10 in video below (Machine learning for health 3):
Luke Oakden-Rayner, Royal Adelaide Hospital, University of Adelaide
Preventing disasters – why safety is the foundation of medical machine learning
Luke gave a general talk about machine learning in medicine and why it is vital that researchers treat safety as an integral part of planning and designing machine learning systems for healthcare. Because of the high risk involved (patient well-being is at stake), there are concerns that the way in which some machine learning in healthcare is carried out is exposing patients to major harms. He drew an analogy with self-driving cars, which are also high-risk, involve a long tail of rare, dangerous outliers, are affected by user error, and for which experiments in controlled environments don’t adequately reflect real-world conditions. When applied to medicine these factors could lead to a number of problems.
In standard clinical trials, drugs have to pass a number of phases before they are released to the general population. Phase 3 involves “real-world” testing (on actual patients). For machine learning applications in healthcare there is no requirement to pass an equivalent of phase 3. Therefore, Luke urged researchers to build safety into their work. There is a need for the community to look into these issues.
Luke stressed that if researchers are approaching a classification task (e.g. diagnostics) they should speak to an expert in that area and ask what sort of subsets are present in the data, which subsets are high-risk to the patient, and label that data and test on those labels. Secondly, he recommended that teams producing applications for widespread use in the healthcare system talk to a health economist. They should be considering what is likely to happen to the local hospital network if there is a high uptake of the application.
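The advice to label clinically meaningful subsets and test on them can be sketched in a few lines. This is a minimal illustration rather than anything shown in the talk: the data are invented, the subgroup names `common` and `rare_high_risk` are hypothetical, and recall is just one metric that could be reported per subset.

```python
from collections import defaultdict


def recall_by_subgroup(y_true, y_pred, subgroups):
    """Return {subgroup: recall} for a binary classifier (1 = disease present)."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, subgroups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                true_pos[group] += 1
    # Only subgroups with at least one positive case appear in the result.
    return {group: true_pos[group] / positives[group] for group in positives}


# Invented labels: the model finds every common case but misses the rare ones.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1]
subgroups = ["common"] * 4 + ["rare_high_risk"] * 2 + ["common", "rare_high_risk"]

per_group = recall_by_subgroup(y_true, y_pred, subgroups)
overall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)
print(f"overall recall: {overall:.2f}")
print(per_group)
```

In this toy data the overall recall looks moderate while the high-risk subset is missed entirely, which is the kind of failure an aggregate metric hides.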
See the talk from 32:30 in video above (Machine learning for health 3).
Anna Goldenberg, SickKids Research Institute & University of Toronto
Predicting cardiac arrest: design for deployment
In her talk Anna covered the development of a machine learning model for predicting cardiac arrests, designed for deployment in hospitals. Anna and her collaborators collect a huge amount of data (such as heart rate, respiratory rate and blood pressure), via sensors, from patients at their unit. Typically they have a continuous 24 hours of data for each patient. The goal was to use these data, along with machine learning models, to try to predict, and hopefully prevent, cases of cardiac arrest.
The team built a convolutional neural network (CNN) + long short-term memory (LSTM) model. You can read more about the details of the methodology here. The model was designed to predict risk of cardiac arrest five to fifteen minutes in advance, giving clinicians time to prepare their team or take action to prevent the event. The clinicians were excited about the model but they and the team were keen to carry out further checks to see if it was ready for deployment. As a next step the team carried out a survey of clinicians to determine what they would like to see from a model. The clinicians thought that having an imperfect model was workable but they needed to know what its limitations were. They also wanted to understand which features drove model outcomes on a patient level and required a view of the time sequence indicating where in that sequence the patient first started deteriorating.
As a result of their collaboration with clinicians the team began work on improving the model with the objective of assessing feature importance at each time point of a risk estimator. To this end they developed a feed forward counterfactual (FFC) model. This research is still a work in progress and the team are planning further investigations and collaborations with clinicians before the technology is deployed in the hospital.
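To give a flavour of what “feature importance at each time point” can mean, here is a deliberately simplified sketch. It does not reproduce the FFC model. It substitutes a basic occlusion-style counterfactual: the value of one feature at time t is replaced by its previous value, and the resulting change in predicted risk is taken as that feature's importance at t. The risk function, its coefficients, the feature names and the data are all hypothetical.

```python
import math


def toy_risk(history):
    """Hypothetical stand-in for a trained risk model (not the team's model).

    Uses only the latest time step: risk rises with heart rate and
    falls with blood pressure.
    """
    latest = history[-1]
    score = 0.02 * (latest["heart_rate"] - 80) - 0.03 * (latest["blood_pressure"] - 90)
    return 1.0 / (1.0 + math.exp(-score))


def occlusion_importance(history, t, feature, risk_fn):
    """Absolute change in risk at time t when `feature` keeps its value from t - 1."""
    if t < 1:
        raise ValueError("t must be at least 1 so that a previous value exists")
    observed = history[: t + 1]
    counterfactual = [dict(step) for step in observed]  # copy, leave input untouched
    counterfactual[t][feature] = history[t - 1][feature]
    return abs(risk_fn(observed) - risk_fn(counterfactual))


# Invented vital signs: heart rate jumps at the final time step.
history = [
    {"heart_rate": 80, "blood_pressure": 90},
    {"heart_rate": 82, "blood_pressure": 90},
    {"heart_rate": 120, "blood_pressure": 88},
]

for feature in ("heart_rate", "blood_pressure"):
    print(feature, round(occlusion_importance(history, 2, feature, toy_risk), 3))
```

Scoring every feature at every time step in this way gives the kind of per-patient timeline the clinicians asked for, showing which signal changed the risk estimate and when.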
See the talk from 1:25:45 in video above (Machine learning for health 3).
Lily Peng, Google Brain AI & Dale Webster, Google Health
Deep learning and healthcare
Dale and Lily talked about the importance of ensuring that their research could actually be applied in real-world healthcare. They covered three areas: 1) building and evaluating a deep learning system, 2) integrating a deep learning system, 3) deep learning systems for new discoveries.
When starting to build a system it is important to ask: where could deep learning have the biggest impact? One could focus on an area of medicine where there is a shortage of expertise (i.e. lack of doctors). It is also key to pick a field where there is a lot of existing data available. Importantly, any predictions made should be actionable. When building the model itself Dale provided a number of key considerations, including making sure there are consistent labels, sufficient and high-quality training data and that multiple validation data sets are considered.
When it comes to integrating deep learning into a system, Lily used the example of diabetic retinopathy, where her team work with partners in India and Thailand to help prevent blindness due to this condition. Pre-deployment, retrospective validation work was carried out (by comparing their model to actual diagnoses from retina specialists) and the team looked into how their model would work on-site (i.e. in the countries in which it was being used). You can read more about the work here. Their model has now gained regulatory approval in Europe.
Finally, Lily talked about using deep learning to make new discoveries. One area where deep learning could be useful is in better understanding disease progression. This could be achieved by using future outcomes as a label. For example: did a person who had intermediate disease at screening go on to develop the full disease one year on?
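Using a future outcome as a label amounts to a small bookkeeping step over longitudinal records. The sketch below is an assumption about how such labels might be built, not a description of the team's pipeline: the function name, the record fields and the 365-day horizon are all hypothetical. It also declines to label patients who were not followed for long enough to know the outcome.

```python
from datetime import date, timedelta
from typing import Optional


def progression_label(
    screening: date,
    diagnosis: Optional[date],
    last_follow_up: date,
    horizon_days: int = 365,
) -> Optional[int]:
    """Label a screening visit by what happened within the horizon.

    Returns 1 if full disease was diagnosed within the horizon, 0 if the
    patient was followed for the whole horizon without that diagnosis, and
    None if follow-up ended too early to tell.
    Assumes any diagnosis date falls after the screening date.
    """
    horizon_end = screening + timedelta(days=horizon_days)
    if diagnosis is not None and screening < diagnosis <= horizon_end:
        return 1
    if last_follow_up < horizon_end:
        return None  # outcome unknown: not observed for the full horizon
    return 0


# Invented examples: progressed, did not progress, and lost to follow-up.
print(progression_label(date(2019, 1, 1), date(2019, 6, 1), date(2019, 6, 1)))
print(progression_label(date(2019, 1, 1), None, date(2020, 3, 1)))
print(progression_label(date(2019, 1, 1), None, date(2019, 5, 1)))
```

Images from the screening visit paired with these labels could then be used to train a model that predicts progression rather than current disease status.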
See the talk from 0:00 in video below (Machine learning for health 4):