In this post, we summarise the first two invited talks from the International Conference on Machine Learning (ICML). These presentations covered the fascinating topics of drug discovery and the cryosphere.
In her talk, Daphne Koller outlined some of the work she has been doing on transforming drug discovery using digital biology.
To introduce the topic, Daphne described drug discovery as an interesting space that one can view as glass half-full or glass half-empty. The half-full version is demonstrated by the amazing advances in new medicines, such as vaccines, cell therapies, genetically targeted therapies, and cancer immunotherapies.
The half-empty version is encapsulated by the exponential decline in research and development productivity over time. The amortised cost per approved drug is now over $2.5 billion, and the aggregate success rate for new drugs is about 5%; it’s this low success rate that drives the overall cost of approved drugs so high. The drug discovery journey offers many possible pathways at each step of the process, and it’s very difficult to predict which pathway will be correct.
This is where machine learning comes in. Daphne and her team pair machine learning methods with the creation and collection of high-quality data to predict the best path to take. They use machine learning in several parts of the pharmaceutical research and development chain: to find suitable drug targets to pursue, to help design the drug molecules, and to assist in a clinical setting once the drug is further down the line. End-to-end learning is used to induce a new representation of the data.
Daphne gave examples of the applications of this methodology for target identification. A drug target is a molecule in the body, usually a protein, that is intrinsically associated with a particular disease process and that could be addressed by a drug to produce a desired therapeutic effect. One of the diseases that Daphne has dedicated much time to studying is non-alcoholic steatohepatitis (NASH), a liver disease that is the progression of non-alcoholic fatty liver disease (NAFLD). Advanced disease leads to cirrhosis and carcinoma. The aim of the research was to uncover genetic drivers of disease progression.
To tackle this problem the team had access to data from clinical trials (conducted by one of their partners), which included a liver biopsy from each patient at the beginning and end of the trial. These data were labelled by pathologists, and the team fed the biopsy images into their machine learning model (a convolutional neural network) to understand disease progression. The model was able to predict the pathologists’ scores with very high accuracy. Using the model to search for genetic drivers, the team identified two genome-wide significant variants.
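The talk didn’t include code, but the core operation inside such a convolutional network is simple to sketch. Everything below (the patch size, the hand-set kernel, the shapes) is illustrative, not from the talk:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling, discarding any ragged edge."""
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
patch = rng.random((32, 32))             # stand-in for a greyscale biopsy patch
edge_kernel = np.array([[1., 0., -1.],
                        [2., 0., -2.],
                        [1., 0., -1.]])  # Sobel-style edge detector

feature_map = max_pool(relu(conv2d(patch, edge_kernel)))
print(feature_map.shape)  # (15, 15)
```

A real model stacks many learned kernels and ends in a classification head that outputs the pathology scoring categories; the single hand-set edge kernel here only shows the convolution–activation–pooling pattern.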
Daphne also talked about the process of drug design itself. The current standard procedure is as follows: researchers look for a protein structure around which to build their initial design. Once this design is in place, the molecule is synthesised and then tested. The results of the tests are analysed by humans, and the next stage of the design process follows. The synthesis stage can be particularly time-consuming.
Using a machine learning approach allows for potential speed-ups in this process. Daphne uses data from a huge number of compounds to build a machine learning model that predicts the binding affinity of compounds to a protein, as well as other relevant molecular properties. Large compound collections are available for purchase, and these can be rapidly tested using high-content phenotypic assays. The resulting data points are fed into the next generation of machine learning models.
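As a rough illustration of this kind of affinity model, here is a minimal sketch that fits a ridge regression to synthetic compound “fingerprints”. The fingerprints, the affinities, and the choice of a linear model are all stand-in assumptions, not details from the talk:

```python
import numpy as np

rng = np.random.default_rng(42)
n_compounds, n_bits = 500, 64

# Binary fingerprints as a toy compound representation, with synthetic
# affinities generated from a hidden linear rule plus noise.
X = rng.integers(0, 2, size=(n_compounds, n_bits)).astype(float)
true_w = rng.normal(size=n_bits)
y = X @ true_w + 0.1 * rng.normal(size=n_compounds)

# Closed-form ridge solution: w = (X^T X + lambda I)^-1 X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(n_bits), X.T @ y)

preds = X @ w
rmse = np.sqrt(np.mean((preds - y) ** 2))
print(f"training RMSE: {rmse:.3f}")
```

In practice the models described would be far richer (nonlinear, multi-property, trained on assay readouts), but the loop is the same: featurise compounds, fit a predictor of affinity, and use its predictions to prioritise what to synthesise and test next.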
Daphne is excited by the future of the discipline she calls digital biology – a combination of data science and biology. The joining of these fields allows us to measure biology in unprecedented ways, interpret what we measure, and act on those insights. She believes that it has the potential to transform not only human health, but also biomaterials, agriculture, and many other areas.
Find out more about Daphne’s work here.
The cryosphere is the term for those portions of Earth’s surface where water is in solid form, including glaciers, sea, lake and river ice, snow cover, ice caps, ice sheets, and frozen ground. It plays an important role in Earth systems, with its huge freshwater reserves, carbon storage, and unique habitats.
The talk was presented by Xiao Cunde and began with an introduction to the main conclusions of the IPCC (Intergovernmental Panel on Climate Change) on human-induced climate change. The cryosphere is a sensitive indicator of climate change, and the impacts of rapid cryospheric changes have been dramatic and far-reaching. As a result, progress in cryospheric science is vital for improving understanding of cryosphere formation, change mechanisms, and interactions with other Earth systems. Changes in the cryosphere influence surface energy and moisture fluxes, precipitation, cloud formation, hydrology, and atmospheric and ocean currents.
Cunde described several studies relating to the cryosphere. One example was sea ice prediction, for which there are three modelling routes. The first is statistical modelling, which has the advantage of not requiring much computational power, but the disadvantage that it doesn’t consider the physical mechanisms at play. The second is dynamical modelling, which does consider the physics but is prone to cumulative error. The final method is machine learning, where the predictive power is very good but interpretability is poor.
Considering machine learning, different research groups have used different methods. In their paper A fully data-driven method for predicting Antarctic sea ice concentrations using temporal mixture analysis and an autoregressive model, Junhwa Chi and Hyun-Cheol Kim use an autoregressive model with long short-term memory (LSTM) networks. In contrast, in Prediction of monthly Arctic sea ice concentrations using satellite and reanalysis data based on convolutional neural networks, Young Jun Kim et al. use random forests and convolutional neural networks (CNNs).
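The autoregressive ingredient of these approaches is easy to write down. The monthly sea-ice-like series below is synthetic, and this plain least-squares AR(12) fit is only a stand-in for the richer LSTM/CNN models in the cited papers:

```python
import numpy as np

# Synthetic "sea ice concentration": a seasonal cycle plus small noise.
rng = np.random.default_rng(1)
months = np.arange(240)
series = 0.5 + 0.3 * np.sin(2 * np.pi * months / 12) + 0.02 * rng.normal(size=240)

# AR(12): predict each month from the previous 12 months.
p = 12
X = np.stack([series[i:i + p] for i in range(len(series) - p)])
y = series[p:]

coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares AR fit

# One-step-ahead forecast from the last 12 observed months.
forecast = series[-p:] @ coef
print(round(float(forecast), 3))
```

This captures the seasonal cycle cheaply (the “statistical modelling” route from the talk), but encodes no sea-ice physics; the neural models trade that simplicity for predictive power on real satellite and reanalysis data.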
Another case study considered was the estimation of permafrost carbon on the Qinghai-Tibet plateau. Once again, different types of methods have been used by the community. Statistical modelling efforts have included generalised linear models and generalised additive models. On the machine learning side, the methodologies utilised are gradient boosted models and random forests. The machine learning models take soil organic carbon datasets as inputs and are trained to predict the relationship between soil organic carbon and soil depth, and the spatial distribution of soil organic carbon at depths between 0 and 1 metre.
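To make the gradient-boosting route concrete, here is a from-scratch sketch that boosts regression stumps on a synthetic soil-organic-carbon (SOC) versus depth profile. The decay curve, noise level, and hyperparameters are illustrative assumptions, not values from the studies cited:

```python
import numpy as np

# Synthetic SOC profile: carbon content decays with depth, plus noise.
rng = np.random.default_rng(7)
depth = rng.uniform(0.0, 1.0, size=300)                            # metres
soc = 40.0 * np.exp(-3.0 * depth) + rng.normal(0, 1.0, size=300)   # g/kg

def fit_stump(x, residual):
    """Best single-split (depth threshold) stump minimising squared error."""
    best = None
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        err = left.var() * len(left) + right.var() * len(right)
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

# Boosting loop: repeatedly fit a stump to the residuals, shrink, accumulate.
lr, stumps = 0.3, []
pred = np.zeros_like(soc)
for _ in range(100):
    stump = fit_stump(depth, soc - pred)
    stumps.append(stump)
    pred += lr * stump(depth)

rmse = np.sqrt(np.mean((soc - pred) ** 2))
print(f"training RMSE: {rmse:.2f} g/kg")
```

Real studies would use library implementations with many covariates (climate, vegetation, terrain) rather than depth alone, but the mechanism is the same: each weak learner corrects the residuals of the ensemble so far.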
Other case studies that Cunde touched on during the talk were paleoclimate reconstruction and stability assessment of outlet glaciers. He concluded by mentioning areas for future study, which include regional socioeconomic planning, resilience building, and tackling the UN Sustainable Development Goals.
Read Xiao Cunde’s biography here.