AIhub.org
 

Researching more data-efficient machine learning models


12 October 2023




By Sarah Collins

Researchers have developed a machine learning algorithm that can model complex equations in real-world situations while using far less training data than is normally expected.

The researchers, from the University of Cambridge and Cornell University, found that for partial differential equations – a class of physics equations that describe how things in the natural world evolve in space and time – machine learning models can produce reliable results even when they are provided with limited data.

Their results, reported in the Proceedings of the National Academy of Sciences, could be useful for constructing more time- and cost-efficient machine learning models for applications such as engineering and climate modelling.

Most machine learning models require large amounts of training data before they can begin returning accurate results. Traditionally, a human will annotate a large volume of data – such as a set of images – to train the model.

“Using humans to train machine learning models is effective, but it’s also time-consuming and expensive,” said first author Dr Nicolas Boullé. “We’re interested to know exactly how little data we actually need to train these models and still get reliable results.”

Other researchers have been able to train machine learning models with a small amount of data and get excellent results, but how this was achieved has not been well explained. For their study, Boullé and his co-authors, Diana Halikias and Alex Townsend from Cornell University, focused on partial differential equations (PDEs).

“PDEs are like the building blocks of physics: they can help explain the physical laws of nature, such as how the steady state is held in a melting block of ice,” said Boullé. “Since they are relatively simple models, we might be able to use them to make some generalisations about why these AI techniques have been so successful in physics.”

The researchers found that PDEs that model diffusion have a structure that is useful for designing AI models. “Using a simple model, you might be able to enforce some of the physics that you already know into the training data set to get better accuracy and performance,” said Boullé.

The researchers constructed an efficient algorithm for predicting the solutions of PDEs under different conditions by exploiting the short- and long-range interactions that these equations describe. This allowed them to build some mathematical guarantees into the model and determine exactly how much training data was required to end up with a robust model.
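To give a flavour of why long-range structure can reduce the amount of data needed, here is a toy sketch in Python. It is not the authors' algorithm. It assumes a one-dimensional Poisson equation, -u'' = f with zero boundary values, discretised by finite differences, and all function names are hypothetical. Each "training example" is one forcing term f together with the solution u it produces. The part of the solution operator that links forcing on the right half of the domain to the response on the left half turns out to have very low rank, so a handful of random forcings is enough to recover it.

```python
import numpy as np


def poisson_matrix(n):
    """Finite-difference matrix for -u'' = f on (0, 1) with u(0) = u(1) = 0."""
    h = 1.0 / (n + 1)
    main = 2.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2


def solve(A, F):
    """Stand-in for an experiment: return the solutions for the forcings in F."""
    return np.linalg.solve(A, F)


def learn_far_field_block(A, n, k, rng):
    """Recover the block mapping right-half forcing to left-half response.

    Uses k random forcings plus k follow-up forcings (2k solves in total).
    This is a randomised range finder, chosen here for illustration only.
    """
    m = n // 2

    # Step 1: random forcings supported on the right half of the domain.
    F1 = np.zeros((n, k))
    F1[m:, :] = rng.standard_normal((n - m, k))
    Y = solve(A, F1)[:m, :]          # responses observed on the left half
    Q, _ = np.linalg.qr(Y)           # orthonormal basis for those responses

    # Step 2: use the basis vectors as forcings on the left half.
    # The problem is symmetric, so the response on the right half
    # gives the transpose of the block applied to Q.
    F2 = np.zeros((n, k))
    F2[:m, :] = Q
    Z = solve(A, F2)[m:, :]

    return Q @ Z.T                   # low-rank estimate of the block


n, k = 200, 2
rng = np.random.default_rng(0)
A = poisson_matrix(n)

# Reference answer, computed directly (only possible because this is a toy).
G = np.linalg.inv(A)
B_true = G[: n // 2, n // 2:]
sv = np.linalg.svd(B_true, compute_uv=False)

B_learned = learn_far_field_block(A, n, k, rng)
rel_err = np.linalg.norm(B_learned - B_true) / np.linalg.norm(B_true)

print("ratio of 2nd to 1st singular value:", sv[1] / sv[0])
print("relative error of learned block:", rel_err)
```

In this toy setting the far-field block has 10,000 entries, yet four input-output pairs are enough to reconstruct it, because its second singular value is negligible compared with its first. Real problems in two or three dimensions are harder, and the guarantees reported in the paper rest on a more careful analysis than this sketch suggests.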

“It depends on the field, but for physics, we found that you can actually do a lot with a very limited amount of data,” said Boullé. “It’s surprising how little data you need to end up with a reliable model. Thanks to the mathematics of these equations, we can exploit their structure to make the models more efficient.”

The researchers say that their techniques will allow data scientists to open the ‘black box’ of many machine learning models and design new ones that can be interpreted by humans, although further research is still needed.

“We need to make sure that models are learning the right things, but machine learning for physics is an exciting field – there are lots of interesting maths and physics questions that AI can help us answer,” said Boullé.

Read the research in full

Elliptic PDE learning is provably data-efficient, Nicolas Boullé, Diana Halikias, and Alex Townsend, PNAS (2023).




University of Cambridge




            AIhub is supported by:








©2025.05 - Association for the Understanding of Artificial Intelligence


 











