AIhub.org
 

#NeurIPS2023 invited talk: Linda Smith on young humans and self-generated experience


by Lucy Smith
10 January 2024




During the first four years of life, children learn to name and recognise over one thousand object categories, acquire the syntax of their language, and absorb the cultural and social properties of the place where they grow up. By the age of three, they have become one-shot learners in many domains. Linda’s research focusses on cognitive development in young children, and she wants to understand the structure of experience that gives rise to all of the knowledge a child acquires in such a short time.

To carry out her research, Linda studies the world from the learner’s point of view, using cameras, audio recorders and motion-tracking sensors to collect data from babies and young children. These sensors have enabled a range of projects, from recordings made 24 hours a day as the child and their family go about their daily routine, to more focussed data-collection sessions in the laboratory.

One of the big research questions is: how do infants learn from such sparse data? In her presentation, Linda talked about three principles of human learning, and gave research examples to illustrate each:

  1. The learner controls the input
  2. There is a constrained curriculum – the data from which we learn is ordered in an important way
  3. The data stream comes in episodes of interconnected experiences

Linda proposed that, in order to learn rapidly, there must be an alliance between the mechanisms that generate the data and the mechanisms that do the learning.

Controlling the input

Laboratory experiments have shown that learning input is controlled right from the start of a baby’s life. Infants under five months of age preferentially look at simple edge patterns, that is, patterns with few edges and orientations and high contrast. In experiments carried out in Bloomington, Indiana and in Chennai, India, Linda and her team set out to investigate whether this holds in the real world too. Their results indicated that it does: the young babies in both locations favoured simple patterns, typically architectural features such as windows, countertops and ceiling fans.

The key point is that the training data for these infants is neither a random nor a massive collection of images. It is heavily biased towards simple patterns. Linda asked whether this could matter for AI. She pointed to research showing that pre-training with baby-like simple edge images yields more rapid learning on adult-level downstream tasks.
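The curriculum idea here can be sketched in a few lines: order training examples from visually simple to complex before feeding them to a model. The dataset and the `edge_count` complexity scores below are invented for illustration; the research Linda cites used real infant-view images.

```python
# Toy sketch of curriculum ordering: present "simple" (few-edge,
# high-contrast) examples first, mirroring the infant-like visual diet.
# The dataset and edge-count scores are invented for illustration.

def curriculum_order(dataset):
    """Sort training examples by ascending visual complexity."""
    return sorted(dataset, key=lambda ex: ex["edge_count"])

dataset = [
    {"name": "cluttered toy box", "edge_count": 120},
    {"name": "ceiling fan",       "edge_count": 8},
    {"name": "window frame",      "edge_count": 12},
    {"name": "crowded bookshelf", "edge_count": 95},
]

for example in curriculum_order(dataset):
    print(example["name"])  # simple architectural features come first
```

A real pipeline would replace the hand-written scores with a measured complexity proxy (e.g. an edge-detector response) and train on the ordered stream.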

A constrained curriculum

As children progress through the early stages of life, their training data changes. The data are not the same for young babies as they are for toddlers. Toddlers are more mobile than babies and have different interests, and this leads them to absorb different data. Linda’s research shows that early in infancy faces are the dominant object of interest, with 15 minutes of every waking hour spent looking at faces. When children reach 12 months this changes, and they spend a third of their time looking at hands, specifically hands acting on objects.

Episodes of interconnected experiences

Before the age of one, children have formed associations between the visual features of an object and the sound of the word that represents it. Many of these early associations come from mealtimes, with words like “spoon”, “banana” and “yogurt” among the earliest learnt. Linda and her team looked at footage collected from head-mounted cameras worn by babies in their homes, studying data relating to 40 object categories that are known to be recognised early in a baby’s life. The team focussed on how often these objects were visible in each mealtime setting, and how often they were named. Some objects, such as chairs and tables, were visible almost all the time, whereas others, such as pizzas and crackers, were visible much less often. Interestingly, the object names themselves were spoken quite rarely. So the question the team asked was: how are babies learning the associations?
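The counting exercise described above can be sketched as a pass over per-episode annotations: for each mealtime episode, tally which objects were visible and which were named. The episode records below are invented for illustration; they are not the team's data.

```python
# Toy sketch: tally object visibility vs naming across mealtime episodes.
# The episodes are invented for illustration, not real annotated data.
from collections import Counter

episodes = [
    {"visible": {"chair", "table", "spoon"}, "named": {"spoon"}},
    {"visible": {"chair", "table", "pizza"}, "named": {"pizza"}},
    {"visible": {"chair", "table"},          "named": set()},
]

visible_counts = Counter()
named_counts = Counter()
for ep in episodes:
    visible_counts.update(ep["visible"])
    named_counts.update(ep["named"])

# Chairs and tables are visible in every episode but never named here,
# while a pizza is visible (and named) in just one episode.
print(visible_counts["chair"], named_counts["chair"])  # 3 0
print(visible_counts["pizza"], named_counts["pizza"])  # 1 1
```

The aggregate counts alone cannot explain the learning, which is exactly the puzzle the next paragraph takes up.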

Linda believes that the answer can be found by looking at what is happening in single episodes, rather than at an aggregation of the data. She found single episodes in which an object was present during the mealtime and was named a handful of times. This suggests that a child may only need to experience one episode in which a pizza is named to know, for the rest of their life, what a pizza is.

One area of ongoing research for Linda concerns understanding the structure of episodes as time series of different events. An event that happens at one point in time influences what happens at a later time, creating correlations in space and time. There are bursts of activity sandwiched between periods of sparsity. The idea is that it is these clusters of activity that create permanent memories of one-time events.

Learning quickly from sparse data

In closing, Linda returned to the question of how children learn so quickly from sparse data. She believes that the answer lies in the statistics of experience, and in how data is created by the learner under the physical constraints of space and time. Further research will focus on her theory that the data is both generated and learned by the same internal processes.





Lucy Smith is Senior Managing Editor for AIhub.



