AIhub.org

Statistical or embodied? Comparing people and LLMs in their processing of color metaphors: an interview with Douglas Guilbeault

by Ella Scallan

09 June 2026

We sat down with Douglas Guilbeault to discuss his paper, “Comparing Colorseeing, Colorblind, Painters, and Large Language Models in Their Processing of Color Metaphors”. The results have interesting implications for how we model human cognition and, in turn, for how the concept of synaesthesia could be integrated to develop more intelligent AI models.

What are color metaphors?

A color metaphor is the use of color to describe something in a way that is not immediately literal. For example, to say “green with envy” would be a color metaphor, because envy doesn’t have an immediate visual structure to it – we’re evoking a broader, more flexible notion of what green conveys, beyond just its visible properties.

What makes metaphors very interesting is that they often use past experience or cultural associations in new ways to talk about something beyond our current perception – either something imagined or in the future, which are many steps of abstraction away from the present. Metaphors provide an alternative pathway to get there.

How can thinking about metaphors be helpful for building LLMs?

Metaphors are important to understand because they’re everywhere in our language. We use many every day without realising that they are metaphors. For example, we say “getting my message across to you”, as if a message were somehow traveling like an object through space, or “grasping an idea”, as if we could physically hold a concept. Those are still metaphors, but we’re very familiar with them.

Large Language Models depend on predicting sequences of words in existing bodies of text, and there’s a fairly strict relationship between what they learn and the data that’s available. Metaphors provide a particularly challenging context for LLMs. Are the LLMs developing a kind of understanding of language that’s similar to humans? Can they go considerably beyond the training data, using past associations and linguistic concepts in flexible ways to talk about something novel?

Color metaphors are a useful way to get at this question, because they are a controlled setting and you can use different cases. There are everyday uses of color words, such as the sky is blue, grass is green. That’s the kind of thing where you would expect a statistical engine like an LLM to be able to learn how often the grass is described as green. There are also metaphors that are going to be well represented in training data, such as anger described as red. You would expect the model to be able to learn that statistical pattern.
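As a rough illustration of what "learning how often the grass is described as green" could mean in the simplest statistical terms, the sketch below counts which color words share a sentence with a target word. The corpus, the color list, and the function names are all invented for this example; this is not the training procedure of any LLM, nor a method from the paper.

```python
from collections import Counter

# Toy corpus invented for illustration; not data from the study.
corpus = [
    "the grass is green",
    "the sky is blue",
    "green grass after rain",
    "she was red with anger",
    "his anger burned red",
    "the grass looked yellow in august",
]

# A small, assumed set of color words to look for.
COLORS = {"red", "green", "blue", "yellow"}


def color_counts(word, sentences):
    """Count how often each color word shares a sentence with `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t in COLORS)
    return counts


def most_likely_color(word, sentences):
    """Return the most frequent co-occurring color, or None if there is none."""
    counts = color_counts(word, sentences)
    return counts.most_common(1)[0][0] if counts else None
```

With this toy corpus, "grass" and "anger" get a clear answer, while a word that never appears next to a color, such as "envy", gets none. That gap is where the harder cases discussed in the interview begin.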

Why is color an important concept to study?

There’s a lot of evidence showing that imagery – whether mental, visual, or sensory – is not some epiphenomenal byproduct of cognition. It’s actually instrumental to thinking itself. When you’re reading or when you’re producing speech, the parts of your brain related to imagery are activated, whether you consciously experience it or not.

There’s a foundational skepticism that LLMs could possibly be recovering the nature and richness of visual experience just by learning patterns from words. For the field of embodied cognition, it’s foundational to say that all rich visual experience matters not just for vision, but for our sensory experience in general. It informs everything from our memory and attention to the nature of our concepts. This is what we have in mind when we think about something like color. It has so much more significance beyond just distinguishing things visually.

What kind of methodology did you use to test the embodied cognitive perspective vs the pure numbers hypothesis?

In our paper, we compared LLMs to colorseeing people, painters, and colorblind people to show how far the statistical approach can get us, and what it doesn’t capture. We used a few different paradigms to test this.

In one paradigm, participants had to indicate which color they think a word is. We used words from very familiar domains, like emotions, and we expected there to be patterns in the training data that are learnable, so the statistical approach should do well. We then made it harder – for example, asking what the color of a number is. Here we get a kind of synesthesia, where some people report having really strong associations between colors and numbers, despite there being no clear link.

In another paradigm, we went even further and asked them to associate colors with totally new words that don’t exist. These are pseudo words, which follow all the grammatical structures of the English language and are generated following a standard procedure. There are some ways in which statistics can help you here. For example, one of the words might be “glicker”, and the LLM can recognise this as associated with bright light. If you think about glint, gleam, or glimmer, it actually has a semantic theme relating to light. If the model or colorblind people were to pick up on that, somehow that would skew their statistical association. Or you might see a word like Lord or blodomer, which textually looks like blood, so maybe that will lead you to think that it’s red. Then we would have other nonsense words where it’s increasingly hard to identify a coherent statistical connection.

Why did you choose to compare colorblind people and LLMs in their understanding of color metaphors?

There’s already a large body of research establishing that colorblind people can use language around color appropriately. However, is there something that they are not able to understand because they lack visual experience? Is there something about metaphors that they’re missing, or is it enough to learn the statistical associations? One of the arguments from the area of embodied cognition is that something else is happening in regular cognition, above and beyond the statistical predictions that visually impaired people and LLMs use to understand color and language. In this study, we attempted to test this argument – if there is something beyond statistical predictions, you would expect there to be differences between LLMs and colorblind people. We did notice quite a few really interesting differences.

What were your results?

One of our results was that all the groups had very strong color associations. Apparently many more people have a kind of synesthesia than we realize, and there’s quite a bit of agreement between them. This is such a robust, foundational phenomenon that we should probably account for it if we’re going to have a good theory of human cognition, let alone AI cognition. To illustrate this – you might not personally feel like a day of the week has a color, but collectively, there is a strong association. We first investigated this in an earlier paper.

What was really striking was that the AI had such strong color associations. I don’t think that this was at all expected by people, and it just emerged – no one built it to have these color associations. If anything, that attests to how robust the synesthesia-like pattern is as a sort of emergent phenomenon, because the AI has itself recreated it.

However, when we compare the actual color associations, we find that the AI often provides color associations that are quite different from both the colorseeing and the colorblind. In fact, the colorseeing and the colorblind were much closer together than the AI. This raises an interesting question. If the colorblind people were just using statistical reasoning, like the previous theories expected, you would expect that they would be much closer to the AIs than the colorseeing. This result is supported by a lot of the existing literature.
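One simple way to picture what it means for two groups to be "closer together" is to compare the distribution of colors each group picks for a word. The sketch below uses total variation distance as an assumed, illustrative metric; the responses are made up to mirror the pattern described above and are not data or analysis code from the paper.

```python
from collections import Counter


def distribution(responses):
    """Turn a list of color choices into a probability distribution."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 means identical, 1 means no overlap."""
    colors = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in colors)


# Hypothetical responses for a single word, made up for illustration only.
responses = {
    "colorseeing": ["red", "red", "red", "orange"],
    "colorblind": ["red", "red", "orange", "orange"],
    "llm": ["blue", "blue", "red", "blue"],
}

dists = {group: distribution(r) for group, r in responses.items()}

# Distance between the two human groups, and between colorseeing people and the LLM.
human_gap = total_variation(dists["colorseeing"], dists["colorblind"])
llm_gap = total_variation(dists["colorseeing"], dists["llm"])
```

In this toy setup the two human groups end up closer to each other than either is to the LLM, which is the shape of the finding described here. A real analysis would average over many words and participants and would likely use a distance defined in a perceptual color space.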

The way that we interpret this is that it is an oversimplification to say that colorblind people are just solving these problems through statistics; even if colorblind people are limited in their perception of particular colors, they are still intensely embodied cognitive agents. They have emotions, they have rich sound experience, and they also still have an embodied perception of color. Some of them see a few colors, but even for those who see no colors, there’s really interesting research that shows a color-like nature to seeing various gradations of grayscale within a black-and-white continuum. There’s even evidence that seeing in black and white enhances depth perception, especially at night, so they actually can improve aspects of their visual experience. We argued that there are ways in which colorblind people are able to learn and recover aspects of embodied and metaphorical reasoning by importing their actual experience and understanding of emotions and how they connect to other things.

I don’t know if you’ve heard of the bouba/kiki effect. If you draw two shapes – one round, the other spiky – and ask people which one is Bouba and which one is Kiki, the vast majority of the time, Kiki is the sharp one and Bouba is the round one. This shows that we have these really strong intuitions of a metaphorical nature, that there’s something about Bouba that feels round. Now the sound itself is not round, but there’s the O shape that we make with our mouth, and there are maybe aspects of the wavelength that are a bit rounder. Kiki has sharper boundaries, and it’s sharper visually. It’s a metaphor, but there is a kind of abstract resonance. Those are the kinds of things that colorblind people still very much have as embodied agents. The bouba/kiki effect is baked into their minds, so they can leverage that style of reasoning, which they use in all kinds of cases, to help them in some of these situations.

Could you tell us about the implications of your research?

The conversation between cognitive science and AI has developed to the point where it’s increasingly framed as a tension between competing views. The AI perspective is often called the pure language hypothesis: the view that, by learning how to predict, generate, and learn from language data, these models are going to recover the fundamental richness and the core meanings that relate to human experience and knowledge.

The embodied cognition camp pushes against this view. They argue that what language really is is a communication system designed to drastically oversimplify the richness of our experience so that we can share concepts really quickly. Color is a good example here. The average English vocabulary for color is only 12 words, even though you have immediate experience of the fact that you see many more colors. The vast majority of the time, in order to coordinate and interact, we don’t need to highlight all those subtle grounds.

So the cognitive side is arguing that, at present, the off-the-shelf statistical frameworks in LLMs, whether Bayesian-style or deep learning models, are nowhere close to accounting for those kinds of relationships. In order to understand what it means for envy to be green, you need to understand something about the universe of meaning attached both to envy and to green, and to the mappings between them. Statistical associations are very basic and don’t give satisfactory explanations for how things work. LLMs develop quite impressive capabilities to predict, but at the end of the day, they’re just trying to master the prediction of sequences, which is not how the human mind or brain does that same process. We learn through all kinds of other embodied, metaphorical, and analogical processes.

What future work are you planning in this area? How can this be applied to building better AI models?

This paper is a part of a body of work I’ve been building with collaborators on what we call computational synesthesia. The premise of this is not only that these patterns of synesthesia are a very powerful model organism for understanding cognition, but that they connect things from very different parts of experience. You would think that numbers have no business having a color – by definition, they’re supposed to be the most abstract thing possible. Nevertheless, we have very strong color associations, and it’s not just idiosyncratic. People have color associations whether or not they realise it – you just have to ask them. They have them for days of the week and all kinds of other things.

What that reveals is this ability of the human mind to think very flexibly about one thing in the terms of the other. It’s one of the things about embodied cognition that we think is very powerful and very underappreciated. It’s creativity from constraints. You could view it as a limitation that we had to use our embodied experience to understand the world, and develop theories of physics, social theory, and the mind. On the other hand, it got us to develop a strategy where we became very sophisticated at learning how to talk about one thing in terms of another thing.

We do this kind of thing all the time, and it turns out that that capacity is essentially at the core of creativity. We think this poetic mode of creativity is essential for all kinds of very legitimate, impactful kinds of discovery and science. Many scientists attribute their best ideas to some sort of metaphorical breakthrough. And so we’re just trying to understand that process in a computational way.

This is where the comparison between humans and AI becomes really important. We’re still far from AIs achieving that kind of capacity, but we’re interested in future work to see if there are ways of developing AI to utilise this type of creativity – like using synaesthesia to solve problems. Perhaps we can build them in a way where they rely not just on a visual experience and some sort of auditory experience, but some integration of the two to solve the problem.

We’re still in some combination of theoretical and empirical research, but we have been able to show that it’s not just that humans have these color associations. We’ve been making a lot of progress in how to use Big Data style analyses to reveal pervasive metaphorical structures and computationally describe them. For example, the relationship between concrete or abstract ideas, or the way in which we use metaphors, such as warm or cold, or hard or soft, to talk about concepts. We can learn all these things from computational data.

Now, we’re using AIs as tools for classifying and aggregating data, but in the long run, we’re thinking more and more about how to actually build AI models that have a more robust kind of synesthesia, because we think that will be key for achieving some of the creative capabilities that humans possess.

About Douglas Guilbeault

Douglas Guilbeault is an assistant professor of organizational behavior at Stanford’s Graduate School of Business, and he received his PhD from the University of Pennsylvania’s Annenberg School for Communication. He co-directs the Computational Culture Lab, which harnesses and builds computationally intensive network- and language-based methods to study organizational cultures. His work has appeared in a number of top journals, including Nature, PNAS, and Management Science, as well as in popular news outlets, such as The Atlantic and The Harvard Business Review. He has received top research awards from The International Conference on Computational Social Science, The Cognitive Science Society, and The International Communication Association.

tags: AI ideas

Ella Scallan is Assistant Editor for AIhub
