In this interview series, we’re meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. We sat down with Nitay Alon, whose research is at the intersection of cognitive science and AI. We talked about the fascinating topic of Theory of Mind, how it plays out in deceptive environments, multi-agent systems, the interdisciplinary nature of the field, when to use Theory of Mind (and when not to), and more.
I’m Nitay, a PhD student just about to graduate from the Hebrew University and the Max Planck Institute for Biological Cybernetics. Broadly speaking, my research sits at the intersection of cognitive science and AI, along with related domains. Humans have an amazing trait called Theory of Mind. It’s almost like a mental superpower. We can use it to try and read other people’s minds. For instance, when we see someone in distress, or when we see someone happy, we can try and come up with a plausible explanation as to why this is the case. We also use the same mental superpower to communicate verbally, non-verbally, literally, and figuratively.
Our research asks a somewhat different question: why do we have this superpower in the first place? In what situations did it evolve? Throughout my research, we found that it is deception that pushes this superpower forward. You don’t need much Theory of Mind to communicate or coordinate. But when you have a lot of Theory of Mind, when you can really mentalize someone, like “I know that you know that I know that you know”, or “I think that you think that I think that you think, but I know something else”, it really benefits you in deceptive environments. My work started from trying to understand what role Theory of Mind plays in deceptive environments.
The first paper I published showed that agents can use this trait to distort the perception of others. One agent can convince others that it prefers oranges to apples in order to get a discount coupon. But if, in turn, the other agents are aware that they’re being deceived or manipulated, they learn to be skeptical, to discount, and to distrust. So there is an interesting cognitive arms race.
In our follow-up study, we said, well, if there is this cognitive arms race, then why aren’t we all just infinitely wise, with infinite Theory of Mind? That paper showed that having too much Theory of Mind, or over-mentalization, leads to paranoid behaviour. That has ramifications for domains like AI safety, but also computational psychiatry. We presented a model showing that the same trait that is very beneficial in deceptive, hostile environments can be very harmful when you transition to benign environments.
Then, to wrap up the program, I tried to come up with a new model, in a paper recently published in the Journal of Artificial Intelligence Research (JAIR), that suggests we might need to mix Theory of Mind with something that is completely non-mental-model-based, like heuristics. So if I’m interacting with someone and I’m trying to predict their next move and for some reason I keep on failing – “I think you’ll say red, you said green; I think you like vanilla, you prefer chocolate” – then there should be some trigger in my mind that says, “well, I can no longer model you properly, so maybe I’ll just try something that is completely heuristic-based, just play the game as you would play with anyone else”. That mitigates the problem that too little thinking makes you susceptible to deception, and too much thinking makes you susceptible to paranoid behaviour. The middle ground helps to balance things out.
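To make the switching idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption, not the model from the paper: the class and function names (`AdaptiveMentalizer`, `tom_model`, `heuristic`), the toy matching game, and the window/threshold values are all invented to show the trigger mechanism described above.

```python
import random

def best_response(predicted_partner_move):
    # Toy payoff structure: we win by matching the partner's move.
    return predicted_partner_move

class AdaptiveMentalizer:
    """Use a Theory-of-Mind model of the partner while it predicts well;
    fall back to a model-free heuristic once recent prediction accuracy
    drops below a threshold."""

    def __init__(self, tom_model, heuristic, window=10, threshold=0.5):
        self.tom_model = tom_model    # observation -> predicted partner move
        self.heuristic = heuristic    # observation -> own move, no partner model
        self.window = window          # how many recent predictions to score
        self.threshold = threshold    # accuracy below this triggers the fallback
        self.hits = []                # 1 if a ToM prediction was right, else 0

    def record(self, predicted, actual):
        # Score the latest ToM prediction against the partner's real move.
        self.hits.append(1 if predicted == actual else 0)
        self.hits = self.hits[-self.window:]

    def mentalizing_works(self):
        # Keep mentalizing until there is enough evidence that it is failing.
        if len(self.hits) < self.window:
            return True
        return sum(self.hits) / len(self.hits) >= self.threshold

    def act(self, observation):
        if self.mentalizing_works():
            return best_response(self.tom_model(observation))
        return self.heuristic(observation)  # "play as you would with anyone else"

# The ToM model insists the partner will say "red"; the partner is actually
# unpredictable, so accuracy collapses and the agent switches to the heuristic.
agent = AdaptiveMentalizer(tom_model=lambda obs: "red",
                           heuristic=lambda obs: random.choice(["red", "green"]))
for _ in range(30):
    partner_move = random.choice(["red", "green"])
    agent.record(predicted=agent.tom_model(None), actual=partner_move)
print("still mentalizing?", agent.mentalizing_works())  # very likely False
```

The sliding window is the key design choice in this sketch: a partner model that starts predicting well again can win back control, so the agent switches between mentalizing and heuristics rather than abandoning one for good.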
Yes, it’s very interdisciplinary. I was fortunate to work with very talented people, my advisors from the Hebrew University and the Max Planck Society, but also collaborators across the world from different domains: computational psychiatry, developmental psychology, economics, multi-agent systems. My research is a reflection of my personal preferences and my nature, which is multidisciplinary.
It’s a long story! In my undergrad, I travelled around campus a bit. I started in economics and realized that I really like math, so I switched to math. Then I realized that I don’t really like math, but I really like statistics. So I ended up doing a double major in statistics and economics. Then I pursued a Master’s in statistics, and I really liked it. I like modeling uncertainty, and thinking about how we quantify uncertainty in the world. Then I worked in industry, which I really liked, but I felt like I needed to do more research. There were still questions that bugged me and haunted me, so I started a PhD.
The spark that ignited my interest in Theory of Mind and deception was an evening I spent with my older son. He was around four then, and, being the child he was, he played around, and I told him: “at some point we’ll have to stop and make a decision. What do you want to do next? Do you want to watch some TV or do you want a bedtime story?” I could see the wheels spinning in his head. He was trying to infer why I was asking about his preferences. He is a very shrewd child, so he picked up that I was trying to work out which of the two I could later use, in a parental way, to stop him from misbehaving. And he said, “well, you know what, I really dislike TV, bedtime stories are my favourite”. Being the father I am, but also the researcher I am, I was quite surprised. He read my mind pretty quickly, understood that I was trying to read his preferences, and gave a very deceitful response by reporting his preferences to me in reverse.
I shared the story with a colleague, Lion Schulz. He thought it was very interesting that my son was using so much Theory of Mind so quickly and reporting false preferences. I pitched the Theory of Mind and deception idea to my advisors and they really liked it. So I think my research interest is not only a reflection of my background in economics, statistics, and computer science, but also of the fact that I really like to get research ideas from observing people. It’s not purely theoretical research trying to prove theorems. It’s more about trying to understand the mechanisms that govern our behaviour, study what happens when they become maladaptive, and work out how to monitor and fix that maladaptation.
Something I found very interesting was thinking about why humans struggle to recurse to infinity. We know from previous work that when people play competitive games, they think somewhere in the domain of one step, like “I think you think”, or maybe “I think that you think that I think”, but we don’t really recurse to infinity. This is interesting because in game theory, the concept of Nash equilibrium requires that agents recurse infinitely. So there is an interesting contradiction between our observed capacity as humans and the normative behaviour we would expect. In our paper on the maladaptation of Theory of Mind, we provide an interesting path towards explaining why we don’t recurse to infinity. It’s because, while outsmarting everybody might be the right solution to a given problem, if you attribute this competency to others all the time, a) it’s a cognitive effort (we were able to quantify this mathematically), and b) it leads to a very bad outcome, because you always attribute malevolence and competency to others. Sometimes others are just plain simple: they speak the truth, there is no hidden intent, they don’t try to confuse, there is no obfuscation. And if you have this overthinking, or over-mentalizing, you end up harming yourself.
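As a back-of-the-envelope illustration of that trade-off (the prior, the cost value, and the win condition here are invented for the example, not taken from the paper), consider an agent choosing a recursion depth k when each extra level of “I think that you think…” carries a fixed effort cost:

```python
# Toy resource-rational model of recursion depth. An agent wins whenever
# its depth k out-thinks the opponent's depth m, but pays a linear effort
# cost per level. Prior and cost values are illustrative assumptions.
opponent_prior = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}  # most people mentalize shallowly
cost_per_level = 0.2

def expected_net_utility(k):
    # Probability of facing an opponent we out-depth, minus the effort cost.
    p_win = sum(p for m, p in opponent_prior.items() if k > m)
    return p_win - cost_per_level * k

for k in range(6):
    print(k, round(expected_net_utility(k), 2))
# 0 0.0, 1 0.3, 2 0.4, 3 0.35, 4 0.2, 5 0.0 -- utility peaks at a shallow
# depth: beyond the levels opponents actually use, recursion only adds cost.
```

The same toy structure also shows the paranoia failure mode: if the prior mistakenly puts all its mass on deep, scheming opponents, the agent keeps paying for depth it never actually needs.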
I think this was an interesting outcome for us, because there’s a lot of hype now in the AI industry around how Theory of Mind integrates into large language models, and around the emergence of Theory of Mind in large language models. In a way, we were able to put a few pins on that map and say that it’s very good for them to have some Theory of Mind, but we need to make sure that they don’t have too much Theory of Mind. Nor do we want to claim that they have Theory of Mind, because people might then overthink what the models will do. If you think about the world of agentic AI, which is now the biggest hype, where agents try to infer what other agents are trying to do or achieve, we also need to find a way to regulate this.
So this is my latest work, and I think it’s very exciting because it paves the way to using Theory of Mind (or models of others) when it is correct or beneficial, and ditching it when it no longer makes sense. This is something I look forward to continuing to work on in my future research, because I think it’s an interesting reflection of the way humans adapt. In some situations, we think strategically and deeply, like when we play complex games or are in a competitive situation. But in other cases, like when we’re with friends and family or just going to buy a cup of coffee, we don’t really think about the thoughts of others. We just follow a predefined schema and don’t really need to think about others in that situation.
To some extent, LLMs are everywhere. I have worked with LLMs and I think they are a very exciting tool. But at the end of the day, being agents, they adhere to the same principles that were laid out when people started thinking about agentic systems in the 40s and 50s. So to some extent, we have a new tool, or a new mechanism, that operates in natural human language instead of a symbolic one. It scales up fantastically, but at the end of the day, it will still run into the same boundaries and issues that the AI community already mapped when thinking about multi-agent systems 30, 40, 50 years ago: cooperation, honesty, communication, noisy communication, trust, the principal-agent problem. Now, a lot of the folks around me offload most of their programming tasks to tools like Claude Code or Copilot. How can you verify that the answer you’ve got is the thing you were looking for? This is a classic principal-agent problem from game theory. If you have multiple agents interacting together, in these agentic structures, how do we enforce coordination? These are problems from economics that were posed and solved by founding fathers like von Neumann and Nash, so why should we expect LLMs to be different in that sense?
I am just about to defend and submit my PhD thesis, and I already have plans to pursue a postdoc in the U.S. I’m very excited about it and look forward to continuing my research. I just received a big fellowship in Israel, the Rothschild Foundation Fellowship.
My future study will really focus on this idea of adaptive Theory of Mind and resource-rational Theory of Mind. Complex cognitive tasks like reasoning and Theory of Mind are really cognitively demanding. Many sitcoms use Theory of Mind for comic effect. In Friends, or in Seinfeld, for example, there’s always a scene where someone says, “do they know that we know that they know that we know?” That’s a very simple recursive Theory of Mind, and no one can track it. My hunch is that humans activate Theory of Mind when we get certain social cues. And I think that part of our ability to adapt our Theory of Mind, in contrast to my PhD work on maladaptive Theory of Mind, comes from meta-learning these social cues. We have a good hunch about situations where we really need to be concentrated and informed, and need to keep track of everything and anything that others are doing. So we learn these environmental cues of when you should activate your Theory of Mind versus when you should just use another mechanism to interact with people. Much like what I am attempting in this interview: I’m really thinking about my words, theorizing about your mind and the reader’s mind. How would they interpret my words? How would they read between the lines? Am I clear enough? Am I too vague? This is a very cognitively demanding task. So I would like to explore this.
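One hedged way to picture that meta-learning, with entirely invented contexts, payoffs, and learning rule, is as a tiny contextual bandit that learns, per social situation, whether engaging costly Theory of Mind pays off. This is a sketch of the research direction, not a proposed model:

```python
import random

# Hypothetical sketch: learn per-context action values by trial and error,
# then pick the better strategy for each context. The contexts, payoff
# structure, and effort cost below are all illustrative assumptions.
contexts = ["negotiation", "coffee_run"]
arms = ("mentalize", "heuristic")
q = {(ctx, arm): 0.0 for ctx in contexts for arm in arms}
counts = {key: 0 for key in q}

def payoff(ctx, arm):
    # Assumed structure: mentalizing helps in strategic settings, but its
    # effort cost makes it a net loss in routine, benign interactions.
    base = 1.0 if (ctx == "negotiation" and arm == "mentalize") else 0.4
    effort = 0.3 if arm == "mentalize" else 0.0
    return base - effort + random.gauss(0, 0.05)

for _ in range(2000):
    ctx = random.choice(contexts)
    arm = random.choice(arms)            # pure exploration for simplicity
    counts[(ctx, arm)] += 1
    # Incremental running mean of the observed payoff for this (context, arm).
    q[(ctx, arm)] += (payoff(ctx, arm) - q[(ctx, arm)]) / counts[(ctx, arm)]

for ctx in contexts:
    best = max(arms, key=lambda a: q[(ctx, a)])
    print(ctx, "->", best)  # expected: mentalize in negotiation, heuristic otherwise
```

Under these assumed payoffs, the learner settles on mentalizing in the strategic context and the cheap heuristic in the routine one, which is the adaptive pattern described above.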
Part of my future research will investigate how we learn when to activate Theory of Mind, when to switch it off, and which kind of Theory of Mind is needed in a given situation. So it’s on the same path as my previous research, but I’m taking a slightly different route this time.
I’m fortunate to have a very strong network of collaborators. The first Theory of Mind for AI workshop was a year ago in Philadelphia, and we were able to recruit amazing keynote speakers and drew a lot of attention from the community, because Theory of Mind is an interesting issue. And particularly today, Theory of Mind in AI is The Thing. This year, in Singapore, we had a rerun of the workshop: different keynote speakers, different audience, same interest, same hype, same enthusiasm in the audience. I think we are really bridging different communities. Joe Barnby, who is at ECU (Perth) and King’s College London, is a cognitive scientist and brings that perspective into the workshop, which is super valuable. Stefan Sarkadi, who is at King’s College London but is also now heading a center for defense and cybersecurity in Lincoln, brings other aspects of Theory of Mind, deception, and manipulation. Reuth Mirsky, from Tufts, brings aspects of human-robot interaction, goal recognition, and multi-agent systems. Each of us four organisers brings a different background and flavour, and the mixture of these four flavours makes the workshop, if I may say, popular, interesting, and relevant. It attracts a lot of people from the community because we are hitting all these different notes together.
We’re working on a special issue on Theory of Mind for AI in the journal Autonomous Agents and Multi-Agent Systems. The idea is to bring together people from different communities who work on different aspects of Theory of Mind. In each of the workshops that we’ve held, and we do look forward to future ones, we tried to bring keynote speakers from psychology, linguistics, AI, psychiatry, and planning and goal recognition. Each of these communities has contributed substantially to our understanding of Theory of Mind, so many other communities can take those contributions and apply them in their own domains. I think Theory of Mind for AI is here to stay for the coming years.
If anyone is interested in Theory of Mind research, please feel free to approach one of the four of us who organised the workshop. We are always interested in collaborations. It’s an exciting field and a wonderful time to be in Theory of Mind, so I highly encourage anyone who wishes to contribute, ask questions, or learn more.
Nitay Alon is a PhD candidate in Computer Science at the Hebrew University of Jerusalem and the MPI for Biological Cybernetics. His research focuses on the intersection of Multi-Agent Reinforcement Learning and Computational Cognitive Science, specifically investigating the role of Theory of Mind (ToM) in mixed-motive games. His doctoral work was advised by Professors Jeff Rosenschein and Peter Dayan.
Alon’s work is characterized by its application across several disciplines. He used an economic perspective to formalize deception and skepticism within the I-POMDP framework, introducing information-theoretic metrics to quantify strategic belief manipulation. His research in Computational Psychiatry explored k-level cognitive hierarchy models to demonstrate how social dysfunction can stem from excessive recursive reasoning. Additionally, his recent work (published in JAIR) addresses agent robustness by using off-policy counterfactual anomaly detection to mitigate deception in hierarchical settings.
Beyond his research, Alon founded and organizes the ToM4AI Workshop under the AAAI umbrella, where he has chaired international workshops and managed peer-review processes. He is a Guest Editor for the Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS) and co-author of the book Theory of Mind in AI: Foundations, Models, and Ethical Implications (in progress). Alon is an incoming postdoctoral researcher at MIT, supported by the 2026 Rothschild Fellowship, and holds an M.Sc. in Statistics and Machine Learning from Tel Aviv University.