AIhub.org
#ICLR2022 invited talk round-up 2: Beyond interpretability


06 May 2022




In the second of our round-ups of the invited talks at the International Conference on Learning Representations (ICLR), we focus on the presentation by Been Kim.

Beyond interpretability: developing a language to shape our relationships with AI
Been Kim

Been Kim’s research focuses on the interpretability and explainability of AI models. In this presentation, she talked about work towards developing a language to communicate with AI systems. The ultimate goal is that we would be able to query an algorithm as to why a particular decision was made, and it would be able to provide us with an explanation. To illustrate this point, Been used the example of AlphaGo and its famous match against world champion Lee Sedol. At move 37 in one of the games, AlphaGo produced what commentators described as a “very strange move”, one that turned the course of the game. It would be valuable to know how AlphaGo reached the decision to play this move, and to be able to query the system in some way to find out.

Been talked about the representational spaces of humans and machines. Although there is likely to be some overlap, what machines and humans can do is very different. There are many things for which only humans, or only machines, have a representation. Move 37 lies in the region that only machines have a representation for. Ideally, the two representational spaces would overlap as much as possible. The goal of a human-AI language is to increase this overlap.

Screenshot from Been Kim’s talk.

This language would probably not look like any human language, and we would likely need many different languages depending on the nature of the task at hand. For example, construction tasks would require a completely different language from writing tasks. Been noted that, although we don’t yet know what these human-AI languages will look like, they should: 1) reflect the nature of the machines (just as human languages reflect human nature); and 2) expand what we know (just as move 37 expanded the knowledge of Go players). She gave examples of work that she and her colleagues have been carrying out related to both of these aspects.

One piece of work concerns the Gestalt phenomena, whereby humans fill in missing information, for example, seeing a full triangle when only the three corners of the shape are present. Been and colleagues found that machines can also “fill in” the missing information to “see” a triangle, but only after they’ve been trained to generalise.

Another study looked at how humans interact with explanations given by machines. In this case, the team investigated what happens when the explanations given are not perfect (“noisy”). In an experiment, human participants played a computer game in which they had to work as a chef in a busy kitchen, with the goal of preparing meals as quickly as possible. They got varying levels of advice from a machine: sometimes anchored to a high-level goal (for example: “chop tomatoes for the pasta sauce”), sometimes just an instruction (for example: “chop tomatoes”), and sometimes no advice at all. Sometimes they received “noisy” advice. The goal for the team was to work out the impact of noisy explanations. They found that if the explanations were anchored to a higher-level goal, then, even if the advice was noisy, it still helped the human players. However, if the explanation was not anchored, then performance degraded more when the noise was added. What was surprising was that, if used during the training period (not during deployment), noisy, anchored explanations were as good as perfect anchored explanations. This suggests that, even if explanations aren’t perfect, they can still significantly help humans.

Been talked about TCAV (Testing with Concept Activation Vectors), a general tool which provides an interpretation of a neural network’s internal state in terms of human-friendly concepts. The tool aims to find an alignment between a vector in human representational space and a vector in machine representational space. This is done by collecting examples of a concept (for example, “striped”) from humans, and passing these to a machine trained on images in order to obtain its representation of these striped images. This gives a vector, meaningful to humans, represented in the machine’s space. This can be used to query the machine – for example: is the machine’s prediction of zebras sensitive to this striped concept?
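The core computation can be sketched in a few lines. This is a simplified illustration of the TCAV idea, not the authors’ implementation: a linear classifier is trained to separate activations of concept examples from activations of random examples, the normal to its decision boundary serves as the concept activation vector (CAV), and the TCAV score is the fraction of examples whose class prediction increases in the direction of the CAV. The function names and synthetic inputs here are our own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Train a linear classifier separating concept activations from
    random activations; the (unit-norm) CAV is the normal to the
    resulting decision boundary."""
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression().fit(X, y)
    cav = clf.coef_[0]
    return cav / np.linalg.norm(cav)

def tcav_score(class_grads, cav):
    """Fraction of examples whose class logit has a positive
    directional derivative along the CAV."""
    return float(np.mean(class_grads @ cav > 0))
```

In a real setting, `concept_acts` would be the activations of an intermediate layer for images of the concept (e.g. “striped”), and `class_grads` the gradients of the class logit (e.g. “zebra”) with respect to that layer.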

This TCAV method can be used to help humans expand what we know. One interesting example concerns AlphaZero – the self-trained chess-playing model from DeepMind. Been and her team collaborated with a World Chess champion to take a detailed look at AlphaZero and how it works. As the machine was self-trained, it learnt chess by playing against itself, and never saw a human play. This is interesting because, if we can show that it shares some aspects of learning with humans, it could give us the potential to expand what we know. The research shows that human chess knowledge is acquired by the AlphaZero neural network as it trains. The team also looked at a broad range of human chess concepts (such as capturing the queen, or putting an opponent in check), and showed when and where these concepts are represented in the AlphaZero network.
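The “when and where” question can be approached by fitting simple probes on the network’s internal activations, layer by layer (and checkpoint by checkpoint): a concept is said to be represented wherever a simple probe can predict it from the activations. The following is a minimal sketch of that idea with made-up data, not DeepMind’s code; the function name and inputs are our own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def probe_concept(acts_by_layer, concept_values):
    """Fit a linear probe per layer and return its in-sample R^2 score,
    indicating where the concept is linearly decodable from activations.

    acts_by_layer: dict mapping layer name -> (n_positions, dim) array
    concept_values: (n_positions,) array of concept scores per position
    """
    scores = {}
    for layer, acts in acts_by_layer.items():
        probe = LinearRegression().fit(acts, concept_values)
        scores[layer] = probe.score(acts, concept_values)
    return scores
```

Plotting such scores across layers and training steps gives the kind of “when and where” picture described in the talk: a concept that appears only in later layers, or only after a certain amount of self-play training, shows up as a region of high probe score.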

Chess opening moves, human vs AlphaZero. Screenshot from Been Kim’s talk.

Another area of study concerned the opening moves of chess. In the figure above, you can see the evolution of human opening moves (from the many years over which chess games have been recorded) versus the opening moves of AlphaZero as it trained. It is interesting to note that AlphaZero uses a more diverse range of opening moves than humans. You can check out the interactive visualisation tool from Been and colleagues, which describes concepts from AlphaZero’s representational space. By studying how and what AlphaZero learns, we could expand our own chess-playing knowledge.

You can read Been’s blog post about her talk, which includes links to published work on this topic.




Lucy Smith is Senior Managing Editor for AIhub.

©2026.02 - Association for the Understanding of Artificial Intelligence