AIhub.org
 

Why ChatGPT struggles with math


by
07 November 2024




Have you ever tried to use an AI tool like ChatGPT to do some math and found it doesn’t always add up? It turns out there’s a reason for that.

As large language models (LLMs) like OpenAI’s ChatGPT become more ubiquitous, people increasingly rely on them for work and research assistance. Yuntian Deng, assistant professor at the David R. Cheriton School of Computer Science, discusses some of the challenges in LLMs’ reasoning capabilities, particularly in math, and explores the implications of using these models to aid problem-solving.

What flaw did you discover in ChatGPT’s ability to do math?

As I explained in a recent post on X, o1, the latest reasoning variant of ChatGPT, struggles with large-digit multiplication, especially when multiplying numbers beyond nine digits. This is a notable improvement over the previous ChatGPT-4o model, which struggled even with four-digit multiplication, but it’s still a major flaw.
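To make the idea of measuring accuracy by digit length concrete, here is a minimal Python sketch of how such a check could be set up. It is not the procedure used in the work described here. The names `accuracy_by_digits` and `float_multiply` are hypothetical, and `float_multiply` is a deliberately lossy stand-in (rounding through 64-bit floats), not a language model. In practice the `answer_fn` argument would wrap whatever system is being tested.

```python
import random


def random_n_digit(n, rng):
    """Return a random integer with exactly n decimal digits."""
    return rng.randint(10 ** (n - 1), 10 ** n - 1)


def accuracy_by_digits(answer_fn, digit_lengths, trials=20, seed=0):
    """Fraction of products answer_fn gets exactly right, per digit length.

    answer_fn is any callable taking two ints and returning an int
    (a hypothetical wrapper around the system under test).
    """
    rng = random.Random(seed)
    results = {}
    for n in digit_lengths:
        correct = 0
        for _ in range(trials):
            a = random_n_digit(n, rng)
            b = random_n_digit(n, rng)
            # Python integers are arbitrary precision, so a * b is exact.
            if answer_fn(a, b) == a * b:
                correct += 1
        results[n] = correct / trials
    return results


def float_multiply(a, b):
    """Illustrative stand-in only: multiplies through 64-bit floats,
    which cannot represent most products longer than about 16 digits."""
    return int(float(a) * float(b))


if __name__ == "__main__":
    for n, acc in accuracy_by_digits(float_multiply, [2, 4, 8, 12]).items():
        print(f"{n}-digit operands: {acc:.0%} exact")
```

The exact product serves as ground truth for every trial, so any mismatch is counted as an error regardless of how close the answer is.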

What implications does this have regarding the tool’s ability to reason?

Large-digit multiplication is a useful test of reasoning because it requires a model to apply principles learned during training to new test cases. Humans can do this naturally. For instance, if you teach a high school student how to multiply nine-digit numbers, they can easily extend that understanding to handle ten-digit multiplication, demonstrating a grasp of the underlying principles rather than mere memorization.
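The principle the student learns can be written down as a short procedure. The sketch below is an illustrative Python version of schoolbook long multiplication (the function name `schoolbook_multiply` is our own, not something from the interview). Nothing in it depends on how many digits the inputs have, which is the sense in which the method extends from nine digits to ten.

```python
def schoolbook_multiply(x: str, y: str) -> str:
    """Multiply two non-negative decimal strings one digit at a time."""
    # Work with least-significant digit first, as on paper.
    a = [int(d) for d in reversed(x)]
    b = [int(d) for d in reversed(y)]
    result = [0] * (len(a) + len(b))

    for i, da in enumerate(a):
        carry = 0
        for j, db in enumerate(b):
            # Single-digit product plus what is already in this column.
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10
            carry = total // 10
        # Leftover carry goes into the next column to the left.
        result[i + len(b)] += carry

    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


if __name__ == "__main__":
    # The same code handles nine-digit and ten-digit operands.
    print(schoolbook_multiply("123456789", "987654321"))
    print(schoolbook_multiply("1234567890", "9876543210"))
```

Only single-digit products and carries are used at each step, so the same rules cover inputs of any length.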

In contrast, LLMs often struggle to generalize beyond the data they have been trained on. For example, if an LLM is trained on data involving multiplication of up to nine-digit numbers, it typically cannot generalize to ten-digit multiplication.

As LLMs become more powerful, their impressive performance on challenging benchmarks can create the perception that they can “think” at advanced levels. It’s tempting to rely on them to solve novel problems or even make decisions. However, the fact that even o1 struggles with reliably solving large-digit multiplication problems indicates that LLMs still face challenges when asked to generalize to new tasks or unfamiliar domains.

Why is it important to study how these LLMs “think”?

Companies like OpenAI haven’t fully disclosed the details of how their models are trained or the data they use. Understanding how these AI models operate allows researchers to identify their strengths and limitations, which is essential for improving them. Moreover, knowing these limitations helps us understand which tasks are best suited for LLMs and where human expertise is still crucial.




University of Waterloo












©2024 - Association for the Understanding of Artificial Intelligence


 












©2021 - ROBOTS Association