In this interview series, we’re meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. We sat down with Maxime Meyer to chat about his current research, future plans, and how he found the doctoral consortium experience.
Hi, I’m Maxime, a second-year PhD student in the mathematics department at the National University of Singapore. My research focuses on large language models.
One thing people notice with large language models, like ChatGPT, is that they often work well with normal-length prompts, but their answers can get worse when the input becomes very long. For example, if you paste in a 100-page PDF, the model may miss details, get confused, or give less reliable answers. My research aims to understand this drop in performance as the input gets longer: why it happens, how it develops as the text grows, and whether we can anticipate or limit it.
Models have improved a lot over the last few years. In the past, even one page could be hard for them. Today, a page is usually fine. But very large texts are still a problem. If you wanted to put an entire book into a model—like the Bible—and ask a specific question, it would be too much for current systems. The same issue comes up with long company policy documents or large sets of rules and directives.
One particularly interesting recent result is a set of formulas that can predict a model’s performance. From a few basic characteristics of a model, we can estimate the maximum input length it can handle reliably. This means we do not always need to run large sets of experiments to find its limits.
Concretely, if a company has a model and wants it to handle longer prompts, they can use these formulas to give guidance right away. By adjusting certain parameters, they can expect the model to be able to process inputs that are two or even three times longer—without having to test every possibility through trial and error.
There are two main directions we’re exploring. First, we want to apply our techniques to other questions about large language models, beyond the specific setting we studied. Second, we are working on sharpening our results to make the predictions more accurate.
This project focused on a problem in quantum computing. A quantum computer is similar to a normal computer, but instead of storing information as clear 0s and 1s, it stores information in quantum states.
In a normal computer, it’s easy to tell whether a bit is a 0 or a 1—you can measure an electrical signal through a wire and read it directly. In a quantum computer, the “bit” is replaced by a quantum state, which can be much more complex. Figuring out exactly which state the system is in is much harder, and that is one of the main challenges in building and using quantum computers.
In our work, we studied how to learn an unknown quantum state step by step, using repeated measurements. We focused on two families of quantum states that are commonly used. One of them has more symmetry, so people expected it to be easier to learn. What we showed is that, in certain settings, that advantage disappears: both families can be equally hard to learn.
It was a great experience. It was my first time presenting my work on LLMs, and I received a lot of useful feedback. The Doctoral Consortium also allowed us to interact closely with established researchers for two days. We talked not only about research, but also about careers, challenges in academia, and what to pay attention to depending on our specific goals. I came away with many valuable insights.
My background is in mathematics. I received an offer from two supervisors I really liked, and I accepted because I knew how important it is to work with people you get along with. We also had the flexibility to define the specific topic later. One of them suggested working on large language models because of how popular they are. I followed his advice, and I’m really enjoying it so far!
I also feel fortunate that my background aligns so well with AI research. Many people in the field come from computer science backgrounds, while researchers with very strong mathematical training often choose to stay in pure mathematics. At the same time, mathematical skills are highly valued in AI research, so my skillset happens to fit well.
Another advantage of working in AI is that in pure mathematics, it often takes years of study before you can fully understand the state of the art and start contributing. In comparison, AI is more horizontal than vertical. With a solid mathematical background, you can get up to speed relatively quickly and begin working on a variety of research problems.
Mostly make sure you’re in a good environment, you have supervisors you get along with, and you’re in a place you want to live in. Looking around me, it definitely feels like that’s the most important thing determining whether people enjoy their PhD or have a really rough time for a couple of years. If you’re in a new city with no friends or family, where you don’t enjoy the climate, and then every day you’re working with people you don’t like, it can very quickly become hard. Conversely, if you have good supervisors and you are in an environment where you know you can have fun and thrive, a PhD is an incredible experience.
My main hobby is sports and I do quite a lot of boxing. I stopped competing at the start of my PhD, but I still train every day.
I’m Maxime, a second-year PhD student in the Department of Mathematics at the National University of Singapore, under the guidance of Prof. Vincent Tan and Prof. Caroline Chaux. I am mostly interested in studying the theoretical foundations of large language models. Despite their widespread success, these architectures—and the billions of parameters they rely on—remain poorly understood. My goal is to shed light on the fundamental equations that govern them. What role does each parameter play? How can we quantify the effect of modifying them on model performance? And how can these insights help us design more efficient and interpretable AI models?