AIhub.org
 

#NeurIPS2022 outstanding paper – Gradient descent: the ultimate optimizer


30 November 2022



illustration of SGD using turtles to represent different methods

Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley and Erik Meijer won a NeurIPS 2022 outstanding paper award for their work Gradient descent: the ultimate optimizer. Here, they tell us more about their work, the methodology and their main findings.

What is the topic of the research in your paper?

Our paper studies the classic problem of “hyperparameter optimization”.

Nearly all of today’s machine learning algorithms use a process called “stochastic gradient descent” (SGD) to train neural networks. SGD requires users to pick certain settings, or “hyperparameters,” before running it. Just like baking a cake requires you to pick an oven temperature and cooking time, running SGD requires you to pick hyperparameters like the “step size” and “momentum.” And just like in baking, the best settings can be hard to find, even after a lot of trial and error.
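To make the two hyperparameters concrete, here is a minimal Python sketch of SGD with momentum on a toy one-dimensional loss. The loss, the noise model and the names `noisy_grad` and `sgd_momentum` are assumptions made for illustration; none of them come from the paper.

```python
import random


def noisy_grad(w, rng):
    """Gradient of the toy loss 0.5 * (w - 3)^2, plus Gaussian noise
    standing in for the randomness of minibatch sampling."""
    return (w - 3.0) + rng.gauss(0.0, 0.1)


def sgd_momentum(step_size, momentum, steps=200, seed=0):
    """Run SGD with momentum from w = 0 and return the final weight."""
    rng = random.Random(seed)
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        g = noisy_grad(w, rng)
        # Momentum keeps a decaying running sum of past gradient steps.
        velocity = momentum * velocity - step_size * g
        w += velocity
    return w


# Two hand-picked settings: a reasonable one and a step size that is far too small.
good = sgd_momentum(step_size=0.1, momentum=0.9)
too_small = sgd_momentum(step_size=1e-5, momentum=0.0)
```

With these particular settings, the first run ends close to the minimum at 3, while the second barely moves from its starting point. That sensitivity is what makes manual tuning tedious.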

Our paper shows how SGD can *itself* be used to intelligently find good SGD hyperparameters. This is not a new idea, but our method is highly practical and easier to use than existing methods, and it generalizes straightforwardly to many popular SGD variants.

We also address the natural follow-up question: “Don’t you still need to pick hyperparameters for the SGD that picks the hyperparameters?” Our answer is that we can keep “stacking” SGD recursively, with each new level training the previous level’s hyperparameters. As the SGD tower grows taller, the top-level, human-picked hyperparameters matter less and less.
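The stacking idea can be sketched in a few lines of Python for a toy one-dimensional loss. This is an illustration under stated assumptions, not the authors' implementation: the gradient for each level is derived by hand with the chain rule (treating earlier steps as constants), and the names `run_tower`, `init_steps` and `top_step` are hypothetical.

```python
def grad_loss(w):
    """Gradient of the toy loss 0.5 * (w - 3)^2."""
    return w - 3.0


def run_tower(init_steps, top_step, steps=100):
    """Stack of SGD levels on a scalar problem.

    values[0] is the weight. values[i] for i >= 1 is the step size used to
    update values[i - 1]. Only the topmost step size, top_step, stays fixed.
    """
    height = len(init_steps)
    values = [0.0] + list(init_steps)
    prev_grads = [0.0] * (height + 1)
    for _ in range(steps):
        grads = [grad_loss(values[0])]
        for level in range(1, height + 1):
            # The value one level down was produced by
            #   value = old_value - step * prev_grad,
            # so d(loss)/d(step) = (current grad below) * (-prev grad below).
            grads.append(-grads[level - 1] * prev_grads[level - 1])
        # Update from the top down so each level uses its freshly tuned step size.
        for level in range(height, -1, -1):
            step = values[level + 1] if level < height else top_step
            values[level] -= step * grads[level]
        prev_grads = grads
    return values


# Plain SGD with a deliberately tiny, fixed step size.
plain = run_tower([], top_step=1e-4)
# Two extra levels: the step size and the step size's step size are both learned.
tower = run_tower([1e-4, 1e-3], top_step=1e-4)
```

In this toy run, the tower starts from the same tiny step size of 1e-4 yet still reaches the minimum at 3 within 100 steps, whereas plain SGD with that fixed step size barely moves. It shows the mechanics only and makes no claim about the paper's benchmarks.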

Could you tell us about the implications of your research and why it is an interesting area for study?

Our work offers a way to dramatically simplify one of the most frustrating tasks in machine learning research: picking hyperparameters. Because our method replaces the costly trial-and-error required to find good hyperparameters, we also hope it can help cut down on the amount of computation and energy needed to train neural networks – and thus the ecological impact of AI research.

Could you explain your methodology?

Our method works by making a subtle modification to the famous “backpropagation” algorithm, so that it can train not only the neural network, but also the SGD hyperparameters operating on that neural network. This idea lends itself to an elegant implementation that lets us “eat our own tail” and repeatedly stack more and more SGDs on top of each other.
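One way to read this: the loss after an update step depends on the step size used in that step, so the chain rule yields a derivative of the loss with respect to the step size. The Python sketch below writes that derivative by hand for a toy one-dimensional loss and checks it against a finite-difference estimate. It is an illustration under assumptions, with hypothetical function names, and it does not reproduce the authors' backpropagation-based code.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return 0.5 * (w - 3.0) ** 2


def grad_loss(w):
    return w - 3.0


def loss_after_step(w, alpha):
    """Loss after one SGD step, viewed as a function of the step size alpha."""
    return loss(w - alpha * grad_loss(w))


def hypergradient(w, alpha):
    """d(loss after step)/d(alpha), derived with the chain rule."""
    g = grad_loss(w)
    w_next = w - alpha * g
    # d loss / d alpha = (d loss / d w_next) * (d w_next / d alpha)
    #                  = grad_loss(w_next) * (-g)
    return grad_loss(w_next) * (-g)


def finite_difference(w, alpha, eps=1e-6):
    """Central-difference estimate of the same derivative, used as a check."""
    return (loss_after_step(w, alpha + eps) - loss_after_step(w, alpha - eps)) / (2 * eps)


analytic = hypergradient(0.0, 0.1)
numeric = finite_difference(0.0, 0.1)
```

A negative value here means that a slightly larger step size would have lowered the loss, so a gradient step on `alpha` increases it.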

What were your main findings?

Our main finding was that our method recovers good hyperparameters across a wide range of tasks and SGD variants. We tested it on several benchmarks, including popular neural network architectures used in computer vision (CV) and natural language processing (NLP), and observed that even if we picked “bad” initial hyperparameters, our method would recover and perform about as well as it did with “good” hyperparameters. This robustness increased as we made the SGD stacks taller.

One particularly striking result was that our method could intelligently vary hyperparameters over time, in a way that closely matched “schedules” designed by expert ML researchers.

What further work are you planning in this area?

We are now working on extending this method to work for hyperparameters used in other kinds of AI algorithms, such as in robotics.

About the authors


Kartik Chandra is a PhD student at MIT. He is supported by the Hertz Foundation, the Paul & Daisy Soros Fellowships for New Americans, and the National Science Foundation.

Audrey Xie is a third-year undergraduate student at MIT studying computer science and mathematics.

Jonathan Ragan-Kelley is the Esther and Harold E. Edgerton Assistant Professor of Electrical Engineering & Computer Science at MIT.

Erik Meijer is a Dutch computer scientist best known for his work on Haskell, C#, Visual Basic, and Dart, as well as for his contributions to LINQ and the Reactive Framework (Rx).





AIhub is dedicated to free high-quality information about AI.










 















©2026.02 - Association for the Understanding of Artificial Intelligence