AIhub.org

Congratulations to the #ICML2025 award winners!

by Lucy Smith

16 July 2025

Outstanding paper awards

There are six outstanding papers this year:

Conformal Prediction as Bayesian Quadrature
Jake Snell, Thomas Griffiths

Abstract: As machine learning-based prediction systems are increasingly used in high-stakes situations, it is important to understand how such predictive models will perform upon deployment. Distribution-free uncertainty quantification techniques such as conformal prediction provide guarantees about the loss black-box models will incur even when the details of the models are hidden. However, such methods are based on frequentist probability, which unduly limits their applicability. We revisit the central aspects of conformal prediction from a Bayesian perspective and thereby illuminate the shortcomings of frequentist guarantees. We propose a practical alternative based on Bayesian quadrature that provides interpretable guarantees and offers a richer representation of the likely range of losses to be observed at test time.

Read the paper in full here.

Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan, Chen Wu, Charles Ding, Aditi Raghunathan

Abstract: We design a suite of minimal algorithmic tasks that are a loose abstraction of open-ended real-world tasks. This allows us to cleanly and controllably quantify the creative limits of the present-day language model. Much like real-world tasks that require a creative, far-sighted leap of thought, our tasks require an implicit, open-ended stochastic planning step that either (a) discovers new connections in an abstract knowledge graph (like in wordplay, drawing analogies, or research) or (b) constructs new patterns (like in designing math problems or new proteins). In these tasks, we empirically and conceptually argue how next-token learning is myopic and memorizes excessively; multi-token approaches, namely teacherless training and diffusion models, comparatively excel in producing diverse and original output. Secondly, to elicit randomness without hurting coherence, we find that injecting noise at the input layer (dubbed seed-conditioning) works surprisingly as well as (and in some conditions, better than) temperature sampling from the output layer. Thus, our work offers a principled, minimal test-bed for analyzing open-ended creative skills, and offers new arguments for going beyond next-token learning and temperature sampling. We make part of the code available here.

Read the paper in full here.

The Value of Prediction in Identifying the Worst-Off
Unai Fischer Abaigar, Christoph Kern, Juan Perdomo

Abstract: Machine learning is increasingly used in government programs to identify and support the most vulnerable individuals, prioritizing assistance for those at greatest risk over optimizing aggregate outcomes. This paper examines the welfare impacts of prediction in equity-driven contexts, and how they compare to other policy levers, such as expanding bureaucratic capacity. Through mathematical models and a real-world case study on long-term unemployment amongst German residents, we develop a comprehensive understanding of the relative effectiveness of prediction in surfacing the worst-off. Our findings provide clear analytical frameworks and practical, data-driven tools that empower policymakers to make principled decisions when designing these systems.

Read the paper in full here.

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham Kakade, Sitan Chen

Abstract: In recent years, masked diffusion models (MDMs) have emerged as a promising alternative approach for generative modeling over discrete domains. Compared to autoregressive models (ARMs), MDMs trade off complexity at training time with flexibility at inference time. At training time, they must learn to solve an exponentially large number of infilling problems, but at inference time, they can decode tokens in essentially arbitrary order. In this work we closely examine these two competing effects. On the training front, we theoretically and empirically demonstrate that MDMs indeed train on computationally intractable subproblems compared to their autoregressive counterparts. On the inference front, we show that a suitable strategy for adaptively choosing the token decoding order significantly enhances the capabilities of MDMs, allowing them to sidestep hard subproblems. On logic puzzles like Sudoku, we show that adaptive inference can boost solving accuracy in pretrained MDMs from <7% to ≈90%, even outperforming ARMs that were explicitly trained via teacher forcing to learn the right order of decoding.

Read the paper in full here.

Score Matching with Missing Data
Josh Givens, Song Liu, Henry Reeve

Abstract: Score matching is a vital tool for learning the distribution of data with applications across many areas including diffusion processes, energy based modelling, and graphical model estimation. Despite all these applications, little work explores its use when data is incomplete. We address this by adapting score matching (and its major extensions) to work with missing data in a flexible setting where data can be partially missing over any subset of the coordinates. We provide two separate score matching variations for general use, an importance weighting (IW) approach, and a variational approach. We provide finite sample bounds for our IW approach in finite domain settings and show it to have especially strong performance in small sample lower dimensional cases. Complementing this, we show our variational approach to be strongest in more complex high-dimensional settings which we demonstrate on graphical model estimation tasks on both real and simulated data.

Read the paper in full here.

CollabLLM: From Passive Responders to Active Collaborators
Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, Jure Leskovec, Jianfeng Gao

Abstract: Large Language Models are typically trained with next-turn rewards, limiting their ability to optimize for long-term interaction. As a result, they often respond passively to ambiguous or open-ended user requests, failing to help users reach their ultimate intents and leading to inefficient conversations. To address these limitations, we introduce CollabLLM, a novel and general training framework that enhances multiturn human-LLM collaboration. Its key innovation is a collaborative simulation that estimates the long-term contribution of responses using Multiturn-aware Rewards. By reinforcement fine-tuning these rewards, CollabLLM goes beyond responding to user requests, and actively uncovers user intent and offers insightful suggestions—a key step towards more human-centered AI. We also devise a multiturn interaction benchmark with three challenging tasks such as document creation. CollabLLM significantly outperforms our baselines with averages of 18.5% higher task performance and 46.3% improved interactivity by LLM judges. Finally, we conduct a large user study with 201 judges, where CollabLLM increases user satisfaction by 17.6% and reduces user spent time by 10.4%.

Read the paper in full here.

Outstanding position paper awards

There were two winners in the position paper category this year:

Position: AI Safety should prioritize the Future of Work
Sanchaita Hazra, Bodhisattwa Prasad Majumder, Tuhin Chakrabarty

Abstract: Current efforts in AI safety prioritize filtering harmful content, preventing manipulation of human behavior, and eliminating existential risks in cybersecurity or biosecurity. While pressing, this narrow focus overlooks critical human-centric considerations that shape the long-term trajectory of a society. In this position paper, we identify the risks of overlooking the impact of AI on the future of work and recommend comprehensive transition support towards the evolution of meaningful labor with human agency. Through the lens of economic theories, we highlight the intertemporal impacts of AI on human livelihood and the structural changes in labor markets that exacerbate income inequality. Additionally, the closed-source approach of major stakeholders in AI development resembles rent-seeking behavior through exploiting resources, breeding mediocrity in creative labor, and monopolizing innovation. To address this, we argue in favor of a robust international copyright anatomy supported by implementing collective licensing that ensures fair compensation mechanisms for using data to train AI models. We strongly recommend a pro-worker framework of global AI governance to enhance shared prosperity and economic justice while reducing technical debt.

Read the paper in full here.

Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards
Jaeho Kim, Yunseok Lee, Seulki Lee

Abstract: The peer review process in major artificial intelligence (AI) conferences faces unprecedented challenges with the surge of paper submissions (exceeding 10,000 submissions per venue), accompanied by growing concerns over review quality and reviewer responsibility. This position paper argues for the need to transform the traditional one-way review system into a bi-directional feedback loop where authors evaluate review quality and reviewers earn formal accreditation, creating an accountability framework that promotes a sustainable, high-quality peer review system. The current review system can be viewed as an interaction between three parties: the authors, reviewers, and system (i.e., conference), where we posit that all three parties share responsibility for the current problems. However, issues with authors can only be addressed through policy enforcement and detection tools, and ethical concerns can only be corrected through self-reflection. As such, this paper focuses on reforming reviewer accountability with systematic rewards through two key mechanisms: (1) a two-stage bi-directional review system that allows authors to evaluate reviews while minimizing retaliatory behavior, (2) a systematic reviewer reward system that incentivizes quality reviewing. We ask for the community’s strong interest in these problems and the reforms that are needed to enhance the peer review process.

Read the paper in full here.

Test-of-time award

The test-of-time award celebrates the lasting impact a paper has had over the past ten years. For this year’s award, papers presented at ICML 2015 were considered. There is one winner and two honourable mentions.

Winner

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy

Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.

Read the paper in full here.
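
To make the abstract's central idea concrete, here is a minimal sketch of the per-mini-batch normalization step in pure Python. It is an illustration rather than the authors' implementation: the function name `batch_norm`, the `eps` default, and the list-of-lists data layout are assumptions, and it covers only the training-time computation on a single mini-batch (no running averages for inference, no gradient computation).

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over a mini-batch, then scale and shift.

    batch: list of examples, each a list of feature values.
    gamma, beta: per-feature scale and shift (learned in a real network;
                 passed in as plain lists here, which is an assumption).
    eps: small constant added to the variance for numerical stability.
    """
    n = len(batch)
    dims = len(batch[0])

    # Per-feature mean and (biased) variance over the mini-batch.
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)
    ]

    # Normalize to roughly zero mean and unit variance, then apply gamma and beta.
    return [
        [
            gamma[j] * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta[j]
            for j in range(dims)
        ]
        for row in batch
    ]


if __name__ == "__main__":
    mini_batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    for row in batch_norm(mini_batch, gamma=[1.0, 1.0], beta=[0.0, 0.0]):
        print(row)
```

With `gamma` set to one and `beta` to zero, both features end up on the same scale even though the raw second feature is ten times larger. That rescaling of each layer's inputs is what the abstract credits with allowing higher learning rates.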

Honourable mention

Trust Region Policy Optimization
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel

Abstract: We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.

Read the paper in full here.

Honourable mention

Variational Inference with Normalizing Flows
Danilo Jimenez Rezende, Shakir Mohamed

Abstract: The choice of approximate posterior distribution is one of the core problems in variational inference. Most applications of variational inference employ simple families of posterior approximations in order to allow for efficient inference, focusing on mean-field or other simple structured approximations. This restriction has a significant impact on the quality of inferences made using variational methods. We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. Our approximations are distributions constructed through a normalizing flow, whereby a simple initial density is transformed into a more complex one by applying a sequence of invertible transformations until a desired level of complexity is attained. We use this view of normalizing flows to develop categories of finite and infinitesimal flows and provide a unified view of approaches for constructing rich posterior approximations. We demonstrate that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provides a clear improvement in performance and applicability of variational inference.

Read the paper in full here.
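
As a rough illustration of "a simple initial density transformed by a sequence of invertible transformations", the sketch below applies planar transformations of the form f(z) = z + u·tanh(w·z + b) and tracks the log-density with the change-of-variables formula. It is a hedged, pure-Python sketch, not the authors' code: the function names, the standard-normal base density, and the parameter layout are assumptions, and the parameters are fixed by hand rather than learned.

```python
import math


def planar_flow(z, u, w, b):
    """Apply one planar transformation f(z) = z + u * tanh(w.z + b).

    Returns the transformed point and log|det Jacobian|.
    The map is invertible when w.u >= -1; this sketch does not enforce that.
    """
    activation = math.tanh(sum(wi * zi for wi, zi in zip(w, z)) + b)
    z_new = [zi + ui * activation for zi, ui in zip(z, u)]
    # det J = 1 + u . psi(z), with psi(z) = (1 - tanh^2(w.z + b)) * w
    u_dot_psi = (1.0 - activation ** 2) * sum(wi * ui for wi, ui in zip(w, u))
    return z_new, math.log(abs(1.0 + u_dot_psi))


def standard_normal_log_density(z):
    """Log-density of an isotropic standard normal (the assumed base density)."""
    return -0.5 * sum(zi * zi for zi in z) - 0.5 * len(z) * math.log(2.0 * math.pi)


def flow_log_density(z0, layers):
    """Push z0 through each (u, w, b) layer and return (z_K, log q_K(z_K)).

    Change of variables: log q_K(z_K) = log q_0(z_0) - sum_k log|det J_k|.
    """
    z = z0
    log_q = standard_normal_log_density(z0)
    for u, w, b in layers:
        z, log_det = planar_flow(z, u, w, b)
        log_q -= log_det
    return z, log_q


if __name__ == "__main__":
    # Hypothetical hand-picked parameters, for illustration only.
    layers = [([0.5, 0.0], [1.0, 0.0], 0.0), ([0.0, 0.3], [0.0, 1.0], 0.2)]
    print(flow_log_density([0.3, -0.2], layers))
```

Stacking more layers yields a more flexible density while keeping the log-density cheap to evaluate, since each layer contributes only one scalar log-determinant term.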

tags: ICML, ICML2025

Lucy Smith is Senior Managing Editor for AIhub.
