AIhub.org

deep dive

Defending against prompt injection with structured queries (StruQ) and preference optimization (SecAlign)

BAIR blog 06 May 2025

Recent advances in LLMs enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them.

Copilot Arena: A platform for code

ML@CMU 28 Apr 2025

Copilot Arena is an app designed to evaluate LLMs in real-world settings by collecting preferences directly in a developer’s actual workflow.

Repurposing protein folding models for generation with latent diffusion

BAIR blog 14 Apr 2025

The awarding of the 2024 Nobel Prize to AlphaFold2 marks an important moment of recognition for the role of AI in biology. What comes next after protein folding?

Optimizing LLM test-time compute involves solving a meta-RL problem

ML@CMU 20 Jan 2025

By altering the LLM training objective, we can reuse existing data along with more test-time compute to train models to do better.

Generating a biomedical knowledge graph question answering dataset

Xi Yan 17 Jan 2025

Introducing PrimeKGQA - a scalable approach to dataset generation, harnessing the power of large language models.

Human-AI collaboration in physical tasks

ML@CMU 03 Jan 2025

Creating an intelligent assistant that uses the sensors in a smartwatch to support physical tasks such as cooking and DIY.

Improving calibration by relating focal loss, temperature scaling, and properness

Viacheslav Komisarenko 28 Nov 2024

In ML classification tasks, achieving high accuracy is only part of the goal; it's equally important for models to express how confident they are in their predictions.

AI in cancer research & care: perspectives of three KU Leuven institutes

Leuven.AI Stories, Evy Lobbestael, Jens Bürger and Liesl Jacobs 26 Nov 2024

This story is a collaboration of three Institutes that are working at the intersection of cell research, cancer research and care, and artificial intelligence.

Identification of hazardous areas for priority landmine clearance: AI for humanitarian mine action

ML@CMU 19 Nov 2024

In close collaboration with the UN and local NGOs, we co-develop an interpretable predictive tool to identify hazardous clusters of landmines.

Enhancing controlled query evaluation through epistemic policies

Gianluca Cima, Domenico Lembo, Lorenzo Marconi, Riccardo Rosati and Domenico Fabio Savo 12 Nov 2024

The winners of an IJCAI2024 best paper award explain the key advances of their work.

VQAScore: Evaluating and improving vision-language generative models

ML@CMU 06 Nov 2024

We introduce a new evaluation metric and benchmark dataset for automated evaluation of text-to-visual generative models.

No free lunch in LLM watermarking: Trade-offs in watermarking design choices

ML@CMU 23 Oct 2024

Common design choices in LLM watermarking schemes make the resulting systems surprisingly susceptible to watermark removal or spoofing attacks.

Linguistic bias in ChatGPT: Language models reinforce dialect discrimination

BAIR blog 30 Sep 2024

Examining how ChatGPT’s behavior changes in response to text in different varieties of English.

Rethinking LLM memorization

ML@CMU 23 Sep 2024

Our approach provides a simple and practical perspective on what memorization can mean, providing a useful tool for functional and legal analysis of LLMs.

How to evaluate jailbreak methods: a case study with the StrongREJECT benchmark

BAIR blog 09 Sep 2024

Providing a more accurate assessment of jailbreak effectiveness.

Proportional aggregation of preferences for sequential decision making

Nikhil Chandak and Shashwat Goel 27 Aug 2024

Read about work that won an outstanding paper award at AAAI 2024.

CMU-MATH team’s innovative approach secures 2nd place at the AIMO prize

ML@CMU 19 Aug 2024

Dive into our blog to discover the winning formula.

Are we ready for multi-image reasoning? Launching VHs: The Visual Haystacks benchmark!

BAIR blog 01 Aug 2024

This project focuses on the “Multi-Image Question Answering” (MIQA) task.

How to regularize your regression

ML@CMU 17 Jun 2024

Considering how to tune the norm-based regularization parameters in linear regression.

Beyond the mud: Datasets, benchmarks, and methods for computer vision in off-road racing

ML@CMU 17 Apr 2024

Off-road motorcycle racing poses unique challenges that push the boundaries of what existing computer vision systems can handle.

Modeling extremely large images with xT

BAIR blog 08 Apr 2024

Introducing a new framework to model large images on contemporary GPUs while aggregating global context with local details.

The shift from models to compound AI systems

BAIR blog 15 Mar 2024

State-of-the-art AI results are increasingly obtained by compound systems with multiple components, not just monolithic models.

Unlocking the potential of entity-centric knowledge graphs: transforming healthcare and beyond

Christos Theodoropoulos and Leuven.AI Stories 27 Feb 2024

The concept of entity-centric knowledge graphs holds promise in reshaping how we organize, access, and leverage data.

On noisy evaluation in federated hyperparameter tuning

ML@CMU 12 Jan 2024

Our work explores key sources of noise and shows that even small amounts of noise can have a significant impact on tuning methods.

Asymmetric certified robustness via feature-convex neural networks

BAIR blog 14 Dec 2023

We propose the asymmetric certified robustness problem, which requires certified robustness for only one class and reflects real-world adversarial scenarios.

Goal representations for instruction following

BAIR blog 23 Nov 2023

How can we reconcile the ease of specifying tasks through natural language-based approaches with the performance improvements of goal-conditioned learning?

A comprehensive survey on rare event prediction

Chathurangi Shyalika, Ruwan Wickramarachchi and Amit Sheth 22 Nov 2023

We review the rare event prediction literature and highlight open research questions and future directions in the field.

Test-time adaptation with slot-centric models

ML@CMU 25 Sep 2023

Improving out-of-distribution scene decomposition accuracy.

Training diffusion models with reinforcement learning

BAIR blog 15 Sep 2023

We show how diffusion models can be trained on downstream objectives directly using reinforcement learning.

A ‘black box’ AI system has been influencing criminal justice decisions for over two decades – it’s time to open it up

The Conversation 11 Aug 2023

Melissa Hamilton and Pamela Ugwudike investigate the use of automated decision-making systems in courts and prisons.

On the stepwise nature of self-supervised learning

BAIR blog 01 Aug 2023

Presenting a mathematical picture of the training process of large-scale SSL methods.

Navigating to objects in the real world

ML@CMU 24 Jul 2023

Research shows that modular learning is a reliable approach for navigating to objects.

Generating 3D molecular conformers via equivariant coarse-graining and aggregated attention

BAIR blog 14 Jul 2023

Introducing a variational encoder for molecular conformer generation.

GPT-4 + Stable-Diffusion = ?: Enhancing prompt understanding of text-to-image diffusion models with large language models

BAIR blog 26 Jun 2023

Our LLM-grounded model delivers improved prompt understanding in cases including negation, numeracy, and spatial relationships.

On privacy and personalization in federated learning: a retrospective on the US/UK PETs challenge

ML@CMU 05 Jun 2023

Studying the use of differential privacy in personalized, cross-silo federated learning.

TIDEE: An embodied agent that tidies up novel rooms using commonsense priors

ML@CMU 28 Apr 2023

We introduce a new benchmark to test agents on their ability to clean up messy scenes without any human instruction.