AIhub.org
 

How can robots acquire skills through interactions with the physical world? An interview with Jiaheng Hu


by Lucy Smith
12 February 2026




One of the key challenges in building robots for household or industrial settings is the need to master the control of high-degree-of-freedom systems such as mobile manipulators. Reinforcement learning has been a promising avenue for acquiring robot control policies; however, scaling to complex systems has proved tricky. In their work SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL, Jiaheng Hu, Peter Stone and Roberto Martín-Martín introduce a method that renders real-world reinforcement learning feasible for complex embodiments. We caught up with Jiaheng to find out more.

What is the topic of the research in your paper and why is it an interesting area for study?

This paper is about how robots (in particular, household robots like mobile manipulators) can autonomously acquire skills by interacting with the physical world (i.e. real-world reinforcement learning). Reinforcement learning (RL) is a general framework for learning from trial-and-error interaction with an environment, and has huge potential in allowing robots to learn tasks without humans hand-engineering the solution. RL for robotics is a very exciting field, as it can open possibilities for robots to self-improve in a scalable way, towards the creation of general-purpose household robots that can assist people in their everyday lives.

What were some of the issues with previous methods that your paper was trying to address?

Previously, most of the successful applications of RL to robotics were done by training entirely in simulation, then deploying the policy directly in the real world (i.e. zero-shot sim2real). However, such a method has big limitations. On one hand, it is not very scalable, as you need to create task-specific, high-fidelity simulation environments that closely match the real-world environment that you want to deploy the robot in, and this can often take days or months for each and every task. On the other hand, some tasks are actually very hard to simulate, as they involve deformable objects and contact-rich interactions (for example, pouring water, folding clothes, wiping a whiteboard). For these tasks, the simulation is often quite different from the real world.

This is where real-world RL comes into play: if we can allow a robot to learn by directly interacting with the physical world, we don’t need a simulator anymore. However, while several attempts have been made towards realizing real-world RL, it is actually a very hard problem, for two reasons: 1. Sample inefficiency: RL requires a lot of samples (i.e. interactions with the environment) to learn good behavior, and these are often impossible to collect in large quantities in the real world. 2. Safety issues: RL requires exploration, and random exploration in the real world is often very dangerous. The robot can break itself and will never be able to recover from that.

Could you tell us about the method (SLAC) that you’ve introduced?

So, creating high-fidelity simulations is very hard, and directly learning in the real-world is also really hard. What should we do? The key idea of SLAC is that we can use a low-fidelity simulation environment to assist subsequent real-world RL. Specifically, SLAC implements this idea in a two-step process: in the first step, SLAC learns a latent action space in simulation via unsupervised reinforcement learning. Unsupervised RL is a technique that allows the robot to explore a given environment and learn task-agnostic behaviors. In SLAC, we design a special unsupervised RL objective that encourages these behaviors to be safe and structured.

In the second step, we treat these learned behaviors as the new action space of the robot: the robot does real-world RL for downstream tasks, such as wiping whiteboards, by making decisions in this new action space. Importantly, this method allows us to circumvent the two biggest problems of real-world RL: we don’t have to worry about safety issues since the new action space is pretrained to be always safe, and we can learn in a sample-efficient way because our new action space is trained to be very structured.
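To make the two-step idea more concrete, here is a minimal Python sketch of learning in a latent action space. It is not the authors' implementation. The decoder is a fixed random table standing in for what SLAC would learn in simulation, the latent actions are assumed to be a small discrete set, the decoder ignores observations for simplicity, and the learner is a tiny bandit-style value estimate standing in for a real-world RL algorithm. All names and sizes (`NUM_JOINTS`, `NUM_LATENT`, `JOINT_LIMIT`, `toy_reward`) are hypothetical.

```python
import random

# Hypothetical sizes, chosen for illustration only (not taken from the paper).
NUM_JOINTS = 17      # dimension of a whole-body command (assumed)
NUM_LATENT = 4       # number of discrete latent actions (assumed)
JOINT_LIMIT = 0.5    # bound applied to every joint command (assumed)


def make_decoder(seed=0):
    """Stand-in for the decoder that step 1 would pretrain in simulation.

    Here it is only a fixed random table: each latent action maps to one
    whole-body command. Nothing is actually learned in this sketch.
    """
    rng = random.Random(seed)
    table = [[rng.uniform(-1.0, 1.0) for _ in range(NUM_JOINTS)]
             for _ in range(NUM_LATENT)]

    def decode(latent_action):
        raw = table[latent_action]
        # Clamp so that every decoded command stays inside the bound.
        return [max(-JOINT_LIMIT, min(JOINT_LIMIT, x)) for x in raw]

    return decode


def toy_reward(command):
    """Toy task: the reward is simply the first joint command."""
    return command[0]


def learn_in_latent_space(decode, episodes=50, epsilon=0.1, seed=1):
    """Step 2 stand-in: epsilon-greedy value estimates over latent actions."""
    rng = random.Random(seed)
    values = [0.0] * NUM_LATENT
    counts = [0] * NUM_LATENT
    for step in range(episodes):
        if step < NUM_LATENT:
            z = step                       # try every latent action once
        elif rng.random() < epsilon:
            z = rng.randrange(NUM_LATENT)  # occasional exploration
        else:
            z = max(range(NUM_LATENT), key=lambda k: values[k])
        reward = toy_reward(decode(z))
        counts[z] += 1
        values[z] += (reward - values[z]) / counts[z]  # running mean
    return values


decode = make_decoder()
values = learn_in_latent_space(decode)
best = max(range(NUM_LATENT), key=lambda k: values[k])
```

The point of the sketch is the interface: the learner only ever chooses among `NUM_LATENT` options, and every command that reaches the robot has passed through `decode`, so exploration never leaves the bounded set of behaviors. After running, `best` holds the index of the latent action with the highest estimated reward on the toy task.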

The robot carrying out the task of wiping a whiteboard.

How did you go about testing and evaluating your method, and what were some of the key results?

We test our method on a real Tiago robot (a high-degree-of-freedom, bimanual mobile manipulator) on a series of very challenging real-world tasks, including wiping a large whiteboard, cleaning a table, and sweeping trash into a bag. These tasks are challenging in three respects: 1. They are visuo-motor tasks that require processing of high-dimensional image information. 2. They require whole-body motion of the robot (i.e. controlling many degrees of freedom at the same time). 3. They are contact-rich, which makes them hard to simulate accurately. On all of these tasks, our method allows us to learn high-performance policies (>80% success rate) within an hour of real-world interactions. By comparison, previous methods simply cannot solve the tasks, and often risk breaking the robot. So to summarize, previously it was simply not possible to solve these tasks via real-world RL, and our method has made it possible.

What are your plans for future work?

I think there is still a lot more to do at the intersection of RL and robotics. My eventual goal is to create truly self-improving robots that can learn entirely by themselves without any human involvement. More recently, I’ve been interested in how we can leverage foundation models such as vision-language models (VLMs) and vision-language-action models (VLAs) to further automate the self-improvement loop.

About Jiaheng

Jiaheng Hu is a 4th-year PhD student at UT-Austin, co-advised by Prof. Peter Stone and Prof. Roberto Martín-Martín. His research interest is in Robot Learning and Reinforcement Learning, with the long-term goal of developing self-improving robots that can learn and adapt autonomously in unstructured environments. Jiaheng’s work has been published at top-tier Robotics and ML venues, including CoRL, NeurIPS, RSS, and ICRA, and has earned multiple best paper nominations and awards. During his PhD, he interned at Google DeepMind and Ai2, and is a recipient of the Two Sigma PhD Fellowship.

Read the work in full

SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL, Jiaheng Hu, Peter Stone, Roberto Martín-Martín.





Lucy Smith is Senior Managing Editor for AIhub.











 















©2026.02 - Association for the Understanding of Artificial Intelligence