AIhub.org
 

How can robots acquire skills through interactions with the physical world? An interview with Jiaheng Hu


12 February 2026




One of the key challenges in building robots for household or industrial settings is the need to master the control of high-degree-of-freedom systems such as mobile manipulators. Reinforcement learning has been a promising avenue for acquiring robot control policies; however, scaling it to complex systems has proved tricky. In their work SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL, Jiaheng Hu, Peter Stone and Roberto Martín-Martín introduce a method that makes real-world reinforcement learning feasible for complex embodiments. We caught up with Jiaheng to find out more.

What is the topic of the research in your paper and why is it an interesting area for study?

This paper is about how robots (in particular, household robots like mobile manipulators) can autonomously acquire skills by interacting with the physical world, i.e. real-world reinforcement learning. Reinforcement learning (RL) is a general framework for learning from trial-and-error interaction with an environment, and it has huge potential for allowing robots to learn tasks without humans hand-engineering the solution. RL for robotics is a very exciting field, as it opens up the possibility of robots self-improving in a scalable way, towards the creation of general-purpose household robots that can assist people in our everyday lives.

What were some of the issues with previous methods that your paper was trying to address?

Previously, most successful applications of RL to robotics trained entirely in simulation and then deployed the policy directly in the real world (i.e. zero-shot sim2real). However, this approach has big limitations. On one hand, it is not very scalable: you need to create task-specific, high-fidelity simulation environments that closely match the real-world environment you want to deploy the robot in, and this can take days or even months for each and every task. On the other hand, some tasks are very hard to simulate because they involve deformable objects and contact-rich interactions (for example, pouring water, folding clothes, or wiping a whiteboard); for these tasks, the simulation is often quite different from the real world. This is where real-world RL comes into play: if we can allow a robot to learn by directly interacting with the physical world, we don’t need a simulator anymore. However, while several attempts have been made towards realizing real-world RL, it is actually a very hard problem for two reasons:

1. Sample inefficiency: RL requires a lot of samples (i.e. interactions with the environment) to learn good behavior, which are often impossible to collect in large quantities in the real world.

2. Safety issues: RL requires exploration, and random exploration in the real world is often very dangerous. The robot can break itself and will never be able to recover from that.

Could you tell us about the method (SLAC) that you’ve introduced?

So, creating high-fidelity simulations is very hard, and directly learning in the real world is also really hard. What should we do? The key idea of SLAC is that we can use a low-fidelity simulation environment to assist subsequent real-world RL. Specifically, SLAC implements this idea in a two-step process: in the first step, SLAC learns a latent action space in simulation via unsupervised reinforcement learning. Unsupervised RL is a technique that allows the robot to explore a given environment and learn task-agnostic behaviors. In SLAC, we design a special unsupervised RL objective that encourages these behaviors to be safe and structured.
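To make the idea concrete, here is a minimal sketch (not the authors' code) of what this first step could look like: a decoder that turns a latent action into low-level joint commands, trained with an unsupervised reward that trades off behavioral diversity against safety. The PyTorch setup, the discriminator-based diversity term, the safety penalty and all names and hyperparameters below are illustrative assumptions, not the actual SLAC objective.

```python
# A hedged sketch of step one, assuming a PyTorch-style setup; not the authors' code.
import torch
import torch.nn as nn


class LatentActionDecoder(nn.Module):
    """Maps (robot state, latent action z) to a bounded low-level command."""

    def __init__(self, state_dim: int, latent_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # keep joint commands in [-1, 1]
        )

    def forward(self, state: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, z], dim=-1))


class SkillDiscriminator(nn.Module):
    """Guesses which latent z produced a state; its confidence rewards diverse, distinguishable skills."""

    def __init__(self, state_dim: int, num_skills: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, num_skills),
        )

    def forward(self, next_state: torch.Tensor) -> torch.Tensor:
        return self.net(next_state)  # logits over discrete latent skills


def unsupervised_reward(discriminator, next_state, skill_idx, safety_violation):
    """Diversity term (log q(z | s')) minus a penalty on unsafe states.

    `safety_violation` is assumed to be a 0/1 flag from the simulator,
    e.g. a joint-limit or collision indicator.
    """
    log_q = torch.log_softmax(discriminator(next_state), dim=-1)
    diversity = log_q.gather(-1, skill_idx.unsqueeze(-1)).squeeze(-1)
    return diversity - 10.0 * safety_violation.float()
```

The discrete-skill, discriminator-based diversity bonus here is only a stand-in borrowed from standard unsupervised skill discovery; the paper designs its own objective specifically around safety and structure for whole-body control.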

In the second step, we treat these learned behaviors as the new action space of the robot, and the robot does real-world RL for downstream tasks such as wiping whiteboards by making decisions in this new action space. Importantly, this method allows us to circumvent the two biggest problems of real-world RL: we don’t have to worry about safety issues, since the new action space is pretrained to always be safe; and we can learn in a sample-efficient way, because our new action space is trained to be very structured.
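Continuing the sketch, the second step can be pictured as wrapping the real robot so that a downstream RL agent chooses latent actions, which the frozen decoder from step one turns into low-level commands. The gym-style interface, the robot_state() helper and the fixed decode horizon below are assumptions for illustration, not the actual SLAC implementation.

```python
# A hedged sketch of step two, reusing the LatentActionDecoder defined above.
import numpy as np
import torch


class LatentActionEnv:
    """Exposes the real robot to an RL agent through the pretrained latent action space."""

    def __init__(self, base_env, decoder, decode_horizon: int = 10):
        self.base_env = base_env      # assumed to offer reset(), step(), robot_state()
        self.decoder = decoder        # frozen LatentActionDecoder from step one
        self.decode_horizon = decode_horizon

    def reset(self):
        return self.base_env.reset()

    def step(self, z: np.ndarray):
        """One latent action unrolls into several low-level commands on the robot."""
        total_reward, obs, done = 0.0, None, False
        z_t = torch.as_tensor(z, dtype=torch.float32)
        for _ in range(self.decode_horizon):
            state = torch.as_tensor(self.base_env.robot_state(), dtype=torch.float32)
            with torch.no_grad():
                low_level = self.decoder(state, z_t).numpy()
            obs, reward, done, info = self.base_env.step(low_level)
            total_reward += float(reward)
            if done:
                break
        return obs, total_reward, done, {}
```

Because every command the decoder can emit was pretrained to be safe and structured, the downstream agent (for example, a standard off-policy RL algorithm) only ever explores within that safe set, which is what makes a small budget of real-world interaction sufficient.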

The robot carrying out the task of wiping a whiteboard.

How did you go about testing and evaluating your method, and what were some of the key results?

We tested our method on a real Tiago robot – a high degree-of-freedom, bi-manual mobile manipulator – on a series of very challenging real-world tasks, including wiping a large whiteboard, cleaning a table, and sweeping trash into a bag. These tasks are challenging in three respects: 1. They are visuo-motor tasks that require processing high-dimensional image information. 2. They require whole-body motion of the robot (i.e. controlling many degrees of freedom at the same time). 3. They are contact-rich, which makes them hard to simulate accurately. On all of these tasks, our method learns high-performance policies (>80% success rate) within an hour of real-world interaction. By comparison, previous methods simply cannot solve these tasks, and often risk breaking the robot. So to summarize, previously it was simply not possible to solve these tasks via real-world RL, and our method has made it possible.

What are your plans for future work?

I think there is still a lot more to do at the intersection of RL and robotics. My eventual goal is to create truly self-improving robots that can learn entirely by themselves without any human involvement. More recently, I’ve been interested in how we can leverage foundation models such as vision-language models (VLMs) and vision-language-action models (VLAs) to further automate the self-improvement loop.

About Jiaheng

Jiaheng Hu is a 4th-year PhD student at UT-Austin, co-advised by Prof. Peter Stone and Prof. Roberto Martín-Martín. His research interest is in Robot Learning and Reinforcement Learning, with the long-term goal of developing self-improving robots that can learn and adapt autonomously in unstructured environments. Jiaheng’s work has been published at top-tier Robotics and ML venues, including CoRL, NeurIPS, RSS, and ICRA, and has earned multiple best paper nominations and awards. During his PhD, he interned at Google DeepMind and Ai2, and is a recipient of the Two Sigma PhD Fellowship.

Read the work in full

SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL, Jiaheng Hu, Peter Stone, Roberto Martín-Martín.





Lucy Smith is Senior Managing Editor for AIhub.



