ΑΙhub.org
 

How can robots acquire skills through interactions with the physical world? An interview with Jiaheng Hu


12 February 2026




One of the key challenges in building robots for household or industrial settings is the need to master the control of high-degree-of-freedom systems such as mobile manipulators. Reinforcement learning has been a promising avenue for acquiring robot control policies; however, scaling it to complex systems has proved tricky. In their work SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL, Jiaheng Hu, Peter Stone and Roberto Martín-Martín introduce a method that makes real-world reinforcement learning feasible for complex embodiments. We caught up with Jiaheng to find out more.

What is the topic of the research in your paper and why is it an interesting area for study?

This paper is about how robots (in particular, household robots like mobile manipulators) can autonomously acquire skills by interacting with the physical world (i.e. real-world reinforcement learning). Reinforcement learning (RL) is a general framework for learning from trial-and-error interaction with an environment, and has huge potential to let robots learn tasks without humans hand-engineering the solution. RL for robotics is a very exciting field, as it can open up possibilities for robots to self-improve in a scalable way, towards the creation of general-purpose household robots that can assist people in their everyday lives.

What were some of the issues with previous methods that your paper was trying to address?

Previously, most of the successful applications of RL to robotics were done by training entirely in simulation, then deploying the policy directly in the real world (i.e. zero-shot sim2real). However, such a method has big limitations. On one hand, it is not very scalable, as you need to create task-specific, high-fidelity simulation environments that closely match the real-world environment you want to deploy the robot in, and this can often take days or months for each and every task. On the other hand, some tasks are actually very hard to simulate, as they involve deformable objects and contact-rich interactions (for example, pouring water, folding clothes, wiping a whiteboard). For these tasks, the simulation is often quite different from the real world. This is where real-world RL comes into play: if we can allow a robot to learn by directly interacting with the physical world, we don’t need a simulator anymore. However, while several attempts have been made towards realizing real-world RL, it is actually a very hard problem for two reasons. 1. Sample inefficiency: RL requires a lot of samples (i.e. interactions with the environment) to learn good behavior, and these are often impossible to collect in large quantities in the real world. 2. Safety issues: RL requires exploration, and random exploration in the real world is often very dangerous. The robot can break itself and never be able to recover from that.

Could you tell us about the method (SLAC) that you’ve introduced?

So, creating high-fidelity simulations is very hard, and directly learning in the real world is also really hard. What should we do? The key idea of SLAC is that we can use a low-fidelity simulation environment to assist subsequent real-world RL. Specifically, SLAC implements this idea in a two-step process. In the first step, SLAC learns a latent action space in simulation via unsupervised reinforcement learning. Unsupervised RL is a technique that allows the robot to explore a given environment and learn task-agnostic behaviors. In SLAC, we design a special unsupervised RL objective that encourages these behaviors to be safe and structured.

In the second step, we treat these learned behaviors as the new action space of the robot: the robot does real-world RL for downstream tasks, such as wiping whiteboards, by making decisions in this new action space. Importantly, this method allows us to circumvent the two biggest problems of real-world RL: we don’t have to worry about safety issues, since the new action space is pretrained to be always safe; and we can learn in a sample-efficient way, because our new action space is trained to be very structured.
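To make the two-step structure concrete, here is a toy sketch in Python. It is not the SLAC implementation. A hand-written lookup table stands in for the latent action decoder that would be pretrained in simulation, and simple epsilon-greedy value estimation stands in for the downstream real-world RL algorithm. The discrete latent space, the joint count, the safety bound and the reward function are all hypothetical choices made only for illustration.

```python
import random
from typing import List

NUM_JOINTS = 4          # hypothetical; a real mobile manipulator has many more
NUM_LATENT_ACTIONS = 3  # hypothetical size of a discrete latent action space
MAX_JOINT_STEP = 0.1    # hypothetical per-step bound on joint motion


def decode_latent_action(latent: int) -> List[float]:
    """Stand-in for step 1: a frozen decoder from latent action to joint command.

    In the method described above this mapping is learned in simulation;
    here it is a fixed table so the example stays self-contained.
    """
    table = [
        [0.1, 0.0, 0.0, 0.0],
        [0.0, 0.1, -0.1, 0.0],
        [-0.1, 0.0, 0.0, 0.1],
    ]
    # Clip so that every decoded command respects the bound.
    return [max(-MAX_JOINT_STEP, min(MAX_JOINT_STEP, c)) for c in table[latent]]


def toy_reward(command: List[float]) -> float:
    """Hypothetical task reward: prefers commands that move joint 0 forward."""
    return command[0]


def learn_in_latent_space(steps: int = 200, epsilon: float = 0.1,
                          seed: int = 0) -> List[float]:
    """Stand-in for step 2: the learner only ever picks latent actions."""
    rng = random.Random(seed)
    values = [0.0] * NUM_LATENT_ACTIONS
    counts = [0] * NUM_LATENT_ACTIONS
    for _ in range(steps):
        if rng.random() < epsilon:
            latent = rng.randrange(NUM_LATENT_ACTIONS)  # explore
        else:
            latent = max(range(NUM_LATENT_ACTIONS), key=values.__getitem__)
        reward = toy_reward(decode_latent_action(latent))
        counts[latent] += 1
        # Incremental mean of the rewards observed for this latent action.
        values[latent] += (reward - values[latent]) / counts[latent]
    return values


if __name__ == "__main__":
    estimates = learn_in_latent_space()
    best = max(range(NUM_LATENT_ACTIONS), key=estimates.__getitem__)
    print("estimated values:", estimates)
    print("preferred latent action:", best)
```

The point of the sketch is the division of labor: exploration happens only over the small set of latent actions, and every command that reaches the joints has already passed through the decoder, so the learner never emits an unbounded raw joint command.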

The robot carrying out the task of wiping a whiteboard.

How did you go about testing and evaluating your method, and what were some of the key results?

We test our method on a real Tiago robot (a high-degree-of-freedom, bimanual mobile manipulator) on a series of very challenging real-world tasks, including wiping a large whiteboard, cleaning a table, and sweeping trash into a bag. These tasks are challenging in three respects: 1. They are visuo-motor tasks that require processing of high-dimensional image information. 2. They require whole-body motion of the robot (i.e. controlling many degrees of freedom at the same time). 3. They are contact-rich, which makes them hard to simulate accurately. On all of these tasks, our method allows us to learn high-performance policies (>80% success rate) within an hour of real-world interaction. By comparison, previous methods simply cannot solve the tasks, and often risk breaking the robot. So to summarize, previously it was simply not possible to solve these tasks via real-world RL, and our method has made it possible.

What are your plans for future work?

I think there is still a lot more to do at the intersection of RL and robotics. My eventual goal is to create truly self-improving robots that can learn entirely by themselves without any human involvement. More recently, I’ve been interested in how we can leverage foundation models such as vision-language models (VLMs) and vision-language-action models (VLAs) to further automate the self-improvement loop.

About Jiaheng

Jiaheng Hu is a 4th-year PhD student at UT-Austin, co-advised by Prof. Peter Stone and Prof. Roberto Martín-Martín. His research interest is in Robot Learning and Reinforcement Learning, with the long-term goal of developing self-improving robots that can learn and adapt autonomously in unstructured environments. Jiaheng’s work has been published at top-tier Robotics and ML venues, including CoRL, NeurIPS, RSS, and ICRA, and has earned multiple best paper nominations and awards. During his PhD, he interned at Google DeepMind and Ai2, and is a recipient of the Two Sigma PhD Fellowship.

Read the work in full

SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL, Jiaheng Hu, Peter Stone, Roberto Martín-Martín.





Lucy Smith is Senior Managing Editor for AIhub.





 















©2026.02 - Association for the Understanding of Artificial Intelligence