2025-08-09

Text Conditioned Physics-based Character Control

Introduction

In this blog, we try to train a text-conditioned robot control policy. It extends a naive adversarial imitation policy by supporting user text instructions to generate motions with different styles. Specifically, we integrate a BERT language model into an adversarial imitation RL pipeline.

…

Read more ⟶

2025-06-09

Humanoid Demo: Experiments with Unitree G1

Introduction

In this blog, we will conduct some experiments on Unitree G1. Specifically, using RL + IL and Motion Retargeting from the CMU MoCap dataset, we will attempt to teach Unitree G1 to run, jump, ...

…

Read more ⟶

2025-03-01

Humanoid Demo: Game2D-to-Sim3D Cross Domain Skill Adaptation From CPG Expert Demonstrations

Introduction

In previous blog posts, we have shown successful applications of CPG-based RL for robot locomotion in both 2D (game) and 3D (world) physical simulation environments. While this approach offers the flexibility to adjust parameters for walking gait on-the-fly, it necessitates heavy reward tuning and prolonged training time. On the other hand, imitation learning from human demonstrations has shown more rapid convergence for natural gait cloning. Here we will try to combine these two techniques and explore the feasibility of using a 2D CPG expert to guide a 3D humanoid robot in learning to walk.

BipedalWalker: Finally has learned to walk~

Unitree/H1-2: Done copying!

…

Read more ⟶

2025-01-26

Humanoid Demo: Porting CPG to Unitree H1

Introduction

In this blog, we will attempt to adapt CPG + RL, which was successfully applied to the 2D Gym BipedalWalker, to a real 3D bipedal humanoid model: the Unitree H1-2.

…

Read more ⟶

2024-12-06

BipedalWalker Demo: Experiments with CPG-based RL

Background

In the previous blog, we provided a brief introduction to central pattern generators (CPGs) and how they can be used as quadrupedal locomotion controllers. CPG-based methods offer advantages such as parameterizable gait generation and on-the-fly adjustment of motion patterns. It is therefore interesting to see how this method performs when applied to bipedal robots.

…

Read more ⟶

2024-11-30

Accelerating Simulation of Stable Baselines3's VecEnv in Multi-Core Processor

Background

Stable-Baselines3 (SB3) is a popular library for rapid prototyping of RL algorithms; it ships with many common model-free algorithms out of the box. SB3's built-in algorithms take a vectorized environment (VecEnv) as their input interface so that they can train on multiple environments, and the library provides two implementations: DummyVecEnv and SubprocVecEnv. However, both DummyVecEnv and SubprocVecEnv have obvious drawbacks when it comes to leveraging a multi-core processor to simulate a large number of environments. In this blog, we implement a new PoolVecEnv, which draws inspiration from ThreadPool/ProcessPool, to accelerate the simulation of a large number of environments. The modified code will be available on GitHub later.
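To make the pool idea concrete, here is a minimal, standard-library-only sketch of stepping many environments through a fixed pool of workers. It is not the PoolVecEnv from the post and does not use SB3's VecEnv API: ToyEnv, step_chunk, and PoolVecEnvSketch are hypothetical names invented for illustration, and the toy environment stands in for a real simulator.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyEnv:
    """Hypothetical stand-in environment: a counter that ends after 5 steps."""

    def __init__(self, seed):
        self.seed = seed
        self.t = 0

    def reset(self):
        self.t = 0
        return float(self.seed)

    def step(self, action):
        self.t += 1
        obs = float(self.seed + self.t)
        reward = float(action)
        done = self.t >= 5
        if done:
            # Auto-reset so the returned observation starts the next episode
            obs = self.reset()
        return obs, reward, done


def step_chunk(envs, actions):
    """Step one chunk of environments sequentially inside a single worker."""
    return [env.step(a) for env, a in zip(envs, actions)]


class PoolVecEnvSketch:
    """Assumed design: a fixed worker pool, each worker stepping one chunk."""

    def __init__(self, envs, num_workers):
        self.envs = envs
        # Ceil division: split the environments into contiguous chunks
        size = -(-len(envs) // num_workers)
        self.slices = [slice(i, i + size) for i in range(0, len(envs), size)]
        self.pool = ThreadPoolExecutor(max_workers=num_workers)

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        # Submit one task per chunk, then gather results in environment order
        futures = [
            self.pool.submit(step_chunk, self.envs[s], actions[s])
            for s in self.slices
        ]
        results = [r for f in futures for r in f.result()]
        obs, rewards, dones = map(list, zip(*results))
        return obs, rewards, dones

    def close(self):
        self.pool.shutdown()
```

With 8 environments and num_workers=3, each call to step submits three tasks instead of eight. Note that a thread pool only gives a real speed-up in CPython when the simulator releases the GIL while stepping; a process-based pool would instead need the environments to live inside the worker processes.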

…

Read more ⟶

2024-11-11

BipedalWalker Demo: Simple Experiments with RL

…

Read more ⟶

2024-10-30

CartPole Demo: Task Extension With Goal-Guided Control

Background

In the previous blog, we solved the CartPole-v1 task, which tries to keep the pole balanced upright by applying force to the cart in the left or right direction. Although this task is successfully solved with DQN and PPO, we can't control where the cart ends up when the pole is balanced. In this blog, we try to solve this goal-guided CartPole task.

…

Read more ⟶

2024-10-29

Quadruped Demo: CPG Introduction

Background

A central pattern generator (CPG) is a biological neural circuit that produces rhythmic motor patterns, such as walking, breathing, flying, or swimming. CPGs are found in humans and many kinds of animals.

For control tasks such as the locomotion of bipedal or quadrupedal robots, the action space is both high-dimensional and continuous, so finding methods that directly control all joints is very difficult. It is natural to think that controlling the motion of each leg, rather than the torque of each joint, may reduce the complexity of the solution.

A CPG assumes that each leg's motion follows some rhythmic or quasi-periodic pattern, and divides one full-joint control task into 2 (biped) or 4 (quadruped) per-leg joint control tasks. This strategy simplifies the problem and also improves interpretability and controllability.
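The rhythmic per-leg pattern described above can be sketched with coupled phase oscillators, one per leg. This is a generic Kuramoto-style illustration, not the controller used in the post: simulate_cpg, leg_signal, and every parameter value here are assumptions, and mapping the phase signal to actual joint targets is left out.

```python
import math


def simulate_cpg(num_legs=2, freq_hz=1.0, coupling=5.0, dt=0.001, steps=5000):
    """Integrate one phase oscillator per leg with explicit Euler steps."""
    # Desired phase offsets: evenly spread over a cycle (antiphase for a biped)
    offsets = [2.0 * math.pi * i / num_legs for i in range(num_legs)]
    # Start nearly synchronized so the coupling has to pull the legs apart
    phases = [0.1 * i for i in range(num_legs)]
    omega = 2.0 * math.pi * freq_hz
    for _ in range(steps):
        new_phases = []
        for i in range(num_legs):
            # Coupling term pulls each pair of legs toward its desired offset
            pull = sum(
                math.sin(phases[j] - phases[i] - (offsets[j] - offsets[i]))
                for j in range(num_legs)
                if j != i
            )
            new_phases.append(phases[i] + dt * (omega + coupling * pull))
        phases = new_phases
    return phases


def leg_signal(phase, amplitude=1.0):
    """Rhythmic signal for one leg, derived from its oscillator phase."""
    return amplitude * math.sin(phase)
```

For the default biped setting, the two phases settle half a cycle apart, so the two leg signals move in antiphase. The frequency and amplitude stay exposed as plain parameters, which is the kind of interpretable, adjustable knob that makes this decomposition attractive.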

BD Spot-Mini Robot Dog

…

Read more ⟶

2024-10-26

CartPole Demo: Simple Experiments with RL

…

Read more ⟶