
Introduction

In previous blog posts, we showed successful applications of CPG-based RL for robot locomotion in both 2D (game) and 3D (world) physics simulation environments. While this approach offers the flexibility to adjust walking-gait parameters on the fly, it requires heavy reward tuning and long training times. Imitation learning from human demonstrations, on the other hand, has shown much faster convergence when cloning natural gaits. Here we combine these two techniques and explore the feasibility of using a 2D CPG expert to guide a 3D humanoid robot in learning to walk.
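As a concrete illustration of what "IL+RL" can mean, one common recipe is to regularize the RL policy loss with a behavior-cloning (BC) term on expert state-action pairs. The sketch below is only that, a sketch: the function name, `policy` interface, and `bc_weight` are placeholders, not our exact training objective.

```python
import torch.nn.functional as F

def il_rl_policy_loss(rl_loss, policy, expert_obs, expert_act, bc_weight=0.5):
    """Combine an RL policy loss (e.g., a PPO surrogate) with a
    behavior-cloning term that pulls the policy's predicted actions
    toward the expert's actions on demonstration states.

    NOTE: a generic sketch of IL+RL, not the exact objective used here.
    """
    bc_loss = F.mse_loss(policy(expert_obs), expert_act)  # imitation term
    return rl_loss + bc_weight * bc_loss                  # weighted sum
```

In practice `bc_weight` is often annealed toward zero, so training starts from gait cloning and gradually hands control over to the task reward.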

BipedalWalker: finally learned to walk~

Unitree/H1-2: Done copying!

Methods

We evaluate the idea of using BipedalWalker's 2D CPG demonstrations to train Unitree H1-2's 3D locomotion in two stages:

  1. Same Domain: train an IL+RL policy using Unitree H1-2 3D CPG expert demonstration data

    Here, tens of demonstration episodes were collected from Unitree H1-2 using a previously trained 3D CPG expert policy. Each episode contains a full trajectory whose maximum length equals the predefined timeout. To make the results comparable with the 2D BipedalWalker dataset, only a forward-speed command was given to the robot; both the lateral-speed and heading commands were set to zero. (A minimal sketch of the collection loop follows this list.)

  2. Cross Domain: train an IL+RL policy using Gym BipedalWalker 2D CPG expert demonstration data

    Two distinct policies were used to generate demonstration data: the previously trained 2D CPG expert policy and Gym BipedalWalker's built-in heuristic policy. Three key domain discrepancies arise when transferring locomotion experience:

    • Robot Model: Gym BipedalWalker (built on the Box2D physics engine) versus Unitree H1-2 (simulated with the IsaacGym physics engine)
    • World Dimension: 2D game world versus 3D physical world
    • Terrain: uneven ground versus flat surfaces
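Conceptually, the collection step in stage 1 is just an expert rollout recorded as (observation, action) pairs. The sketch below assumes a Gymnasium-style reset/step API and a deterministic `expert_policy` callable; these names and the exact interface are stand-ins for our IsaacGym setup, not its real API.

```python
import numpy as np

def collect_episode(env, expert_policy, max_steps):
    """Roll the expert out for one episode (up to the predefined
    timeout) and record the visited (observation, action) pairs."""
    observations, actions = [], []
    obs, _ = env.reset()                    # Gymnasium-style reset (assumed)
    for _ in range(max_steps):              # max_steps == predefined timeout
        act = expert_policy(obs)            # expert's action for this state
        observations.append(obs)
        actions.append(act)
        obs, _, terminated, truncated, _ = env.step(act)
        if terminated or truncated:         # fell over or timed out early
            break
    return np.asarray(observations), np.asarray(actions)
```

Repeating this loop with freshly sampled per-episode commands yields the tens of demonstration episodes described in stage 1.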

Experiments

Imitation from the Same Model's CPG Expert Demonstration

  • Dataset Details

    A pre-trained CPG+RL expert policy is adopted to make Unitree H1-2 walk. The policy's inputs contain two kinds of command parameters: environment commands and CPG model parameters. Although all parameters support on-the-fly adjustment, here they are randomly set at the beginning of each episode and remain fixed thereafter (a sampling sketch follows this list). Specifically,

    CMD           VALUE
    ENV
      $v_x$       $\mathcal{U}(0.0, 1.0)$ m/s
      $v_y$       $0.0$ m/s
      $\omega_z$  $0.0$ rad/s
    CPG
      $ht$        $1.85$
      $stp$       $0.4$
      $g_c$       $0.2$
      $g_p$       $0.01$
  • Evaluations
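As noted under Dataset Details above, each episode's commands are drawn once at reset and held fixed. Below is a minimal sketch of that sampling; the function name and dictionary layout are hypothetical (the keys simply mirror the table), and the real policy consumes these values as part of its input vector.

```python
import numpy as np

def sample_episode_commands(rng: np.random.Generator) -> dict:
    """Draw per-episode commands per the table above: only the forward
    speed is randomized; all other values stay constant."""
    return {
        # environment commands
        "v_x": rng.uniform(0.0, 1.0),   # forward speed, m/s
        "v_y": 0.0,                     # lateral speed, m/s
        "omega_z": 0.0,                 # heading (yaw) rate, rad/s
        # CPG model parameters, fixed for the whole episode
        "ht": 1.85,
        "stp": 0.4,
        "g_c": 0.2,
        "g_p": 0.01,
    }

# e.g. commands = sample_episode_commands(np.random.default_rng(seed=0))
```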

Inspiration from Another Model's CPG Pattern Extraction

  • Dataset Samples

  • Evaluations

    forward-only movement

    lifting to omni-directional movement

