BipedalWalker Demo: Experiments with CPG-based RL

Posted on 2024-12-06 :: Tags: CPG, RL, DRL, BipedalWalker, Bipedal, Biped

Background

In the previous post, we briefly introduced central pattern generators (CPGs) and how they can serve as locomotion controllers for quadrupedal robots. CPG-based methods offer advantages such as parameterizable gait generation and on-the-fly adjustment of motion patterns, so it is interesting to see how they perform when applied to bipedal robots.

Introduction

Several differences should be noted when porting a CPG controller from quadrupedal robots to bipedal robots:

Balance Maintenance: quadrupeds generally keep their balance more easily than bipeds because of their wider base of support, lower center of gravity, and the redundancy provided by four legs.
CPG Model Parameters: a 3D quadrupedal robot has 12 joint degrees of freedom to control, corresponding to 12 CPG parameters for 3D movement. A 2D bipedal robot, however, has only 4 joints to control for locomotion. Are just 4 CPG parameters enough to accomplish locomotion tasks?
Training Environment: many CPG-based algorithms are trained on flat terrain, since uneven ground strongly disturbs the CPG's intrinsic rhythm and makes training less stable. Here we choose Gym BipedalWalker, which is built on the Box2D physics engine, because it is more engaging and helps reveal how the CPG performs in difficult scenarios.

The system consists of three main components: the CPG module, the RL (PPO) pipeline, and the Gym environment. The overall architecture diagram is as follows:

Methods

In this post, the modified CPG model used for BipedalWalker is:

$$ \left\{ \begin{align} \ddot{r}_t &= a \left( \frac{a}{4} \left( \mu_t - r_t \right) - \dot{r}_t \right) \\ \dot{\theta}_t &= \omega_t + \omega^b \\ \end{align} \right. $$

where $r_t$ is the current amplitude of the oscillator, $\theta_t$ is its current phase, $\mu_t$ is the intrinsic amplitude, $\omega_t$ is the environment-adaptive frequency, $\omega^b$ is the base movement frequency shared by all limbs, and $a$ is a positive constant controlling how fast the amplitude converges.
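As a rough illustration of how these two equations can be integrated numerically, the sketch below advances a single oscillator with a semi-implicit Euler step. The names `Oscillator` and `step_oscillator`, the time step `dt`, and the default value of `a` are assumptions made for this example, not values taken from the project.

```python
import math
from dataclasses import dataclass


@dataclass
class Oscillator:
    r: float = 0.0      # amplitude r_t
    r_dot: float = 0.0  # first derivative of the amplitude
    theta: float = 0.0  # phase theta_t, in radians


def step_oscillator(osc, mu, omega, omega_b, a=50.0, dt=0.001):
    """Advance one oscillator by dt (semi-implicit Euler).

    mu      -- intrinsic amplitude the oscillator converges to
    omega   -- adaptive frequency term (rad/s)
    omega_b -- base frequency term (rad/s)
    a, dt   -- illustrative values, chosen only for this sketch
    """
    # Amplitude: r_ddot = a * (a/4 * (mu - r) - r_dot)
    r_ddot = a * (a / 4.0 * (mu - osc.r) - osc.r_dot)
    r_dot = osc.r_dot + r_ddot * dt
    r = osc.r + r_dot * dt
    # Phase: theta_dot = omega + omega_b, wrapped to [0, 2*pi)
    theta = (osc.theta + (omega + omega_b) * dt) % (2.0 * math.pi)
    return Oscillator(r=r, r_dot=r_dot, theta=theta)
```

Because the amplitude equation is a critically damped second-order system, `r` settles on `mu` without overshoot, which is what lets the amplitude target change on the fly without jerky transitions.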

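Each leg's CPG outputs are then converted to the Cartesian position of the corresponding foot, using the step length $d_{\text{step}}$, the hull height $h$, the ground clearance $g_c$ during the swing phase, and the ground penetration $g_p$ during the stance phase: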

$$ \left\{ \begin{align} p_{x,t} &= -d_{\text{step}} (r_t - 1) \cos (\theta_t) \\ p_{z,t} &= \begin{cases} -h + g_c \sin(\theta_t) & \text{if } \sin(\theta_t) \ge 0 \\ -h + g_p \sin(\theta_t) & \text{if } \sin(\theta_t) \lt 0 \\ \end{cases} \\ \end{align} \right. $$
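The mapping above can be written as a small function. This is a minimal sketch: the function name `foot_position` and its argument names are hypothetical and simply mirror the symbols in the equation.

```python
import math


def foot_position(r, theta, d_step, h, g_c, g_p):
    """Map one oscillator state (r, theta) to a foot position (p_x, p_z).

    The position is expressed relative to the hip, so p_z is negative
    when the foot is below the hull.
    """
    s = math.sin(theta)
    p_x = -d_step * (r - 1.0) * math.cos(theta)
    # The upper half of the cycle uses g_c, the lower half uses g_p.
    p_z = -h + (g_c if s >= 0.0 else g_p) * s
    return p_x, p_z
```

Note that with `r = 1` the horizontal term vanishes and the foot only moves vertically, so the stride grows with how far the amplitude is above 1.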

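Finally, a PD controller outputs the torque for each joint. Its targets, the hip and knee angles $\theta_1$ and $\theta_2$, are obtained via inverse kinematics from the foot's Cartesian position, where $l_1$ and $l_2$ are the lengths of the upper and lower leg and $l_3$ is the distance from the hip to the foot: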

$$ \left\{ \begin{align} l_3 &= \sqrt{p_x^2 + p_z^2} \\ \theta_1 &= \arctan\frac{p_x}{p_z} + \arccos\frac{l_1^2 + l_3^2 - l_2^2}{2l_1l_3} \\ \theta_2 &= -\arccos\frac{l_3^2 - l_1^2 - l_2^2}{2l_1l_2} \\ \end{align} \right. $$

If $l_1 = l_2 \triangleq l$, as in the Gym BipedalWalker environment, the equations above simplify to:

$$ \left\{ \begin{align} l_3 &= \sqrt{p_x^2 + p_z^2} \\ \theta_1 &= \arctan\frac{p_x}{p_z} + \arccos\frac{l_3}{2l} \\ \theta_2 &= -2 \arccos\frac{l_3}{2l} \\ \end{align} \right. $$
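To make the last step concrete, here is a minimal sketch of the equal-length inverse kinematics followed by a PD law. It takes the foot's horizontal and vertical coordinates and returns the angles exactly as the simplified formulas give them. Mapping them to the environment's joint sign convention is left open. The function names, the gains `kp` and `kd`, and the clamping of `l_3` to the reachable range are assumptions added for illustration. Clipping the output to [-1, 1] matches the bounds of the BipedalWalker action space.

```python
import math


def leg_ik(p_x, p_z, l):
    """Equal-link inverse kinematics; returns (theta1, theta2) in radians.

    Assumes the foot is below the hip (p_z < 0), so p_x / p_z is defined.
    """
    l3 = math.hypot(p_x, p_z)
    # Assumption: clamp to the fully extended leg so acos stays in its domain.
    l3 = min(l3, 2.0 * l)
    alpha = math.acos(l3 / (2.0 * l))
    theta1 = math.atan(p_x / p_z) + alpha
    theta2 = -2.0 * alpha
    return theta1, theta2


def pd_torque(q_target, q, q_dot, kp=4.0, kd=0.1):
    """PD law toward q_target; kp and kd are illustrative gains."""
    tau = kp * (q_target - q) - kd * q_dot
    # Keep the command inside the [-1, 1] action bounds.
    return max(-1.0, min(1.0, tau))
```

In a control loop, `leg_ik` would be called once per leg on the foot position, and `pd_torque` once per joint with the measured joint angle and angular velocity.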

CPG Patterns

Here we illustrate some CPG patterns for the Gym BipedalWalker, generated with different combinations of frequency, hull height, step length, and ground clearance.

Swing Frequency vs Hull Height