In this blog, we will run some experiments on the Unitree G1. Specifically, using RL + IL and motion retargeting from the CMU MoCap dataset, we will attempt to teach the Unitree G1 to run, jump, ...
In previous blog posts, we showed successful applications of CPG-based RL to robot locomotion in both 2D (game) and 3D (world) physics simulation environments. While this approach offers the flexibility to adjust walking-gait parameters on the fly, it requires heavy reward tuning and long training times. Imitation learning from human demonstrations, on the other hand, has shown faster convergence when cloning natural gaits. Here we try to combine the two techniques and explore the feasibility of using a 2D CPG expert to guide a 3D humanoid robot in learning to walk.
In this blog, we will attempt to adapt CPG + RL, which was successfully applied to the 2D Gym BipedalWalker, to a real 3D bipedal humanoid model: the Unitree H1-2.
In the previous blog, we gave a brief introduction to central pattern generators (CPGs) and how they can be used as quadrupedal locomotion controllers. CPG-based methods offer advantages such as parameterizable gait generation and on-the-fly adjustment of motion patterns. It is therefore interesting to see how the method performs when applied to bipedal robots.
Stable-Baselines3 (SB3) is a popular library for rapid prototyping of RL algorithms; it ships with many common model-free algorithms out of the box. SB3's built-in algorithms take a vectorized environment (VecEnv) as their input interface so that they can train on multiple environments at once, and the library provides two implementations: DummyVecEnv and SubprocVecEnv. However, both have obvious drawbacks when it comes to using a multi-core processor to simulate a large number of environments. In this blog we implement a new PoolVecEnv, which draws inspiration from ThreadPool/ProcessPool, to accelerate the simulation of a large number of environments. The modified code will be available on GitHub later.
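To make the pooling idea concrete, here is a minimal sketch using only the Python standard library. It is not the PoolVecEnv from the post and does not use the SB3 API: `ToyEnv` and `PoolStepper` are hypothetical names invented for illustration, and the toy environment is a simple counter. The point is the scheduling: a fixed number of workers each step a contiguous chunk of environments, instead of stepping everything sequentially or dedicating one process to every environment.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyEnv:
    """Hypothetical stand-in environment: a counter that ends its episode after 5 steps."""

    def __init__(self, env_id):
        self.env_id = env_id
        self.t = 0

    def reset(self):
        self.t = 0
        return float(self.t)

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = float(action)
        done = self.t >= 5
        if done:
            # Auto-reset on episode end, as vectorized environments usually do.
            obs = self.reset()
        return obs, reward, done


def _step_chunk(envs, actions):
    # Runs inside one worker: step every environment of the chunk in order.
    return [env.step(a) for env, a in zip(envs, actions)]


class PoolStepper:
    """Hypothetical pool-based stepper: num_workers workers share all environments."""

    def __init__(self, envs, num_workers):
        self.envs = envs
        self.pool = ThreadPoolExecutor(max_workers=num_workers)
        # Split the environments into contiguous chunks, one chunk per worker.
        size = -(-len(envs) // num_workers)  # ceiling division
        self.slices = [slice(i, i + size) for i in range(0, len(envs), size)]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        # Submit one task per chunk, then gather results in the original order.
        futures = [
            self.pool.submit(_step_chunk, self.envs[s], actions[s])
            for s in self.slices
        ]
        results = [r for f in futures for r in f.result()]
        obs, rewards, dones = zip(*results)
        return list(obs), list(rewards), list(dones)

    def close(self):
        self.pool.shutdown()
```

One caveat on this sketch: in standard CPython, threads only give a real speedup when the simulator releases the GIL during `step` (for example inside a native physics engine). For pure-Python environments, a process pool with the environments living inside the worker processes would be the analogous design.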
In the previous blog, we solved the CartPole-v1 task, which tries to keep the pole upright by pushing the cart to the left or right. Although this task can be solved with DQN and PPO, we have no control over where the cart ends up once the pole is balanced. In this blog we try to solve this goal-guided CartPole task.
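One possible way to phrase the goal-guided task is sketched below. This is an illustrative assumption, not necessarily the formulation used in the post: the helper names `goal_observation` and `goal_reward` are hypothetical, the exponential shaping is just one simple choice, and the failure limits mirror CartPole-v1's termination thresholds (cart position beyond ±2.4, pole angle beyond ±12 degrees).

```python
import math


def goal_observation(obs, goal_x):
    """Append the signed distance to the goal to a CartPole observation.

    obs is assumed to be [cart_x, cart_velocity, pole_angle, pole_angular_velocity].
    """
    return list(obs) + [goal_x - obs[0]]


def goal_reward(cart_x, pole_angle, goal_x,
                x_limit=2.4, angle_limit=math.radians(12)):
    """Hypothetical shaped reward: 1.0 at the goal, decaying with distance, 0.0 on failure."""
    if abs(cart_x) > x_limit or abs(pole_angle) > angle_limit:
        return 0.0
    return math.exp(-abs(cart_x - goal_x))
```

With this shaping the agent is still rewarded for keeping the pole up anywhere on the track, but earns more the closer the cart sits to `goal_x`, so balancing and positioning are learned together.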
A central pattern generator (CPG) is a biological neural circuit that produces rhythmic motor patterns, such as walking, breathing, flying, or swimming. CPGs are found in humans and many kinds of animals.
For control tasks such as the locomotion of bipedal or quadrupedal robots, the action space is both high-dimensional and continuous, so finding a method that directly controls every joint is very difficult. It is natural to think that controlling the motion of each leg, rather than the torque of each joint, may reduce the complexity of the solution.
A CPG assumes that each leg's motion follows a rhythmic or quasi-periodic pattern, and it splits one all-joints control task into two (biped) or four (quadruped) per-leg control tasks. This strategy simplifies the problem and also improves interpretability and controllability.
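As a rough illustration of the "one rhythmic pattern per leg" idea, the sketch below drives two legs with Hopf oscillators, a common building block for CPGs. It is a minimal example under stated assumptions rather than the controller used in these posts: the function names, the parameters `mu`, `alpha`, and `freq_hz`, and the explicit-Euler integration are all choices made here for illustration, and the two oscillators are left uncoupled and simply started in anti-phase, whereas practical CPGs usually add coupling terms to hold the phase relationship.

```python
import math


def hopf_step(x, y, mu, omega, alpha, dt):
    """One explicit-Euler step of a Hopf oscillator.

    The state converges to a circular limit cycle of radius sqrt(mu)
    and rotates with angular frequency omega.
    """
    r2 = x * x + y * y
    dx = alpha * (mu - r2) * x - omega * y
    dy = alpha * (mu - r2) * y + omega * x
    return x + dx * dt, y + dy * dt


def simulate_biped_cpg(steps=5000, dt=0.001, freq_hz=1.0, mu=1.0, alpha=10.0):
    """Run one oscillator per leg, started half a cycle apart."""
    omega = 2 * math.pi * freq_hz
    left = (0.1, 0.0)    # left leg starts at phase 0
    right = (-0.1, 0.0)  # right leg starts at phase pi (anti-phase)
    trace = []
    for _ in range(steps):
        left = hopf_step(*left, mu, omega, alpha, dt)
        right = hopf_step(*right, mu, omega, alpha, dt)
        # The x component of each oscillator is that leg's rhythmic signal.
        trace.append((left[0], right[0]))
    return left, right, trace
```

Each leg is now described by a handful of interpretable parameters (amplitude via `mu`, frequency via `freq_hz`, phase via the initial state) instead of a separate command for every joint, which is the simplification the paragraph above describes. Mapping the oscillator signal to actual joint targets is a separate step not shown here.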