eGAIT: Multi-Skilled Policy for Energy-efficient Gait Transitions

Efficient and Adaptive Quadrupedal Locomotion Across Complex Terrains


Abstract

Achieving adaptive, multi-skilled, and energy-efficient locomotion is vital for advancing the operation of autonomous quadrupedal systems. This study presents eGAIT, a unified multi-skilled policy enabling energy-efficient and stable gait transitions across nine non-monotonic, velocity-optimized gaits in response to dynamic velocity commands. The framework leverages a hybrid control architecture that integrates model-based and learning-based methods to address the entire locomotion pipeline. An MPC-based gait generator produces velocity-optimized trajectories, which are imitated via Proximal Policy Optimization (PPO) driven by an Adversarial Motion Prior (AMP) style reward, yielding distinct policies for specific velocity ranges. These policies are unified through a Hierarchical Reinforcement Learning (HRL) framework featuring a novel modified Deep Q-Network (eDQN) for real-time velocity-to-policy mapping. Training efficiency is enhanced by an auxiliary selector layer that guides velocity-policy mapping, while a sparsely activated stability reward mechanism ensures smooth gait transitions by incorporating geometric and rotational stability. Extensively validated in simulation and on a Unitree Go1 robot, eGAIT achieves a 100% success rate in velocity-to-policy mapping, a 35% improvement in energy efficiency, and a 31% improvement in both velocity tracking and stability compared to the next best state-of-the-art method.
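As a rough illustration of the velocity-to-policy mapping described above, the sketch below shows how a small Q-network could select one of the nine low-level policies from the commanded velocity and a compact state. The class name, observation layout, and network sizes are illustrative assumptions rather than the released implementation, and the auxiliary selector layer used during training is omitted here.

```python
# Minimal, illustrative sketch (not the released code): a small Q-network that
# maps the commanded velocity (plus a compact state) to one of nine pre-trained
# low-level policies. Names such as eDQNSelector and the observation layout are
# assumptions for readability.
import torch
import torch.nn as nn

NUM_POLICIES = 9  # one AMP-imitated policy per 0.1 m/s velocity bin


class eDQNSelector(nn.Module):
    def __init__(self, obs_dim: int = 4, hidden: int = 64):
        super().__init__()
        self.q_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_POLICIES),  # one Q-value per candidate policy
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.q_net(obs)

    @torch.no_grad()
    def select_policy(self, obs: torch.Tensor) -> int:
        # Greedy velocity-to-policy mapping used at deployment time.
        return int(self.forward(obs).argmax(dim=-1).item())


# Example observation: [commanded v_x, current v_x, roll, pitch] (illustrative).
selector = eDQNSelector()
policy_id = selector.select_policy(torch.tensor([[0.45, 0.40, 0.01, -0.02]]))
```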

eGAIT Method Overview

1. AMP Imitated Policies

Below, we showcase the AMP-learned locomotion policies at target velocities across the full range from 0.1 to 0.9 m/s.

vel = 0.1 m/s

vel = 0.2 m/s

vel = 0.3 m/s

vel = 0.4 m/s

vel = 0.5 m/s

vel = 0.6 m/s

vel = 0.7 m/s

vel = 0.8 m/s

vel = 0.9 m/s
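Each of these policies is trained with PPO against an AMP-style reward that combines velocity tracking with a discriminator-based style term. The snippet below is a hedged sketch following the common AMP formulation (a discriminator scoring state transitions against the MPC reference motions); the weights and scaling constants are illustrative, not the paper's exact values.

```python
# Illustrative AMP-style reward under the standard formulation (not the paper's
# exact weights): a task term for velocity tracking plus a style term computed
# from the discriminator logit d_logit = D(s, s'), which scores transitions
# against the MPC reference motion.
import torch


def amp_reward(d_logit: torch.Tensor,
               v_cmd: torch.Tensor,
               v_actual: torch.Tensor,
               w_task: float = 0.5,
               w_style: float = 0.5) -> torch.Tensor:
    # Style reward: common least-squares AMP form, clipped to be non-negative.
    r_style = torch.clamp(1.0 - 0.25 * (d_logit - 1.0) ** 2, min=0.0)
    # Task reward: exponential penalty on the velocity-tracking error.
    r_task = torch.exp(-4.0 * (v_cmd - v_actual) ** 2)
    return w_task * r_task + w_style * r_style
```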

2. eGAIT Transitions in Simulation

We present visual representations of the three transition experiments described in the paper, demonstrating eGAIT's adaptability under different commanded velocity profiles.

Experiment 1: Ascending and descending velocities

In Experiment 1, we command velocities in ascending and then descending order, from v* = 0.1 m/s to 0.9 m/s and back, holding each command for a fixed interval of 2 seconds.

Experiment 2: Interpolation of random velocity samples

In Experiment 2, we interpolate between randomly selected target velocities in the x-direction (vx), using constant time intervals for each transition phase.

Experiment 3: Randomized velocity and timing schedule

In Experiment 3, we command randomly sampled target velocities v* within the defined range, using variable time intervals for each transition. This emulates less predictable operator inputs, issued via a slider in simulation or a joystick in the real world.
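For reference, the sketch below outlines how the three command schedules could be generated. Only the 0.1–0.9 m/s range and the 2-second dwell of Experiment 1 come from the descriptions above; the segment lengths and dwell-time bounds for Experiments 2 and 3 are illustrative assumptions.

```python
# Hedged sketch of the three commanded-velocity schedules. Only the 0.1-0.9 m/s
# range and the 2 s dwell of Experiment 1 follow the text; other bounds are
# illustrative assumptions.
import numpy as np

V_MIN, V_MAX, DT = 0.1, 0.9, 0.02  # velocity range [m/s] and control step [s]
rng = np.random.default_rng(0)


def experiment1():
    """Ascending then descending velocity steps, each held for 2 s."""
    targets = np.concatenate([np.arange(V_MIN, V_MAX + 1e-6, 0.1),
                              np.arange(V_MAX - 0.1, V_MIN - 1e-6, -0.1)])
    return np.repeat(targets, int(2.0 / DT))


def experiment2(n_targets=8, seg_time=2.0):
    """Linear interpolation between random v_x targets, fixed segment length."""
    targets = rng.uniform(V_MIN, V_MAX, n_targets)
    segs = [np.linspace(a, b, int(seg_time / DT), endpoint=False)
            for a, b in zip(targets[:-1], targets[1:])]
    return np.concatenate(segs)


def experiment3(n_targets=8):
    """Random targets held for random durations (slider/joystick-like input)."""
    targets = rng.uniform(V_MIN, V_MAX, n_targets)
    holds = rng.uniform(1.0, 4.0, n_targets)  # assumed dwell-time range [s]
    return np.concatenate([np.full(int(t / DT), v)
                           for v, t in zip(targets, holds)])
```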

3. eGAIT Transitions in Real World

We present Experiment 3 conducted in the real world, following a random velocity schedule. The behavior of eGAIT is shown from different camera viewpoints to highlight the consistency and robustness of gait transitions across varying perspectives.

Viewpoint 1

Viewpoint 2

4. eGAIT Stability Tests

We provide visualizations of eGAIT's behavior under a range of challenging conditions to evaluate its robustness beyond nominal settings. These include environmental disturbances such as uneven terrain, varying friction, and slope gradients, as well as dynamic perturbations like lateral pushes of increasing force. The following videos illustrate how the learned policy maintains stable locomotion across these scenarios, showcasing its generalization and resilience capabilities.

4.1 Terrain Variations

To test eGAIT’s robustness during velocity transitions, we introduce terrain disturbances: height perturbations (h), slope (s), and ground friction (f). The tested settings are listed below, followed by a small configuration sketch.

h=0.02m, s=0.002, f=1

h=0.02m, s=0.003, f=0.6

h=0.02m, s=0.002, f=0.4

h=0.03m, s=0.003, f=0.6

DEFAULT: h=0m, s=0, f=1
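For reference, the disturbance settings above can be written as a simple configuration; the dataclass and field names below are illustrative assumptions rather than the simulator's actual configuration format.

```python
# Illustrative encoding of the terrain-disturbance settings from Section 4.1;
# the dataclass and field names are assumptions, not the simulator's config.
from dataclasses import dataclass


@dataclass
class TerrainConfig:
    height_perturbation: float  # h, max random height offset [m]
    slope: float                # s, terrain gradient
    friction: float             # f, ground friction coefficient


TERRAIN_TESTS = [
    TerrainConfig(height_perturbation=0.02, slope=0.002, friction=1.0),
    TerrainConfig(height_perturbation=0.02, slope=0.003, friction=0.6),
    TerrainConfig(height_perturbation=0.02, slope=0.002, friction=0.4),
    TerrainConfig(height_perturbation=0.03, slope=0.003, friction=0.6),
    TerrainConfig(height_perturbation=0.0,  slope=0.0,   friction=1.0),  # default
]
```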

4.2 Lateral Pushes

We assess the policy's robustness during velocity transitions by applying lateral pushes of 50–110 N to the robot's base for 0.2 seconds, following the Experiment 1 schedule; a sketch of this push protocol appears after the clips below.

F=50N

F=70N

F=90N

F=110N
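The sketch below outlines the push protocol. `sim.apply_external_force` is a hypothetical hook standing in for the simulator's actual push interface, and the trial length and push timing are illustrative.

```python
# Hedged sketch of the lateral-push protocol: forces of 50-110 N applied to the
# base for 0.2 s while the Experiment 1 schedule is running. The `sim` object
# and apply_external_force are hypothetical placeholders.
PUSH_FORCES_N = [50.0, 70.0, 90.0, 110.0]
PUSH_DURATION_S = 0.2
DT = 0.02  # control step [s]


def run_push_trial(sim, force_n: float, push_start_s: float, total_s: float = 10.0):
    push_steps = range(int(push_start_s / DT),
                       int((push_start_s + PUSH_DURATION_S) / DT))
    for step in range(int(total_s / DT)):
        # Apply the lateral force only during the 0.2 s push window.
        lateral_force = force_n if step in push_steps else 0.0
        sim.apply_external_force(body="base", force=(0.0, lateral_force, 0.0))
        sim.step()
```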

5. Ablations

We evaluate eDQN’s design via an ablation study comparing it to a standard DQN and a rule-based Switch baseline under controlled velocity transitions.

5.1 Switch

The Switch baseline uses fixed velocity thresholds to select policies without learning, resulting in poor adaptability and unstable transitions.
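A minimal sketch of such a threshold rule is shown below; the 0.1 m/s bin width is an assumption matching the nine training velocities, not a detail taken from the paper.

```python
# Illustrative version of the rule-based Switch baseline: the commanded velocity
# is bucketed by fixed thresholds into one of the nine policies, with no learning.
def switch_baseline(v_cmd: float, v_min: float = 0.1, bin_width: float = 0.1,
                    num_policies: int = 9) -> int:
    idx = int((v_cmd - v_min) / bin_width + 0.5)  # index of nearest training velocity
    return max(0, min(num_policies - 1, idx))
```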

5.2 DQN

The DQN baseline removes auxiliary guidance, leading to slower convergence and less consistent policy alignment compared to eDQN.