Achieving adaptive, multi-skilled, and energy-efficient locomotion is vital for advancing the operation of autonomous quadrupedal systems. This study presents eGAIT, a unified multi-skilled policy enabling energy-efficient and stable gait transitions across nine non-monotonic, velocity-optimized gaits, in response to dynamic velocity commands. The framework leverages a hybrid control architecture integrating model-based and learning-based methods to address the entire locomotion pipeline. An MPC-based gait generator produces velocity-optimized trajectories, which are imitated through Proximal Policy Optimization (PPO), driven by an Adversarial Motion Prior (AMP) style reward to train distinct policies for specific velocity ranges. These policies are unified through a Hierarchical Reinforcement Learning (HRL) framework featuring a novel modified Deep Q-Network (eDQN) for real-time velocity-to-policy mapping. Training efficiency is enhanced by an auxiliary selector layer that guides velocity-policy mapping, while a sparsely activated stability reward mechanism ensures smooth gait transitions by incorporating geometric and rotational stability. Extensively validated in simulation and on a Unitree Go1 robot, eGAIT achieves a 100% success rate in velocity-to-policy mapping, a 35% improvement in energy efficiency, and a 31% improvement in both velocity tracking and stability compared to the next best state-of-the-art method.
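For intuition, here is a minimal sketch of the velocity-to-policy selection idea described above. It is not the authors' implementation; the names (VelocityToPolicyQNet, select_policy, NUM_POLICIES) and the network sizes are illustrative assumptions, and in eGAIT the selector is the proposed eDQN trained with auxiliary guidance rather than this plain Q-network.

```python
# Minimal sketch (not the authors' implementation): a small Q-network maps the
# commanded velocity to an index over N pretrained low-level locomotion
# policies; the chosen policy then drives the robot at that velocity band.
import torch
import torch.nn as nn

NUM_POLICIES = 9  # one AMP/PPO policy per velocity band (0.1-0.9 m/s)

class VelocityToPolicyQNet(nn.Module):
    """Tiny Q-network: commanded velocity -> Q-value per low-level policy."""
    def __init__(self, obs_dim: int = 1, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_POLICIES),
        )

    def forward(self, v_cmd: torch.Tensor) -> torch.Tensor:
        return self.net(v_cmd)

def select_policy(qnet: VelocityToPolicyQNet, v_cmd: float) -> int:
    """Greedy velocity-to-policy mapping at deployment time."""
    with torch.no_grad():
        q_values = qnet(torch.tensor([[v_cmd]], dtype=torch.float32))
    return int(torch.argmax(q_values, dim=-1).item())

qnet = VelocityToPolicyQNet()
print(select_policy(qnet, 0.4))  # index of the low-level policy to run
```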
Below, we showcase the AMP-learned locomotion policies at target velocities spanning the full range from 0.1 to 0.9 m/s.
vel = 0.1 m/s
vel = 0.2 m/s
vel = 0.3 m/s
vel = 0.4 m/s
vel = 0.5 m/s
vel = 0.6 m/s
vel = 0.7 m/s
vel = 0.8 m/s
vel = 0.9 m/s
We present visual representations of the three transition experiments described in the paper, demonstrating eGAIT's adaptability under different commanded velocity profiles.
Experiment 1: Ascending and descending velocities
In Experiment 1, we command velocities in ascending and then descending order, from v* = 0.1 to 0.9 m/s and back, holding each discrete target for a constant interval of 2 seconds.
Experiment 2: Interpolation of random velocity samples
In Experiment 2, we interpolate between randomly sampled target velocities along the x-direction (vx), using a constant time interval for each transition phase.
Experiment 3: Randomized velocity and timing schedule
In Experiment 3, we command randomly sampled target velocities v* within the defined range, using variable time intervals between transitions. This emulates more unpredictable control inputs, issued via a slider in simulation or a joystick in the real world.
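The three commanded-velocity profiles above can be sketched roughly as follows. Only the 0.1–0.9 m/s range and the 2-second intervals of Experiment 1 come from the experiment descriptions; the helper names, the 0.02 s control step, and the segment/hold durations assumed for Experiments 2 and 3 are our own illustrative choices.

```python
# Illustrative command-schedule generators for the three transition experiments.
import numpy as np

V_MIN, V_MAX, DT = 0.1, 0.9, 0.02  # velocity range [m/s] and control step [s] (DT assumed)

def experiment1_schedule(hold_s: float = 2.0) -> np.ndarray:
    """Ascending then descending targets, 0.1 -> 0.9 -> 0.1 m/s, held 2 s each."""
    levels = np.round(np.arange(V_MIN, V_MAX + 1e-6, 0.1), 2)
    sweep = np.concatenate([levels, levels[::-1][1:]])
    return np.repeat(sweep, int(hold_s / DT))

def experiment2_schedule(n_targets: int = 6, seg_s: float = 2.0,
                         rng=np.random.default_rng(0)) -> np.ndarray:
    """Linear interpolation between random v_x targets, constant segment time (assumed 2 s)."""
    targets = rng.uniform(V_MIN, V_MAX, size=n_targets)
    steps = int(seg_s / DT)
    segs = [np.linspace(a, b, steps, endpoint=False)
            for a, b in zip(targets[:-1], targets[1:])]
    return np.concatenate(segs)

def experiment3_schedule(n_targets: int = 6,
                         rng=np.random.default_rng(0)) -> np.ndarray:
    """Random targets held for random durations (joystick/slider-like input); hold range assumed."""
    targets = rng.uniform(V_MIN, V_MAX, size=n_targets)
    holds = rng.uniform(1.0, 4.0, size=n_targets)  # assumed hold-time range [s]
    return np.concatenate([np.full(int(h / DT), v) for v, h in zip(targets, holds)])
```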
We present Experiment 3 conducted in the real world, following a random velocity schedule. The behavior of eGAIT is shown from different camera viewpoints to highlight the consistency and robustness of gait transitions across varying perspectives.
Viewpoint 1
Viewpoint 2
We provide visualizations of eGAIT's behavior under a range of challenging conditions to evaluate its robustness beyond nominal settings. These include environmental disturbances such as uneven terrain, varying friction, and slope gradients, as well as dynamic perturbations like lateral pushes of increasing force. The following videos illustrate how the learned policy maintains stable locomotion across these scenarios, showcasing its generalization and resilience capabilities.
To test eGAIT’s robustness during velocity transitions, we introduce terrain disturbances: height perturbations (h, in meters), slope gradient (s), and ground friction coefficient (f); a rough sketch of how such terrain can be parameterized follows the settings below.
h=0.02m, s=0.002, f=1
h=0.02m, s=0.003, f=0.6
h=0.02m, s=0.002, f=0.4
h=0.03m, s=0.003, f=0.6
DEFAULT: h=0m, s=0, f=1
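As a rough illustration of how these disturbance parameters can be realized, the sketch below builds a heightfield from uniform noise of amplitude h plus a linear slope of gradient s. The function, grid size, and cell spacing are our assumptions, not the authors' terrain pipeline, and the friction coefficient f would be set as a simulator material property rather than in the heightfield itself.

```python
# Illustrative terrain builder (assumed setup): uniform height noise of
# amplitude h [m] plus a linear incline of gradient s along x.
import numpy as np

def make_heightfield(h: float, s: float, size: int = 200, cell: float = 0.05,
                     rng=np.random.default_rng(0)) -> np.ndarray:
    noise = rng.uniform(-h, h, size=(size, size))      # uneven-terrain noise
    slope = s * (np.arange(size) * cell)[None, :]      # linear incline along x
    return noise + slope                               # broadcasts to (size, size)

terrain = make_heightfield(h=0.02, s=0.002)            # e.g. the first setting above
```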
We assess the policy's robustness during velocity transitions by applying lateral pushes of 50–110 N to the robot's base for 0.2 seconds while it follows the velocity schedule of Experiment 1 (push timing sketched below).
F=50N
F=70N
F=90N
F=110N
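A simple way to script these perturbations is shown below. Only the 50–110 N magnitudes and the 0.2 s duration come from the experiment description; the push onset times and the choice of the y axis as the lateral direction are illustrative assumptions.

```python
# Illustrative lateral-push schedule: a fixed-magnitude force applied to the
# base for 0.2 s at assumed onset times during the Experiment 1 velocity sweep.
import numpy as np

def external_base_force(t: float, force: float,
                        push_times=(4.0, 12.0, 20.0, 28.0),  # assumed onsets [s]
                        duration: float = 0.2) -> np.ndarray:
    """Return the external force [N] to apply to the base at simulation time t [s]."""
    for t0 in push_times:
        if t0 <= t < t0 + duration:
            return np.array([0.0, force, 0.0])  # push along the lateral (y) axis
    return np.zeros(3)

print(external_base_force(t=12.1, force=90.0))  # one of the tested magnitudes
```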
We evaluate eDQN’s design via an ablation study comparing it to a standard DQN and a rule-based Switch baseline under controlled velocity transitions.
The Switch baseline uses fixed velocity thresholds to select policies without any learning (sketched below), resulting in poor adaptability and unstable transitions.
The DQN baseline removes the auxiliary selector guidance, leading to slower convergence and less consistent velocity-to-policy alignment compared to eDQN.
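For reference, the rule-based Switch baseline amounts to a fixed threshold lookup over the nine pretrained policies; a minimal sketch is given below, with the thresholds placed at the nominal policy velocities as our assumption.

```python
# Minimal sketch of the rule-based Switch baseline: fixed velocity thresholds
# select one of the nine pretrained policies with no learning involved.
import numpy as np

POLICY_VELOCITIES = np.round(np.arange(0.1, 1.0, 0.1), 2)  # 0.1 ... 0.9 m/s

def switch_baseline(v_cmd: float) -> int:
    """Pick the policy whose nominal velocity is closest to the command."""
    return int(np.argmin(np.abs(POLICY_VELOCITIES - v_cmd)))

print(switch_baseline(0.47))  # selects the 0.5 m/s policy
```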