Holosoma FastSAC Locomotion Policy Architecture
Model: fastsac_g1_29dof.onnx
Source: Holosoma Whole-Body Locomotion
Repository: nepyope/holosoma_locomotion
Framework: PyTorch 2.4.1 → ONNX (Opset 13)
Total Parameters: 221,727
Table of Contents
- Overview
- Input Space (Observation)
- Output Space (Action)
- Network Architecture
- Forward Pass Data Flow
- PD Gains
- Comparison with GR00T
Overview
The Holosoma FastSAC policy is a single unified actor network trained using the Fast Soft Actor-Critic (SAC) algorithm for bipedal whole-body locomotion. Unlike GR00T's dual-policy system (balance + walk), Holosoma uses one network to handle all locomotion states.
Key Features
- Single frame input: 100D observation (no temporal stacking)
- Full body control: All 29 DOF controlled simultaneously
- Observation normalization: Running mean/std baked into model
- Per-joint action scaling: Action scale factors embedded in ONNX
- LayerNorm: Used instead of BatchNorm for stability
- Gait phase encoding: Sin/cos phase signals for periodic locomotion
Input Space (Observation)
Observation Vector (100D)
| Index | Dim | Content | Scale |
|---|---|---|---|
| [0:29] | 29 | last_action (unscaled) | 1.0 |
| [29:32] | 3 | angular_velocity (IMU gyro) | 0.25 |
| [32] | 1 | yaw_velocity_command | 1.0 |
| [33:35] | 2 | linear_velocity_command (x, y) | 1.0 |
| [35:37] | 2 | gait_phase_cos | 1.0 |
| [37:66] | 29 | joint_positions (relative to default) | 1.0 |
| [66:95] | 29 | joint_velocities | 0.05 |
| [95:98] | 3 | gravity_orientation | 1.0 |
| [98:100] | 2 | gait_phase_sin | 1.0 |
Total: 100D
Observation Scaling Constants
ANG_VEL_SCALE = 0.25 # Angular velocity scaling
DOF_POS_SCALE = 1.0 # Joint position scaling
DOF_VEL_SCALE = 0.05 # Joint velocity scaling
Input Normalization
The model includes a baked-in observation normalizer applied internally:
# Applied inside ONNX graph
obs_normalized = (obs - obs_mean) / obs_std
# obs_mean: shape (1, 100), range [-0.99, 0.71]
# obs_std: shape (1, 100), range [0.04, 1.40]
Default Joint Positions (29D Standing Pose)
DEFAULT_ANGLES = np.zeros(29, dtype=np.float32)
DEFAULT_ANGLES[[0, 6]] = -0.312 # hip pitch (left, right)
DEFAULT_ANGLES[[3, 9]] = 0.669 # knee
DEFAULT_ANGLES[[4, 10]] = -0.363 # ankle pitch
DEFAULT_ANGLES[[15, 22]] = 0.2 # shoulder pitch
DEFAULT_ANGLES[16] = 0.2 # left shoulder roll
DEFAULT_ANGLES[23] = -0.2 # right shoulder roll
DEFAULT_ANGLES[[18, 25]] = 0.6 # elbow
Gait Phase
GAIT_PERIOD = 1.0 # seconds
CONTROL_FREQ = 50 # Hz
phase_dt = 2 * np.pi / (CONTROL_FREQ * GAIT_PERIOD)
# Two-phase system (left/right legs offset by pi)
phase = [phase_left, phase_right]
# When standing still (no command)
if cmd_magnitude < 0.01:
    phase = [np.pi, np.pi]  # Both legs synchronized (standing)
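The snippet above can be made concrete as a small phase-update loop. This is a minimal sketch, not the repository's actual code: `update_phase` is a hypothetical helper, and the 0.01 standing threshold is taken from the snippet above.

```python
import numpy as np

GAIT_PERIOD = 1.0   # seconds
CONTROL_FREQ = 50   # Hz
PHASE_DT = 2 * np.pi / (CONTROL_FREQ * GAIT_PERIOD)

def update_phase(phase_left, cmd_magnitude):
    """Advance the left-leg phase one control step; the right leg is offset by pi.

    Hypothetical helper illustrating the two-phase scheme described above."""
    if cmd_magnitude < 0.01:
        # Standing: both legs pinned to the same phase
        return np.pi, np.pi
    phase_left = (phase_left + PHASE_DT) % (2 * np.pi)
    phase_right = (phase_left + np.pi) % (2 * np.pi)
    return phase_left, phase_right

# One full gait period (50 steps at 50 Hz) wraps the phase back to its start
p = 0.0
for _ in range(CONTROL_FREQ):
    p, _ = update_phase(p, cmd_magnitude=0.5)
```

The observation then takes `cos(phase)` at indices [35:37] and `sin(phase)` at [98:100], giving the network a continuous, periodic clock without any frame stacking.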
Output Space (Action)
Action Vector (29D)
All 29 joints are controlled:
| Index | Joint Group | DOF |
|---|---|---|
| [0:6] | Left leg (hip_pitch, hip_roll, hip_yaw, knee, ankle_pitch, ankle_roll) | 6 |
| [6:12] | Right leg | 6 |
| [12:15] | Waist (yaw, roll, pitch) | 3 |
| [15:22] | Left arm (shoulder_pitch/roll/yaw, elbow, wrist_roll/pitch/yaw) | 7 |
| [22:29] | Right arm | 7 |
Total: 29D
Action Scaling & Application
ACTION_SCALE = 0.25 # External scaling applied in control loop
# Target joint positions
target_pos = DEFAULT_ANGLES + action * ACTION_SCALE
Command Ranges
| Command | Range |
|---|---|
| Linear velocity X | [-1.0, 1.0] m/s |
| Linear velocity Y | [-1.0, 1.0] m/s |
| Angular velocity (yaw) | [-1.0, 1.0] rad/s |
| Heading | [-3.14, 3.14] rad |
Network Architecture
Overview Diagram
Input (100D)
        │
        ▼
┌──────────────────────────┐
│  Observation Normalizer  │  (mean/std normalization)
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│  Linear(100 → 512)       │  51,200 params
│  LayerNorm(512)          │   1,024 params
│  ELU()                   │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│  Linear(512 → 256)       │  131,072 params
│  LayerNorm(256)          │     512 params
│  ELU()                   │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│  Linear(256 → 128)       │  32,768 params
│  LayerNorm(128)          │     256 params
│  ELU()                   │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│  Linear(128 → 29)        │  3,712 params
│  Tanh()                  │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│  Action Scaling          │  action * scale + bias
└──────────────────────────┘
        │
        ▼
Output (29D)
Layer Details
Layer 1: Linear(100 β 512) + LayerNorm + ELU
- Input: 100D normalized observation
- Output: 512D features
- Parameters: 51,200 (weights) + 512 (bias) + 1,024 (LayerNorm) = 52,736
Layer 2: Linear(512 β 256) + LayerNorm + ELU
- Input: 512D features
- Output: 256D features
- Parameters: 131,072 (weights) + 256 (bias) + 512 (LayerNorm) = 131,840
Layer 3: Linear(256 β 128) + LayerNorm + ELU
- Input: 256D features
- Output: 128D features
- Parameters: 32,768 (weights) + 128 (bias) + 256 (LayerNorm) = 33,152
Layer 4: Linear(128 β 29) (Action Head)
- Input: 128D features
- Output: 29D raw actions
- Parameters: 3,712 (weights) + 29 (bias) = 3,741
Weight Summary
| Weight Name | Shape | Parameters |
|---|---|---|
| obs_normalizer._mean | [1, 100] | 100 |
| obs_normalizer._std | [1, 100] | 100 |
| actor.net.0.weight | [512, 100] | 51,200 |
| actor.net.0.bias | [512] | 512 |
| actor.net.1.weight (LN) | [512] | 512 |
| actor.net.1.bias (LN) | [512] | 512 |
| actor.net.3.weight | [256, 512] | 131,072 |
| actor.net.3.bias | [256] | 256 |
| actor.net.4.weight (LN) | [256] | 256 |
| actor.net.4.bias (LN) | [256] | 256 |
| actor.net.6.weight | [128, 256] | 32,768 |
| actor.net.6.bias | [128] | 128 |
| actor.net.7.weight (LN) | [128] | 128 |
| actor.net.7.bias (LN) | [128] | 128 |
| actor.fc_mu.0.weight | [29, 128] | 3,712 |
| actor.fc_mu.0.bias | [29] | 29 |
| actor.action_scale | [29] | 29 |
| actor.action_bias | [29] | 29 |
Total: 221,727 parameters
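The total can be sanity-checked by summing the products of the tensor shapes listed in the weight table; this is a quick verification sketch, not part of the model code.

```python
import numpy as np

# Shapes copied from the weight table above
shapes = [
    (1, 100), (1, 100),   # obs_normalizer mean / std
    (512, 100), (512,),   # actor.net.0 (Linear)
    (512,), (512,),       # actor.net.1 (LayerNorm weight / bias)
    (256, 512), (256,),   # actor.net.3 (Linear)
    (256,), (256,),       # actor.net.4 (LayerNorm)
    (128, 256), (128,),   # actor.net.6 (Linear)
    (128,), (128,),       # actor.net.7 (LayerNorm)
    (29, 128), (29,),     # actor.fc_mu.0 (Linear)
    (29,), (29,),         # action_scale / action_bias
]
total = sum(int(np.prod(s)) for s in shapes)
print(total)  # 221727
```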
Forward Pass Data Flow
Complete Forward Pass
def forward(obs_100d):
    """
    Args:
        obs_100d: (batch, 100) observation vector
    Returns:
        actions: (batch, 29) joint position deltas
    """
    # 1. Normalize observation (baked into ONNX)
    x = (obs_100d - obs_mean) / obs_std

    # 2. Hidden layer 1
    x = linear_0(x)        # (batch, 512)
    x = layer_norm_1(x)
    x = elu(x)

    # 3. Hidden layer 2
    x = linear_3(x)        # (batch, 256)
    x = layer_norm_4(x)
    x = elu(x)

    # 4. Hidden layer 3
    x = linear_6(x)        # (batch, 128)
    x = layer_norm_7(x)
    x = elu(x)

    # 5. Action head
    x = fc_mu(x)           # (batch, 29)
    x = tanh(x)            # Bound to [-1, 1]

    # 6. Scale actions (baked into ONNX)
    actions = x * action_scale + action_bias
    return actions
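The same data flow can be run end to end in plain numpy. The sketch below uses randomly initialized weights (not the real ONNX weights), omits the learned LayerNorm affine parameters (equivalent to gamma = 1, beta = 0), skips the observation normalizer, and assumes a placeholder action scale of 0.25; it only demonstrates the shape flow and the tanh bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # Normalize over the feature axis; learned affine omitted for brevity
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def elu(x):
    return np.where(x > 0, x, np.expm1(x))

# Random stand-in weights with the documented shapes
dims = [100, 512, 256, 128]
Ws = [rng.normal(0, 0.05, (o, i)).astype(np.float32) for i, o in zip(dims[:-1], dims[1:])]
bs = [np.zeros(o, dtype=np.float32) for o in dims[1:]]
W_mu = rng.normal(0, 0.05, (29, 128)).astype(np.float32)
b_mu = np.zeros(29, dtype=np.float32)
action_scale = np.full(29, 0.25, dtype=np.float32)  # placeholder values
action_bias = np.zeros(29, dtype=np.float32)

def forward(obs):
    x = obs  # observation normalization skipped in this sketch
    for W, b in zip(Ws, bs):
        x = elu(layer_norm(x @ W.T + b))
    x = np.tanh(x @ W_mu.T + b_mu)
    return x * action_scale + action_bias

obs = rng.normal(size=(1, 100)).astype(np.float32)
out = forward(obs)
print(out.shape)  # (1, 29)
```

Because tanh bounds the raw action to (-1, 1), the final output magnitude can never exceed the per-joint action scale.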
Computational Cost
Forward pass operations:
1. Normalization: 100D subtract + divide (~200 ops)
2. Layer 1: 100→512 matmul + LN + ELU (~52K MACs)
3. Layer 2: 512→256 matmul + LN + ELU (~131K MACs)
4. Layer 3: 256→128 matmul + LN + ELU (~33K MACs)
5. Action head: 128→29 matmul + tanh + scale (~4K MACs)
Total: ~220K multiply-accumulate operations per inference
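The MAC estimate follows directly from the layer dimensions; a one-liner reproduces it.

```python
# Per-layer multiply-accumulates for the four Linear layers (in_dim * out_dim each)
layers = [(100, 512), (512, 256), (256, 128), (128, 29)]
macs = sum(n_in * n_out for n_in, n_out in layers)
print(macs)  # 218752 -> roughly 220K MACs per inference
```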
PD Gains
PD gains are stored in ONNX metadata and loaded at runtime.
PD Gain Selection Methodology
Holosoma uses the Raibert-style rule (BeyondMimic method) to compute PD gains from joint armature values:
Kp = I × (2π × fn)²
Kd = 2 × I × ζ × (2π × fn)
Where:
- I = Joint armature (rotational inertia, from URDF/MuJoCo)
- fn = Natural frequency = 10 Hz
- ζ = Damping ratio = 2.0 (over-damped for stability)
This physics-based approach ensures each joint's stiffness/damping is proportional to its inertia.
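The formulas above translate to a few lines of numpy. This is a sketch of the stated rule, using the rounded armature values from the tables below, so results match the documented gains only to within rounding error.

```python
import numpy as np

F_N = 10.0    # natural frequency (Hz)
ZETA = 2.0    # damping ratio
OMEGA = 2 * np.pi * F_N

def pd_gains(armature):
    """Raibert-style gains: stiffness and damping scale with joint inertia."""
    kp = armature * OMEGA ** 2          # Kp = I * (2*pi*fn)^2
    kd = 2 * armature * ZETA * OMEGA    # Kd = 2 * I * zeta * (2*pi*fn)
    return kp, kd

kp, kd = pd_gains(0.01018)   # hip_yaw armature (rounded)
# kp ~ 40.19, kd ~ 2.56 -- agrees with the verification table to rounding error
```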
Verification
| Joint | Armature (I) | Formula Kp | Actual Kp | Match |
|---|---|---|---|---|
| hip_yaw | 0.01018 | 40.179 | 40.179 | ✓ |
| hip_roll | 0.02510 | 99.098 | 99.098 | ✓ |
| ankle | 0.00722 | 28.501 | 28.501 | ✓ |
| shoulder | 0.00361 | 14.251 | 14.251 | ✓ |
| wrist | 0.00425 | 16.778 | 16.778 | ✓ |
Alternative Rule of Thumb (Kyle's Method)
Another common approach in RL locomotion:
Kp = τ_max / RoM (or τ_max / (RoM/2) for stiffer response)
Kd = Kp / 20
Where τ_max is max torque and RoM is range of motion. This gives similar ballpark values but doesn't account for joint inertia.
Kp (Position Gains)
| Joint Group | Kp Values | Armature |
|---|---|---|
| Hip pitch/yaw | 40.18 | 0.0102 |
| Hip roll, Knee | 99.10 | 0.0251 |
| Ankle | 28.50 | 0.0072 |
| Waist yaw | 40.18 | 0.0102 |
| Waist roll/pitch | 28.50 | 0.0072 |
| Shoulder/Elbow | 14.25 | 0.0036 |
| Wrist | 16.78 | 0.0043 |
Kd (Velocity Gains)
| Joint Group | Kd Values | Armature |
|---|---|---|
| Hip pitch/yaw | 2.56 | 0.0102 |
| Hip roll, Knee | 6.31 | 0.0251 |
| Ankle | 1.81 | 0.0072 |
| Waist yaw | 2.56 | 0.0102 |
| Waist roll/pitch | 1.81 | 0.0072 |
| Shoulder/Elbow | 0.91 | 0.0036 |
| Wrist | 1.07 | 0.0043 |
References
- Kyle's Rule of Thumb: Kp = τ_max / RoM
- Whole-Body Humanoid Robot Locomotion with Human Reference (Section III.C)
- BeyondMimic critical damping method
Comparison with GR00T
| Feature | Holosoma FastSAC | GR00T |
|---|---|---|
| Architecture | Single unified actor | Dual policy (Balance + Walk) |
| Observation | 100D (single frame) | 516D (6 frames × 86D) |
| Actions | 29D (full body) | 15D (legs + waist only) |
| Parameters | 221,727 | 470,578 |
| Policy Switching | None | Based on command magnitude |
| Temporal Context | Gait phase encoding | 6-frame history stack |
| Normalization | LayerNorm + obs normalizer | L2 norm on latent |
| Control Frequency | 50 Hz | 50 Hz |
Key Differences
Single vs Dual Policy: Holosoma handles balance and walking with one network; GR00T switches between dedicated balance/walk policies.
Temporal Handling: GR00T stacks 6 frames (516D) for temporal context; Holosoma uses gait phase signals (sin/cos) to encode periodicity.
Body Coverage: Holosoma controls all 29 DOF including arms; GR00T only controls 15 DOF (legs + waist), leaving arms for other tasks.
Architecture: GR00T uses an encoder-decoder (estimator β actor) design with latent normalization; Holosoma is a straightforward MLP with LayerNorm.
Usage
from huggingface_hub import hf_hub_download
import onnxruntime as ort
import numpy as np
# Load model
path = hf_hub_download(repo_id="nepyope/holosoma_locomotion",
                       filename="fastsac_g1_29dof.onnx")
policy = ort.InferenceSession(path)
# Build 100D observation
obs = np.zeros((1, 100), dtype=np.float32)
obs[0, 0:29] = last_action
obs[0, 29:32] = ang_vel * 0.25
obs[0, 32] = cmd_yaw
obs[0, 33:35] = cmd_lin
obs[0, 35:37] = np.cos(phase)
obs[0, 37:66] = (joint_pos - default_pos) * 1.0
obs[0, 66:95] = joint_vel * 0.05
obs[0, 95:98] = gravity
obs[0, 98:100] = np.sin(phase)
# Run inference
action = policy.run(None, {"actor_obs": obs})[0].squeeze()
# Apply to robot
target = default_pos + action * 0.25
References
- Holosoma Project: Whole-body locomotion for humanoid robots
- SAC Algorithm: "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning"
- Unitree G1: https://www.unitree.com/g1
Last Updated: 2025-12-17