LingBot-VA G1 Diffusion World Model

Fine-tuned Wan2.2-5B diffusion transformer for the Biomech G1 dual-arm robot.

Model Details

  • Base: Wan2.2-5B Diffusion Transformer
  • Parameters: ~5.3B (5.0B video + 0.3B action expert)
  • Architecture: Mixture-of-Transformers (MoT) with interleaved visual/action tokens
  • Action Space: 30D (G1 robot joints)
  • Visual Input: 3 cameras (head + 2 wrists), 256x320
  • Training: G1-BrainCo dataset, 1,598 episodes, 5,000 steps
  • Hardware: 1x A100 80GB
  • Attention: flex_attention with causal masking
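The card does not spell out how the causal mask treats the interleaved visual/action tokens. As a rough sketch only, the snippet below assumes a block-causal layout: tokens are grouped into chunks (visual tokens followed by action tokens), and a query may attend to any key in its own chunk or an earlier one. The chunk sizes and all function names are hypothetical. The predicate uses the same `(b, h, q_idx, kv_idx)` argument shape as a `mask_mod` callable for PyTorch's `flex_attention`, but it is written in plain Python so it runs without a GPU.

```python
# Assumed token layout, NOT taken from the model card:
# each chunk = VISUAL_PER_CHUNK visual tokens followed by ACTION_PER_CHUNK action tokens.
VISUAL_PER_CHUNK = 4  # hypothetical value for illustration
ACTION_PER_CHUNK = 2  # hypothetical value for illustration
CHUNK = VISUAL_PER_CHUNK + ACTION_PER_CHUNK


def chunk_of(idx):
    """Index of the chunk that token position idx belongs to."""
    return idx // CHUNK


def is_action(idx):
    """True if position idx falls in the action part of its chunk."""
    return idx % CHUNK >= VISUAL_PER_CHUNK


def block_causal_mask_mod(b, h, q_idx, kv_idx):
    """Allow attention to keys in the same chunk or any earlier chunk.

    b and h (batch and head index) are unused; they are kept so the
    signature matches the mask_mod convention.
    """
    return chunk_of(kv_idx) <= chunk_of(q_idx)


def build_mask(seq_len):
    """Materialize the boolean mask as a nested list (rows = queries)."""
    return [
        [block_causal_mask_mod(0, 0, q, k) for k in range(seq_len)]
        for q in range(seq_len)
    ]


mask = build_mask(2 * CHUNK)
for row in mask:
    # "x" marks an allowed query/key pair, "." a masked one.
    print("".join("x" if allowed else "." for allowed in row))
```

With real tensors, a predicate of this shape would typically be passed to `create_block_mask` from `torch.nn.attention.flex_attention`. Whether the released model masks per token, per frame, or per chunk is not stated here.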

Usage

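Load with diffusers:

import torch
from diffusers import DiffusionPipeline

# The identifier below has four path segments. A Hub repo ID has the form
# namespace/name, so this string resolves as written only as a local directory path.
pipe = DiffusionPipeline.from_pretrained(
    "jfgpt/lingbot-va-g1/checkpoint_step_5000/transformer",
    torch_dtype=torch.bfloat16,  # load weights in bfloat16 to reduce memory use
)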
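The path in the snippet above has more than two segments, while Hugging Face Hub repository IDs take the form `namespace/name`. A minimal sketch of splitting such a path is shown below, assuming (this is not confirmed by the card) that the first two segments name the repository and the remainder is a folder inside it. The helper `split_hub_path` is hypothetical.

```python
def split_hub_path(path):
    """Split 'namespace/name/sub/dir' into ('namespace/name', 'sub/dir').

    Assumes the first two segments are the Hub repo ID; this is an
    illustration, not a documented convention of this checkpoint.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected at least 'namespace/name', got {path!r}")
    repo_id = "/".join(parts[:2])
    subfolder = "/".join(parts[2:]) or None  # None when there is no subfolder
    return repo_id, subfolder


repo_id, subfolder = split_hub_path(
    "jfgpt/lingbot-va-g1/checkpoint_step_5000/transformer"
)
print("repo_id:  ", repo_id)
print("subfolder:", subfolder)
```

The two pieces could then be tried as the repository ID and a `subfolder=` argument when loading. Which loader class actually matches this checkpoint is not stated on the card.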

Requirements

  • PyTorch 2.x + CUDA
  • diffusers >= 0.35.0
  • transformers
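One way to check the `diffusers >= 0.35.0` requirement before loading is sketched below. It uses only the standard library. The helper names are hypothetical, and the version parsing is deliberately simple: it compares leading numeric components and ignores suffixes such as `.dev0` or `+cu121`.

```python
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component (e.g. 'dev0')
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions after padding both to the same length with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirement(package, minimum):
    """Return a one-line status string for an installed package."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    if meets_minimum(installed, minimum):
        return f"{package} {installed}: ok"
    return f"{package} {installed}: too old, need >= {minimum}"


print(check_requirement("diffusers", "0.35.0"))
```

For strict handling of pre-release and local version tags, a dedicated version-parsing library would be a better fit than this sketch.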

License

All rights reserved — The Robbyant Team Authors.
