FORGE-Nano: Compressed VLA for Real-Time Robot Control

7B VLA teacher → <1B student → 14.1 fps on edge GPU

What is FORGE?

FORGE (Fast Optimized Robot Generation Engine) is a model distillation pipeline that takes any 7B+ Vision-Language-Action (VLA) model and compresses it to <2GB for real-time edge deployment on NVIDIA Jetson and Apple Silicon.

Part of the ANIMA agentic robotics AI stack by Robot Flow Labs.

Architecture

Teacher (7B VLA)
    |
    v
[SigLIP-SO400M] ---> [Bridge Attention] ---> [Qwen2.5-0.5B + LoRA] ---> [Action Head]
   (frozen)          (64 queries, 4L)       (rank=32/64)             (diffusion/flow)
   472.3M params     39.7M params           ~494M params             ~1.7M params

Total: 967.9M params (495.6M trainable, 472.3M frozen)
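The bridge is the only piece connecting the frozen vision tower to the LLM: 64 learnable query vectors cross-attend over the SigLIP patch tokens and emit a fixed-size prefix. A minimal NumPy sketch of one such cross-attention layer is below; the hidden size `d=128`, the 729-token patch grid, and the single-head layout are illustrative assumptions, not the real 39.7M-parameter, 4-layer configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bridge_attention(vision_tokens, queries, Wq, Wk, Wv):
    """One cross-attention layer: 64 learnable queries pool a variable-length
    sequence of frozen vision tokens into a fixed-size prefix for the LLM."""
    q = queries @ Wq           # (64, d)
    k = vision_tokens @ Wk     # (T, d)
    v = vision_tokens @ Wv     # (T, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (64, T)
    return attn @ v            # (64, d)

rng = np.random.default_rng(0)
d = 128                                    # illustrative hidden size
vision_tokens = rng.normal(size=(729, d))  # e.g. a 27x27 patch grid (assumed)
queries = rng.normal(size=(64, d))         # 64 learnable queries, as in the diagram
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = bridge_attention(vision_tokens, queries, Wq, Wk, Wv)
print(out.shape)  # (64, 128)
```

However many vision tokens come in, the LLM always sees exactly 64 prefix tokens, which keeps the student's sequence length (and latency) fixed.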

Benchmark Results (4x NVIDIA L4 24GB)

Student Variants

| Variant | Params | FP32 fps | FP16 fps | FP16 Speedup | Training Loss Reduction |
|---|---|---|---|---|---|
| Nano (diffusion, LoRA=32) | 967.9M | 7.9 | 11.0 | 1.39x | 67.0% |
| Nano (diffusion, LoRA=64) | 972.3M | 7.9 | 10.8 | 1.37x | 76.9% |
| Nano (flow, LoRA=32) | 967.9M | 8.2 | 12.6 | 1.54x | 85.8% |
| Small (diffusion) | 2097.7M | 6.2 | 9.9 | -- | -- |
| Small (flow) | 2097.7M | 6.1 | 11.3 | -- | -- |

Full Pipeline: Build -> Train -> Prune -> Deploy

| Configuration | Post-Prune Params | FP32 fps | FP16 fps | Loss Reduction |
|---|---|---|---|---|
| Diffusion + p75 + INT4 | 830.8M | 10.0 | 12.0 | 41.4% |
| Flow + p50 + INT4 | 739.3M | 14.1 | 7.8 | 76.3% |
| LoRA-64 + p90 + INT4 | 880.8M | 9.1 | 11.2 | 86.3% |
| Flow + LoRA-64 + p60 | 774.1M | 12.7 | 14.1 | 75.7% |
| No prune + INT8 | 922.2M | 8.1 | 11.0 | 59.4% |

Multi-GPU Scaling

| GPUs | FP32 fps (batch=16) | FP16 fps (batch=32) |
|---|---|---|
| 1 | 9.3 | 33.6 |
| 2 | 13.5 | -- |
| 4 | 13.6 | 31.6 |
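A quick way to read this table is scaling efficiency (speedup over 1 GPU divided by GPU count): the FP32 path scales reasonably to 2 GPUs but is essentially flat from 2 to 4, suggesting the workload becomes communication- or host-bound. The numbers below come straight from the table.

```python
# FP32 throughput from the multi-GPU table above (batch=16)
fps = {1: 9.3, 2: 13.5, 4: 13.6}

def scaling_efficiency(n_gpus):
    """Speedup over the 1-GPU baseline, divided by GPU count.
    1.0 would mean perfect linear scaling."""
    speedup = fps[n_gpus] / fps[1]
    return speedup / n_gpus

print(round(scaling_efficiency(2), 2))  # 0.73
print(round(scaling_efficiency(4), 2))  # 0.37
```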

Multi-Teacher Distillation

  • 5 teachers fit across 2 GPUs (22.7 GB total)
  • Router learns optimal teacher weighting in <50 steps
  • Best config: balanced (alpha_task=0.3) achieves 76.1% loss reduction
  • Supports: OpenVLA-7B, RDT2-FM, SmolVLA, BitVLA, Pi0
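The routing idea can be sketched in a few lines: a softmax over per-teacher logits produces the teacher weights, and `alpha_task` mixes the ground-truth task loss with the router-weighted distillation loss. This composition (and every number in it) is an assumption for illustration; the actual FORGE router and loss formula may differ.

```python
import math

def router_weights(logits):
    """Softmax over per-teacher router logits; weights sum to 1."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def total_loss(task_loss, teacher_losses, logits, alpha_task=0.3):
    """alpha_task mixes the ground-truth task loss with the router-weighted
    distillation losses (alpha_task=0.3 matches the 'balanced' config above).
    The exact mixing rule is an assumption, not the released implementation."""
    w = router_weights(logits)
    distill = sum(wi * li for wi, li in zip(w, teacher_losses))
    return alpha_task * task_loss + (1 - alpha_task) * distill

# 5 teachers, matching the bullet list; losses and logits are made-up values
logits = [0.2, -0.5, 1.1, 0.0, 0.3]
teacher_losses = [0.8, 1.4, 0.6, 1.0, 0.9]
loss = total_loss(task_loss=0.5, teacher_losses=teacher_losses, logits=logits)
```

Because the logits are learned scalars (one per teacher), the router adds almost no parameters, which is consistent with it converging in under 50 steps.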

Pruning Results

| Pruning Ratio | Layers Kept | Params | FP32 fps | Speedup |
|---|---|---|---|---|
| No prune | 24 | 967.9M | 7.9 | 1.0x |
| 90% keep | 18 | 880.8M | 9.1 | 1.15x |
| 75% keep | 15 | 830.8M | 10.0 | 1.27x |
| 60% keep | 11 | 774.1M | 12.7 | 1.61x |
| 50% keep | 9 | 739.3M | 14.1 | 1.78x |
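Note that speedup grows sub-linearly with depth removed: dropping 24 layers down to 9 gives only 1.78x, because the frozen SigLIP encoder and action head are untouched. Treating pruning through Amdahl's law, we can back out how much of the runtime the prunable layers must account for; this is an interpretation of the table, not a measurement from the repo.

```python
def amdahl_speedup(p, frac_remaining):
    """Expected end-to-end speedup when only a fraction p of runtime
    (the prunable LLM layers) is reduced to frac_remaining of its cost."""
    return 1.0 / ((1.0 - p) + p * frac_remaining)

def solve_p(observed_speedup, frac_remaining):
    """Invert Amdahl's law: what fraction of runtime must the pruned
    layers account for to explain the observed speedup?"""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - frac_remaining)

# Keeping 9 of 24 layers gave a 1.78x speedup (last table row)
p = solve_p(1.78, 9 / 24)
print(round(p, 2))  # 0.70
```

Under this model, roughly 70% of FP32 inference time sits in the prunable LLM stack, so further layer pruning alone cannot push speedup much past ~3.3x; the vision encoder becomes the bottleneck.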

Recommended Configurations

Production (Edge Deployment)

variant: nano
action_head: flow
lora_rank: 64
prune_ratio: 0.60
target_bits: 4
# Result: 774M params, FP16 14.1 fps, <600MB INT4
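The <600MB figure is easy to sanity-check with back-of-envelope arithmetic: at 4 bits per weight, most of the 774M parameters cost half a byte each, plus a small fraction kept at 16 bits. The 5% FP16 fraction below is a guess, not a number from the release.

```python
def int4_size_mb(n_params, fp16_fraction=0.05):
    """Rough checkpoint size: most weights at 4 bits (0.5 bytes), with a
    small assumed fraction (embeddings, norms, action head) kept at 16 bits."""
    bytes_total = n_params * (1 - fp16_fraction) * 0.5 + n_params * fp16_fraction * 2.0
    return bytes_total / 1e6

size = int4_size_mb(774.1e6)
print(round(size))  # 445 -> consistent with the <600MB target above
```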

Quality-First

variant: nano
action_head: diffusion
lora_rank: 32
prune_ratio: 0.75
target_bits: 8
# Result: 830M params, 92.3% loss reduction

Key Findings

  1. Flow matching head is 15% faster than diffusion at FP16 inference (12.6 vs 11.0 fps)
  2. LoRA rank=64 trains 10% better than rank=32 (76.9% vs 67.0% loss reduction) with negligible speed cost
  3. Aggressive pruning works: 50% layer removal still produces a functional model at 14.1 fps
  4. FP16 autocast gives 1.4-1.5x speedup for free — always use it in production
  5. Multi-teacher routing converges fast: Router learns to weight teachers optimally in <50 steps
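Findings 1 and 4 can be reproduced directly from the student-variant table; the short check below recomputes the FP16 speedup column (variant names are shorthand for the table rows, not official identifiers).

```python
# (FP32 fps, FP16 fps) pairs from the student-variant benchmark table
pairs = {
    "nano-diff-lora32": (7.9, 11.0),
    "nano-diff-lora64": (7.9, 10.8),
    "nano-flow-lora32": (8.2, 12.6),
}
speedups = {name: round(fp16 / fp32, 2) for name, (fp32, fp16) in pairs.items()}
print(speedups)  # {'nano-diff-lora32': 1.39, 'nano-diff-lora64': 1.37, 'nano-flow-lora32': 1.54}
```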

Supported Teachers

| Teacher | Type | Params | Chunk Size |
|---|---|---|---|
| OpenVLA-7B | Token-AR | 7.6B | H=1 |
| RDT2-FM | Diffusion | 1.2B | H=8 |
| SmolVLA | Parallel | 0.5B | H=1 |
| BitVLA | Quantized | 5.9B | H=1 |
| Pi0 | Flow | 3.0B | H=4 |

Supported Robots

| Robot | DoF | Action Head | Horizon | Control Rate |
|---|---|---|---|---|
| Franka Panda | 7 | Flow | H=8 | 20 Hz |
| ALOHA (bimanual) | 14 | Chunk | H=16 | 50 Hz |
| xArm | 6 | Flow | H=4 | 100 Hz |
| UR5e | 6 | Flow | H=4 | 125 Hz |
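The horizon and control rate together determine how much motion each inference call covers: a chunk of H actions executed at the robot's control rate spans H / rate seconds. A small helper makes the per-robot budgets explicit, using the values from the table above.

```python
def chunk_duration_s(horizon, control_rate_hz):
    """Seconds of motion covered by one action chunk of length `horizon`
    executed at the robot's control rate."""
    return horizon / control_rate_hz

print(chunk_duration_s(8, 20))    # 0.4   (Franka Panda)
print(chunk_duration_s(16, 50))   # 0.32  (ALOHA)
print(chunk_duration_s(4, 100))   # 0.04  (xArm)
```

Longer chunks buy the policy more time per call at the cost of reacting later to new observations, which is why the high-rate arms use a short H=4 horizon.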

Pipeline

Teacher Labels -> Knowledge Distillation -> Layer Pruning -> Quantization -> Edge Export
  (HDF5)           (LoRA + Bridge)        (Chunk-aware)    (INT4/INT8)     (TRT/ONNX/MLX)

Usage

pip install anima-forge

# Full pipeline
forge pipeline --device cuda --variant nano --steps 5000

# Auto-detect model dimensions
forge autosense --model-dir /path/to/models

# Benchmark
forge benchmark run --device cuda

# Deploy
forge serve --port 8000

Citation

@software{forge2026,
  title={FORGE: Fast Optimized Robot Generation Engine},
  author={Robot Flow Labs},
  year={2026},
  url={https://github.com/RobotFlow-Labs/anima-forge-distillation-pipeline}
}

License

Apache 2.0
