Robot Flow Labs // ANIMA Stack

FORGE

Fast Optimized Robot Generation Engine
7B teacher → <1B student → 14.1 fps on edge GPU
9x
Model Compression
14.1
FPS on NVIDIA L4 (FP16)
28x
Faster Than Teacher
<600MB
INT4 Deploy Size
The Problem

VLAs Are Too Big for Robots

Vision-Language-Action models (OpenVLA, RT-2, Pi0) achieve state-of-the-art robot manipulation — but at 7B+ parameters, they run at 0.5 fps. Robots need 10+ fps for real-time control. FORGE solves this with automated knowledge distillation.
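The core mechanism is teacher-student distillation: the student is trained to match the teacher's softened output distribution rather than only the ground-truth labels. A minimal sketch of a Hinton-style distillation loss (pure Python, function names illustrative; FORGE's actual objective is not specified here):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical logits -> zero loss; divergent logits -> positive loss.
zero = kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
pos = kd_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Raising the temperature exposes more of the teacher's "dark knowledge" (relative probabilities of non-argmax actions), which is what lets a sub-1B student recover most of a 7B teacher's behavior.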

Before // OpenVLA-7B
Parameters: 7,000M
Throughput: ~0.5 fps
Latency: ~2,000 ms
Model Size: ~13 GB
Edge Deploy: No
vs
After // FORGE-Nano
Parameters: 774M
Throughput: 14.1 fps
Latency: 71 ms
Model Size: <600 MB
Edge Deploy: Jetson + Apple Silicon
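The headline figures are internally consistent; a quick check using only the numbers from the tables above:

```python
# Values taken from the Before/After comparison above.
teacher_params_m, student_params_m = 7000, 774
teacher_fps, student_fps = 0.5, 14.1

compression = teacher_params_m / student_params_m  # ~9.0x  -> the "9x" stat
speedup = student_fps / teacher_fps                # ~28.2x -> the "28x" stat
latency_ms = 1000 / student_fps                    # ~70.9 ms -> the "71 ms" stat
```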
Architecture

4-Stage Distillation Pipeline

👁
SigLIP-SO400M
Vision Encoder (frozen)
472M
🌐
Bridge Attention
64 queries, 4 layers
40M
🧠
Qwen2.5-0.5B
LoRA rank=64
494M
🎯
Flow Head
1-step inference
1.7M
Total: 967.9M params → Pruned: 774.1M params → INT4: <600 MB
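"1-step inference" in the flow head means the action is produced with a single Euler integration step of a learned velocity field, instead of a multi-step diffusion/flow sampler. A toy sketch of that idea (all names illustrative, not FORGE's actual API):

```python
def one_step_flow_inference(velocity_field, noise, cond):
    """Single Euler step from t=0 to t=1: a = x0 + v(x0, t=0, cond).

    velocity_field: learned v(x, t, cond) -> dx/dt (any callable here)
    noise: initial sample x0, one float per action dimension
    cond: conditioning features (e.g. from the language-model backbone)
    """
    v = velocity_field(noise, 0.0, cond)
    return [x + dx for x, dx in zip(noise, v)]

# With a toy field that always points straight at a fixed target action,
# one Euler step lands exactly on the target:
target = [0.1, -0.2, 0.3]
field = lambda x, t, c: [ti - xi for ti, xi in zip(target, x)]
action = one_step_flow_inference(field, [0.0, 0.0, 0.0], cond=None)
```

Collapsing the sampler to one step is what keeps the 1.7M-parameter head from dominating latency: the expensive backbone runs once, and the action pops out in a single pass.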
GPU Benchmarks // 4x NVIDIA L4

Measured Performance

Best Speed
14.1 fps
Flow + LoRA-64 + 60% prune, FP16
Best Training
92.3%
Loss reduction in 30 steps
Most Compressed
739M
50% pruned, functional model
Batched Throughput
33.6 fps
Single GPU, batch=32, FP16
Multi-Teacher
5
Teachers with learned routing
Test Suite
524
Tests passing, 0 failures
Recommended Production Config
variant: nano
action_head: flow
lora_rank: 64
prune_ratio: 0.60
params: 774.1M
fps (FP16): 14.1
deploy_size: <600 MB
target: Jetson + Apple Silicon
Compatibility

Any Teacher, Any Robot

OpenVLA-7B
7.6B
Token-AR // H=1
RDT2-FM
1.2B
Diffusion // H=8
SmolVLA
0.5B
Parallel // H=1
BitVLA
5.9B
Quantized // H=1
Pi0
3.0B
Flow // H=4
Robots: Franka // ALOHA // xArm // UR5e // + any 6-14 DoF arm
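The teachers above differ in action-chunk horizon (H=1 up to H=8), so distilling from any of them requires normalizing chunks to the student's fixed horizon. A hypothetical adapter sketch (truncate-or-pad is one simple policy; FORGE's actual strategy is not specified here):

```python
def normalize_horizon(action_chunk, student_horizon):
    """Map a teacher action chunk of any horizon to the student horizon.

    action_chunk: list of per-step action vectors, length = teacher's H.
    Long chunks are truncated; short chunks repeat the last action.
    """
    if len(action_chunk) >= student_horizon:
        return action_chunk[:student_horizon]
    pad = [action_chunk[-1]] * (student_horizon - len(action_chunk))
    return action_chunk + pad

# H=1 teacher (Token-AR style) padded to a 4-step student chunk:
chunk1 = normalize_horizon([[0.5, 0.5]], student_horizon=4)
# H=8 teacher (diffusion style) truncated to 4 steps:
chunk8 = normalize_horizon([[i] for i in range(8)], student_horizon=4)
```

With a shim like this, one distillation loop can consume token-autoregressive, diffusion, parallel, and flow teachers interchangeably.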

FORGE is Open Source

Part of the ANIMA agentic robotics stack. Apache 2.0 licensed. Built for production deployment on edge hardware.