Robot Flow Labs // ANIMA Stack

FORGE

Fast Optimized Robot Generation Engine
7B teacher → <1B student → 14.1 fps on edge GPU
9x
Model Compression
14.1
FPS on NVIDIA L4 (FP16)
28x
Faster Than Teacher
<600MB
INT4 Deploy Size
The Problem

VLAs Are Too Big for Robots

Vision-Language-Action models (OpenVLA, RT-2, Pi0) achieve state-of-the-art robot manipulation — but at 7B+ parameters, they run at 0.5 fps. Robots need 10+ fps for real-time control. FORGE solves this with automated knowledge distillation.
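The core mechanism is teacher-student distillation: the student is trained to match the teacher's softened output distribution rather than only the ground-truth labels. A minimal sketch of a Hinton-style distillation loss (pure Python, function names illustrative; FORGE's actual objective is not specified here):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical logits -> zero loss; divergent logits -> positive loss.
zero = kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
pos = kd_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Raising the temperature exposes more of the teacher's "dark knowledge" (relative probabilities of non-argmax actions), which is what lets a sub-1B student recover most of a 7B teacher's behavior.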

Before // OpenVLA-7B
Parameters: 7,000M
Throughput: ~0.5 fps
Latency: ~2,000 ms
Model Size: ~13 GB
Edge Deploy: No
vs
After // FORGE-Nano
Parameters: 774M
Throughput: 14.1 fps
Latency: 71 ms
Model Size: <600 MB
Edge Deploy: Jetson + Apple Silicon
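The headline figures are internally consistent; a quick check using only the numbers from the tables above:

```python
# Values taken from the Before/After comparison above.
teacher_params_m, student_params_m = 7000, 774
teacher_fps, student_fps = 0.5, 14.1

compression = teacher_params_m / student_params_m  # ~9.0x  -> the "9x" stat
speedup = student_fps / teacher_fps                # ~28.2x -> the "28x" stat
latency_ms = 1000 / student_fps                    # ~70.9 ms -> the "71 ms" stat
```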
Architecture

4-Stage Distillation Pipeline

👁
SigLIP-SO400M
Vision Encoder (frozen)
472M
🌐
Bridge Attention
64 queries, 4 layers
40M
🧠
Qwen2.5-0.5B
LoRA rank=64
494M
🎯
Flow Head
1-step inference
1.7M
Total: 967.9M params → Pruned: 774.1M params → INT4: <600 MB
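"1-step inference" in the flow head means the action is produced with a single Euler integration step of a learned velocity field, instead of a multi-step diffusion/flow sampler. A toy sketch of that idea (all names illustrative, not FORGE's actual API):

```python
def one_step_flow_inference(velocity_field, noise, cond):
    """Single Euler step from t=0 to t=1: a = x0 + v(x0, t=0, cond).

    velocity_field: learned v(x, t, cond) -> dx/dt (any callable here)
    noise: initial sample x0, one float per action dimension
    cond: conditioning features (e.g. from the language-model backbone)
    """
    v = velocity_field(noise, 0.0, cond)
    return [x + dx for x, dx in zip(noise, v)]

# With a toy field that always points straight at a fixed target action,
# one Euler step lands exactly on the target:
target = [0.1, -0.2, 0.3]
field = lambda x, t, c: [ti - xi for ti, xi in zip(target, x)]
action = one_step_flow_inference(field, [0.0, 0.0, 0.0], cond=None)
```

Collapsing the sampler to one step is what keeps the 1.7M-parameter head from dominating latency: the expensive backbone runs once, and the action pops out in a single pass.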
GPU Benchmarks // 4x NVIDIA L4

Measured Performance

Best Speed
14.1 fps
Flow + LoRA-64 + 60% prune, FP16
Best Training
92.3%
Loss reduction in 30 steps
Most Compressed
739M
50% pruned, functional model
Batched Throughput
33.6 fps
Single GPU, batch=32, FP16
Multi-Teacher
5
Teachers with learned routing
Test Suite
524
Tests passing, 0 failures
Recommended Production Config
variant: nano
action_head: flow
lora_rank: 64
prune_ratio: 0.60
params: 774.1M
fps (FP16): 14.1
deploy_size: <600 MB
target: Jetson + Apple Silicon
Compatibility

Any Teacher, Any Robot

OpenVLA-7B
7.6B
Token-AR // H=1
RDT2-FM
1.2B
Diffusion // H=8
SmolVLA
0.5B
Parallel // H=1
BitVLA
5.9B
Quantized // H=1
Pi0
3.0B
Flow // H=4
Robots: Franka // ALOHA // xArm // UR5e // + any 6-14 DoF arm
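The teachers above differ in action-chunk horizon (H=1 up to H=8), so distilling from any of them requires normalizing chunks to the student's fixed horizon. A hypothetical adapter sketch (truncate-or-pad is one simple policy; FORGE's actual strategy is not specified here):

```python
def normalize_horizon(action_chunk, student_horizon):
    """Map a teacher action chunk of any horizon to the student horizon.

    action_chunk: list of per-step action vectors, length = teacher's H.
    Long chunks are truncated; short chunks repeat the last action.
    """
    if len(action_chunk) >= student_horizon:
        return action_chunk[:student_horizon]
    pad = [action_chunk[-1]] * (student_horizon - len(action_chunk))
    return action_chunk + pad

# H=1 teacher (Token-AR style) padded to a 4-step student chunk:
chunk1 = normalize_horizon([[0.5, 0.5]], student_horizon=4)
# H=8 teacher (diffusion style) truncated to 4 steps:
chunk8 = normalize_horizon([[i] for i in range(8)], student_horizon=4)
```

With a shim like this, one distillation loop can consume token-autoregressive, diffusion, parallel, and flow teachers interchangeably.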

FORGE is Open Source

Part of the ANIMA agentic robotics stack. Apache 2.0 licensed. Built for production deployment on edge hardware.