# Neon Flow Standalone v1

The first trained Neon VLA action decoder: a Flow Matching head for humanoid robot control.
## Model Description
This is a Flow Matching action decoder trained on 6 synthetic Neon datasets (110K episodes total). It predicts 14-DoF arm joint actions from natural language instructions.
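Conceptually, a flow matching head is trained to regress the constant velocity of a straight path from Gaussian noise to the target action vector. A minimal sketch of that objective (the names `model`, `cond`, and `actions` are illustrative; the actual training code lives in the Neon repo):

```python
import torch

def flow_matching_loss(model, cond, actions):
    """Conditional flow matching loss with a linear interpolation path.

    model   -- predicts velocity v(x_t, t, cond)
    cond    -- pooled language embedding (illustrative placeholder)
    actions -- ground-truth 14-DoF action targets
    """
    x1 = actions                               # data sample
    x0 = torch.randn_like(x1)                  # noise sample
    t = torch.rand(x1.shape[0], 1)             # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1                 # point on the straight noise→data path
    target_v = x1 - x0                         # constant velocity of that path
    pred_v = model(xt, t, cond)
    return torch.mean((pred_v - target_v) ** 2)
```

This is the standard conditional flow matching formulation; the trained head then generates actions by integrating the learned velocity field from noise to data.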
## Architecture

```
Language Instruction → CharTokenizer → TransformerEncoder(2L) → MeanPool → Linear(512)
        ↓
FlowMatchingHead(6 layers)
  - Sinusoidal time embed
  - RMSNorm + residual blocks
  - 10-step Euler ODE sampling
        ↓
14-DoF Joint Actions
```
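The sinusoidal time embedding and 10-step Euler ODE sampling above can be sketched as follows (a minimal illustration, not the repo's exact implementation; the real head also conditions on the language embedding through its residual blocks):

```python
import math
import torch

def sinusoidal_time_embed(t, dim=128):
    """Standard sinusoidal embedding of the flow time t in [0, 1]."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = t[:, None] * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

@torch.no_grad()
def sample_actions(model, cond, action_dim=14, steps=10):
    """Euler integration of dx/dt = v(x, t, cond) from noise (t=0) to actions (t=1)."""
    x = torch.randn(cond.shape[0], action_dim)   # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((cond.shape[0],), i * dt)
        x = x + dt * model(x, t, cond)           # one Euler step along the velocity field
    return x
```

With `steps=10` this matches the "10-step Euler ODE sampling" listed in the diagram; more steps trade latency for a more accurate ODE solution.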
## Key Numbers
| Metric | Value |
|---|---|
| Total Parameters | ~5.2M |
| Action Dimensions | 14 (arms_only) |
| Training Epochs | 20 |
| Training GPU | NVIDIA L40S (48GB) |
| Training Data | 110K episodes from 6 datasets |
| Flow Steps (inference) | 10 (Euler ODE) |
## Training Datasets
| Dataset | Episodes | Type |
|---|---|---|
| neon-spatial-language-20k | 20,000 | Spatial reasoning |
| neon-g1-kitchen-10k | 10,000 | Kitchen manipulation |
| neon-g1-diverse-50k | 50,000 | Multi-scene diversity |
| neon-long-horizon-15k | 15,000 | Multi-step chains |
| neon-failure-recovery-5k | 5,000 | Failure + retry |
| neon-bimanual-10k | 10,000 | Two-arm coordination |
## Usage

```python
import torch

# Load the checkpoint (weights + training config)
checkpoint = torch.load("neon_standalone_final.pt", map_location="cpu")
config = checkpoint["config"]
print(f"Action dim: {config['action_dim']}, Best loss: {config['best_loss']:.6f}")

# The model weights can be loaded into the NeonStandaloneModel class
# from neon/scripts/hf_train_standalone.py
```
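For intuition about the language-conditioning path (CharTokenizer → 2-layer TransformerEncoder → mean pool → Linear(512)), here is an illustrative standalone sketch. The class name, vocabulary size, and model width are assumptions for the example; the real encoder is `NeonStandaloneModel` in `neon/scripts/hf_train_standalone.py`:

```python
import torch
import torch.nn as nn

class CharEncoder(nn.Module):
    """Illustrative character-level instruction encoder (not the repo's exact class)."""

    def __init__(self, vocab_size=128, d_model=256, out_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.proj = nn.Linear(d_model, out_dim)

    def forward(self, text: str) -> torch.Tensor:
        # Byte-level stand-in for the CharTokenizer: one token per character
        ids = torch.tensor([[min(ord(c), 127) for c in text]])
        h = self.encoder(self.embed(ids))    # (1, len, d_model)
        return self.proj(h.mean(dim=1))      # (1, 512) pooled instruction embedding

emb = CharEncoder()("pick up the red cup")
print(emb.shape)  # torch.Size([1, 512])
```

The pooled 512-dimensional embedding is what conditions the flow matching head during both training and sampling.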
## Part of Neon VLA
This model is part of the Neon project — an open-source Vision-Language-Action model for humanoid whole-body control.
Neon connects video foundation models to robot bodies through tiny, elegant action decoders.
## Citation

```bibtex
@software{neon2026,
  title={Neon: Teaching Robots to See Time},
  author={Cali, Cagatay},
  year={2026},
  url={https://github.com/cagataycali/neon}
}
```