Neon Flow Standalone v1

The first trained Neon VLA action decoder: a Flow Matching head for humanoid robot control.

Model Description

This is a Flow Matching action decoder trained on 6 synthetic Neon datasets (110K episodes total). It predicts 14-DoF arm joint actions from natural language instructions.
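Flow matching trains the decoder to regress the velocity field that transports Gaussian noise to the target action along a straight-line path. A minimal sketch of that training objective (the `model` signature and conditioning here are placeholders, not the actual Neon code):

```python
import torch

def flow_matching_loss(model, action, cond):
    # Straight-line probability path: x_t = (1 - t) * x0 + t * x1,
    # whose velocity along the path is the constant x1 - x0.
    x1 = action                       # target action batch, e.g. (B, 14)
    x0 = torch.randn_like(x1)         # Gaussian noise sample
    t = torch.rand(x1.shape[0], 1)    # random flow time in [0, 1]
    x_t = (1 - t) * x0 + t * x1       # point along the path
    v_pred = model(x_t, t, cond)      # network predicts the velocity
    return ((v_pred - (x1 - x0)) ** 2).mean()
```

At inference, the learned velocity field is integrated from noise to an action with a fixed-step ODE solver.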

Architecture

Language Instruction → CharTokenizer → TransformerEncoder(2L) → MeanPool → Linear(512)
                                                                              ↓
                                                               FlowMatchingHead(6 layers)
                                                                   - Sinusoidal time embed
                                                                   - RMSNorm + residual blocks
                                                                   - 10-step Euler ODE sampling
                                                                              ↓
                                                                    14-DoF Joint Actions
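The flow matching head in the diagram can be sketched as a small conditioned MLP: sinusoidal time embedding, RMSNorm residual blocks, and a linear output over the 14 action dimensions. Hidden sizes and activations below are illustrative assumptions, not the trained configuration:

```python
import math
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization (implemented manually for portability)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class FlowHead(nn.Module):
    """Sketch of the FlowMatchingHead: sinusoidal time embedding plus
    RMSNorm residual blocks, conditioned on the pooled language feature."""
    def __init__(self, action_dim=14, cond_dim=512, hidden=256, n_blocks=6):
        super().__init__()
        self.time_dim = hidden
        self.inp = nn.Linear(action_dim + cond_dim + hidden, hidden)
        self.blocks = nn.ModuleList([
            nn.Sequential(RMSNorm(hidden), nn.Linear(hidden, hidden), nn.SiLU())
            for _ in range(n_blocks)
        ])
        self.out = nn.Linear(hidden, action_dim)

    def time_embed(self, t):
        # Sinusoidal embedding of the flow time t in [0, 1].
        half = self.time_dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
        ang = t * freqs                              # (B, half)
        return torch.cat([ang.sin(), ang.cos()], dim=-1)

    def forward(self, x, t, cond):
        h = self.inp(torch.cat([x, cond, self.time_embed(t)], dim=-1))
        for blk in self.blocks:
            h = h + blk(h)                           # residual connection
        return self.out(h)                           # predicted velocity (B, 14)
```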

Key Numbers

Metric                   Value
Total Parameters         ~5.2M
Action Dimensions        14 (arms_only)
Training Epochs          20
Training GPU             NVIDIA L40S (48GB)
Training Data            110K episodes from 6 datasets
Flow Steps (inference)   10 (Euler ODE)
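The 10-step Euler ODE sampling integrates the learned velocity field from noise (t = 0) to an action (t = 1). A minimal sketch with a placeholder velocity function standing in for the trained head:

```python
import torch

@torch.no_grad()
def euler_sample(velocity_fn, cond, action_dim=14, n_steps=10):
    # Integrate dx/dt = v(x, t, cond) from t = 0 (noise) to t = 1 (action)
    # with fixed-step Euler, matching the 10-step inference setting.
    x = torch.randn(1, action_dim)            # start from Gaussian noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((1, 1), i * dt)
        x = x + dt * velocity_fn(x, t, cond)  # Euler update
    return x                                  # predicted joint action
```

More steps trade inference latency for integration accuracy; 10 steps is the setting used here.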

Training Datasets

Dataset                     Episodes   Type
neon-spatial-language-20k   20,000     Spatial reasoning
neon-g1-kitchen-10k         10,000     Kitchen manipulation
neon-g1-diverse-50k         50,000     Multi-scene diversity
neon-long-horizon-15k       15,000     Multi-step chains
neon-failure-recovery-5k    5,000      Failure + retry
neon-bimanual-10k           10,000     Two-arm coordination

Usage

import torch

# Load model
checkpoint = torch.load("neon_standalone_final.pt", map_location="cpu")
config = checkpoint["config"]
print(f"Action dim: {config['action_dim']}, Best loss: {config['best_loss']:.6f}")

# The model weights can be loaded into the NeonStandaloneModel class
# from neon/scripts/hf_train_standalone.py
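Loading the weights then follows the usual PyTorch state-dict pattern. The sketch below uses a stand-in module and an assumed "model_state_dict" key; check `checkpoint.keys()` for the actual layout, and use the real NeonStandaloneModel class in place of the stand-in:

```python
import torch
import torch.nn as nn

# Stand-in for NeonStandaloneModel (the real class lives in
# neon/scripts/hf_train_standalone.py). The "model_state_dict" key
# below is an assumption about the checkpoint layout.
model = nn.Linear(512, 14)

# Round-trip a checkpoint dict shaped like the one loaded above.
ckpt = {"config": {"action_dim": 14}, "model_state_dict": model.state_dict()}
torch.save(ckpt, "demo_checkpoint.pt")

restored = torch.load("demo_checkpoint.pt", map_location="cpu")
model.load_state_dict(restored["model_state_dict"])
```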

Part of Neon VLA

This model is part of the Neon project — an open-source Vision-Language-Action model for humanoid whole-body control.

Neon connects video foundation models to robot bodies through tiny, elegant action decoders.

Citation

@software{neon2026,
  title={Neon: Teaching Robots to See Time},
  author={Cali, Cagatay},
  year={2026},
  url={https://github.com/cagataycali/neon}
}