# Neon Flow Standalone v1

The first trained Neon VLA action decoder: a Flow Matching head for humanoid robot control.
## Model Description
This is a Flow Matching action decoder trained on 6 synthetic Neon datasets (110K episodes total). It predicts 14-DoF arm joint actions from natural language instructions.
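Conceptually, a flow matching head is trained to regress the constant velocity of a straight path from Gaussian noise to the target action vector. A minimal sketch of that objective (the names `model`, `cond`, and `actions` are illustrative; the actual training code lives in the Neon repo):

```python
import torch

def flow_matching_loss(model, cond, actions):
    """Conditional flow matching loss with a linear interpolation path.

    model   -- predicts velocity v(x_t, t, cond)
    cond    -- pooled language embedding (illustrative placeholder)
    actions -- ground-truth 14-DoF action targets
    """
    x1 = actions                               # data sample
    x0 = torch.randn_like(x1)                  # noise sample
    t = torch.rand(x1.shape[0], 1)             # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1                 # point on the straight noise→data path
    target_v = x1 - x0                         # constant velocity of that path
    pred_v = model(xt, t, cond)
    return torch.mean((pred_v - target_v) ** 2)
```

This is the standard conditional flow matching formulation; the trained head then generates actions by integrating the learned velocity field from noise to data.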
## Architecture

```
Language Instruction → CharTokenizer → TransformerEncoder(2L) → MeanPool → Linear(512)
        ↓
FlowMatchingHead(6 layers)
  - Sinusoidal time embed
  - RMSNorm + residual blocks
  - 10-step Euler ODE sampling
        ↓
14-DoF Joint Actions
```
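The sinusoidal time embedding and 10-step Euler ODE sampling above can be sketched as follows (a minimal illustration, not the repo's exact implementation; the real head also conditions on the language embedding through its residual blocks):

```python
import math
import torch

def sinusoidal_time_embed(t, dim=128):
    """Standard sinusoidal embedding of the flow time t in [0, 1]."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = t[:, None] * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

@torch.no_grad()
def sample_actions(model, cond, action_dim=14, steps=10):
    """Euler integration of dx/dt = v(x, t, cond) from noise (t=0) to actions (t=1)."""
    x = torch.randn(cond.shape[0], action_dim)   # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((cond.shape[0],), i * dt)
        x = x + dt * model(x, t, cond)           # one Euler step along the velocity field
    return x
```

With `steps=10` this matches the "10-step Euler ODE sampling" listed in the diagram; more steps trade latency for a more accurate ODE solution.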
## Key Numbers
| Metric | Value |
|---|---|
| Total Parameters | ~5.2M |
| Action Dimensions | 14 (arms_only) |
| Training Epochs | 20 |
| Training GPU | NVIDIA L40S (48GB) |
| Training Data | 110K episodes from 6 datasets |
| Flow Steps (inference) | 10 (Euler ODE) |
## Training Datasets
| Dataset | Episodes | Type |
|---|---|---|
| neon-spatial-language-20k | 20,000 | Spatial reasoning |
| neon-g1-kitchen-10k | 10,000 | Kitchen manipulation |
| neon-g1-diverse-50k | 50,000 | Multi-scene diversity |
| neon-long-horizon-15k | 15,000 | Multi-step chains |
| neon-failure-recovery-5k | 5,000 | Failure + retry |
| neon-bimanual-10k | 10,000 | Two-arm coordination |
## Usage

```python
import torch

# Load the checkpoint (weights + training config)
checkpoint = torch.load("neon_standalone_final.pt", map_location="cpu")
config = checkpoint["config"]
print(f"Action dim: {config['action_dim']}, Best loss: {config['best_loss']:.6f}")

# The model weights can be loaded into the NeonStandaloneModel class
# from neon/scripts/hf_train_standalone.py
```
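For intuition about the language-conditioning path (CharTokenizer → 2-layer TransformerEncoder → mean pool → Linear(512)), here is an illustrative standalone sketch. The class name, vocabulary size, and model width are assumptions for the example; the real encoder is `NeonStandaloneModel` in `neon/scripts/hf_train_standalone.py`:

```python
import torch
import torch.nn as nn

class CharEncoder(nn.Module):
    """Illustrative character-level instruction encoder (not the repo's exact class)."""

    def __init__(self, vocab_size=128, d_model=256, out_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.proj = nn.Linear(d_model, out_dim)

    def forward(self, text: str) -> torch.Tensor:
        # Byte-level stand-in for the CharTokenizer: one token per character
        ids = torch.tensor([[min(ord(c), 127) for c in text]])
        h = self.encoder(self.embed(ids))    # (1, len, d_model)
        return self.proj(h.mean(dim=1))      # (1, 512) pooled instruction embedding

emb = CharEncoder()("pick up the red cup")
print(emb.shape)  # torch.Size([1, 512])
```

The pooled 512-dimensional embedding is what conditions the flow matching head during both training and sampling.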
## Part of Neon VLA
This model is part of the Neon project — an open-source Vision-Language-Action model for humanoid whole-body control.
Neon connects video foundation models to robot bodies through tiny, elegant action decoders.
## Citation

```bibtex
@software{neon2026,
  title={Neon: Teaching Robots to See Time},
  author={Cali, Cagatay},
  year={2026},
  url={https://github.com/cagataycali/neon}
}
```