---
datasets:
- behavior-1k/2025-challenge-demos
- IliaLarchenko/behavior_224_rgb
license: apache-2.0
tags:
- robotics
pipeline_tag: robotics
---
This is an intermediate checkpoint from our 1st-place solution to the 2025 BEHAVIOR Challenge. It was obtained by training the policy on 50 tasks simultaneously for about two weeks.

It is not part of our final submission, and we did not run the full evaluation on it, but we would expect it to achieve a q-score of roughly 15-20%.
- Paper: Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge
- Project page: https://behavior.stanford.edu/challenge/
- Code/GitHub repository: IliaLarchenko/behavior-1k-solution
- arXiv: https://arxiv.org/abs/2512.06951
## The final submission checkpoints
## Sample Usage
This section provides a quick overview of how to get started with the model, adapted from the GitHub repository.
### Installation
```bash
# Clone with submodules (includes openpi and BEHAVIOR-1K)
git clone --recurse-submodules https://github.com/ilialarchenko/behavior-1k-solution.git
cd behavior-1k-solution

# Run the setup script (installs uv, dependencies, and sets up the environment)
bash setup_remote.sh
```
### Dataset Preparation
Download the official BEHAVIOR-1K dataset from HuggingFace:
```bash
# Log in to Hugging Face (needed to avoid request rate limit errors)
uv run huggingface-cli login

# Download the full dataset (~2TB)
uv run python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="behavior-1k/2025-challenge-demos",
    repo_type="dataset",
    local_dir="./data/behavior_dataset",
    local_dir_use_symlinks=False,
)
PY
```
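Downloads of this size are often interrupted, so it is worth sanity-checking the local directory before training. A minimal sketch (`check_download` is a hypothetical helper, not part of the repository):

```python
from pathlib import Path

def check_download(local_dir: str, min_files: int = 1) -> bool:
    """Hypothetical sanity check: the directory exists and holds at least min_files files."""
    root = Path(local_dir)
    return root.is_dir() and sum(1 for p in root.rglob("*") if p.is_file()) >= min_files
```

For a more thorough check you would compare file counts and sizes against the repository listing on the Hub.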
**Alternative:** use the resized RGB-only dataset (224×224, ~260GB) for faster training:
```bash
uv run python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="IliaLarchenko/behavior_224_rgb",
    repo_type="dataset",
    local_dir="./data/behavior_224_rgb",
    local_dir_use_symlinks=False,
)
PY
```
### Pre-training Setup
Compute dataset statistics and train FAST tokenizer:
```bash
# Compute normalization statistics (with correlation matrix)
uv run scripts/compute_norm_stats.py --config-name pi_behavior_b1k_fast --correlation

# Train the FAST tokenizer for action discretization
uv run scripts/train_fast_tokenizer.py \
    --config-name pi_behavior_b1k_fast \
    --encoded-dims="0:6,7:23" \
    --vocab-size=1024
```
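The `--encoded-dims="0:6,7:23"` flag selects which ranges of action dimensions the tokenizer encodes. To illustrate the spec format only (this `parse_encoded_dims` helper is hypothetical, not the repository's actual parser):

```python
def parse_encoded_dims(spec: str) -> list[slice]:
    # "0:6,7:23" -> [slice(0, 6), slice(7, 23)]
    return [slice(int(lo), int(hi))
            for lo, hi in (part.split(":") for part in spec.split(","))]
```

Each `lo:hi` pair is a half-open range, so `0:6,7:23` covers dimensions 0-5 and 7-22 while skipping dimension 6.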
### Training
**Single GPU Training:**
```bash
uv run scripts/train.py pi_behavior_b1k_fast \
    --batch_size=16 \
    --num_train_steps=200000 \
    --save_interval=2000 \
    --keep_period=10000 \
    --log_interval=100
```
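With `--save_interval=2000` and `--keep_period=10000`, checkpoints are written every 2,000 steps, and (assuming orbax-style retention, where only checkpoints at multiples of `keep_period` are kept permanently) 20 of them survive a full 200k-step run. A sketch of that arithmetic, under the stated assumption:

```python
def kept_checkpoints(num_steps: int, save_interval: int, keep_period: int) -> list[int]:
    # Steps at which a checkpoint is written...
    saved = range(save_interval, num_steps + 1, save_interval)
    # ...and, under the assumed retention rule, kept permanently.
    return [s for s in saved if s % keep_period == 0]
```

Tune these two flags against your disk budget: each retained checkpoint of a VLA-scale policy is large.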
**Multi-GPU Training:**
```bash
uv run scripts/train.py pi_behavior_b1k_fast \
    --batch_size=2048 \
    --num_train_steps=200000 \
    --fsdp_devices=8 \
    --save_interval=250 \
    --keep_period=4000 \
    --log_interval=25
```
### Evaluation
Start the policy server:
```bash
uv run scripts/serve_b1k.py policy:checkpoint \
    --policy.config pi_behavior_b1k_fast \
    --policy.dir /path/to/checkpoint
```
In a separate terminal, run evaluation (requires BEHAVIOR-1K environment):
```bash
python BEHAVIOR-1K/omnigibson/learning/eval.py \
    log_path=./eval_logs \
    policy=websocket \
    task.name=make_microwave_popcorn \
    model.host=localhost \
    eval_instance_ids="[0,1,2,3]"
```
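Each run evaluates the instance ids listed in `eval_instance_ids`. The challenge's official metric is the q-score mentioned above; as a much simpler illustrative aggregate over per-instance outcomes (not the official metric):

```python
def success_rate(results: dict[int, bool]) -> float:
    # Fraction of evaluated instances that succeeded (illustrative only;
    # the official 2025 BEHAVIOR Challenge metric is the q-score).
    return sum(results.values()) / len(results)
```

For example, three successes out of instances `[0, 1, 2, 3]` gives a rate of 0.75.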
## Citation
If you find this work useful, please cite:
```bibtex
@misc{larchenko2025behavior,
      title={Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge},
      author={Ilia Larchenko and Gleb Zarin and Akash Karnatak},
      year={2025},
      eprint={2512.06951},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2512.06951},
}
```