---
datasets:
- behavior-1k/2025-challenge-demos
- IliaLarchenko/behavior_224_rgb
license: apache-2.0
tags:
- robotics
pipeline_tag: robotics
---

This is an intermediate checkpoint that we used in our [1st place solution of the 2025 BEHAVIOR Challenge](https://github.com/IliaLarchenko/behavior-1k-solution).

This checkpoint was obtained by training the policy on 50 tasks simultaneously for roughly two weeks.

It is not part of our [final submission](https://huggingface.co/IliaLarchenko/behavior_submission). We did not run the full evaluation on this checkpoint, but we estimate it would achieve a q-score of 15-20%.

Paper: [Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge](https://huggingface.co/papers/2512.06951)
Project page: https://behavior.stanford.edu/challenge/
Code/GitHub Repository: [IliaLarchenko/behavior-1k-solution](https://github.com/IliaLarchenko/behavior-1k-solution)
arXiv: [2512.06951](https://arxiv.org/abs/2512.06951)

The [final submission checkpoints](https://huggingface.co/IliaLarchenko/behavior_submission) are available in a separate repository.

## Sample Usage

This section provides a quick overview of how to get started with the model, adapted from the [GitHub repository](https://github.com/IliaLarchenko/behavior-1k-solution).

### Installation

```bash
# Clone with submodules (includes openpi and BEHAVIOR-1K)
git clone --recurse-submodules https://github.com/IliaLarchenko/behavior-1k-solution.git
cd behavior-1k-solution

# Run the setup script (installs uv, dependencies, and sets up the environment)
bash setup_remote.sh
```

### Dataset Preparation

Download the official BEHAVIOR-1K dataset from Hugging Face:

```bash
# Log in to Hugging Face (avoids request rate-limit errors)
uv run huggingface-cli login

# Download the full dataset (~2TB)
uv run python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="behavior-1k/2025-challenge-demos",
    repo_type="dataset",
    local_dir="./data/behavior_dataset",
    local_dir_use_symlinks=False,
)
PY
```

**Alternative**: Use the resized RGB-only dataset (224×224, ~260GB) for faster training:

```bash
uv run python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="IliaLarchenko/behavior_224_rgb",
    repo_type="dataset",
    local_dir="./data/behavior_224_rgb",
    local_dir_use_symlinks=False,
)
PY
```
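
The resized dataset stores 224×224 RGB frames instead of full-resolution images. If you want to produce a similarly reduced dataset from your own recordings, the core resize step can be sketched as follows. This is a minimal nearest-neighbor illustration with a synthetic frame; the actual conversion pipeline for `behavior_224_rgb` is not part of this card and may use a different filter:

```python
import numpy as np

def resize_nearest(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Downscale an HxWx3 uint8 RGB frame to size x size via nearest-neighbor
    index sampling (no interpolation, preserves dtype)."""
    h, w = frame.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return frame[rows[:, None], cols]

# A synthetic 480x640 frame stands in for a real camera image.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
small = resize_nearest(frame)
print(small.shape)  # (224, 224, 3)
```

At ~224×224 the per-frame storage drops by more than an order of magnitude versus typical camera resolutions, which is where the ~2TB → ~260GB reduction comes from.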

### Pre-training Setup

Compute dataset statistics and train the FAST tokenizer:

```bash
# Compute normalization statistics with a correlation matrix
uv run scripts/compute_norm_stats.py --config-name pi_behavior_b1k_fast --correlation

# Train the FAST tokenizer for action discretization
uv run scripts/train_fast_tokenizer.py \
  --config-name pi_behavior_b1k_fast \
  --encoded-dims="0:6,7:23" \
  --vocab-size=1024
```
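
Conceptually, the statistics step computes per-dimension normalization parameters over the recorded actions, and `--correlation` adds a correlation matrix across action dimensions. The following simplified sketch with synthetic data shows the shape of that computation; the script's actual output format and field names may differ:

```python
import numpy as np

# Synthetic stand-in for a stack of recorded action vectors (N steps x 23 dims).
rng = np.random.default_rng(0)
actions = rng.normal(size=(1000, 23))

# Per-dimension mean and standard deviation, used to normalize actions
# toward zero mean and unit variance during training.
mean = actions.mean(axis=0)
std = actions.std(axis=0)

# Correlation matrix across action dimensions (what --correlation adds).
corr = np.corrcoef(actions, rowvar=False)

print(mean.shape, std.shape, corr.shape)  # (23,) (23,) (23, 23)
```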

### Training

**Single GPU Training**:
```bash
uv run scripts/train.py pi_behavior_b1k_fast \
  --batch_size=16 \
  --num_train_steps=200000 \
  --save_interval=2000 \
  --keep_period=10000 \
  --log_interval=100
```

**Multi-GPU Training**:
```bash
uv run scripts/train.py pi_behavior_b1k_fast \
  --batch_size=2048 \
  --num_train_steps=200000 \
  --fsdp_devices=8 \
  --save_interval=250 \
  --keep_period=4000 \
  --log_interval=25
```
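
A note on the checkpointing flags: `--save_interval` controls how often a checkpoint is written, while `--keep_period` marks which of those checkpoints are retained long-term (intermediate ones can be garbage-collected). Assuming the usual openpi/Orbax semantics, where retained checkpoints are the saved steps divisible by `keep_period`, the retention schedule can be sketched as follows. This is an illustration of the flag semantics, not the actual checkpoint-manager code:

```python
def saved_steps(num_train_steps: int, save_interval: int) -> list[int]:
    """Steps at which a checkpoint is written."""
    return list(range(save_interval, num_train_steps + 1, save_interval))

def kept_steps(num_train_steps: int, save_interval: int, keep_period: int) -> list[int]:
    """Checkpoints retained long-term: saved steps divisible by keep_period."""
    return [s for s in saved_steps(num_train_steps, save_interval) if s % keep_period == 0]

# With the multi-GPU settings above: save every 250 steps, keep every 4000th step.
print(kept_steps(200_000, 250, 4_000)[:3])  # [4000, 8000, 12000]
print(len(kept_steps(200_000, 250, 4_000)))  # 50
```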

### Evaluation

Start the policy server:

```bash
uv run scripts/serve_b1k.py policy:checkpoint \
  --policy.config pi_behavior_b1k_fast \
  --policy.dir /path/to/checkpoint
```

In a separate terminal, [run evaluation](https://behavior.stanford.edu/challenge/baselines.html) (requires the BEHAVIOR-1K environment):

```bash
python BEHAVIOR-1K/omnigibson/learning/eval.py \
  log_path=./eval_logs \
  policy=websocket \
  task.name=make_microwave_popcorn \
  model.host=localhost \
  eval_instance_ids="[0,1,2,3]"
```

## Citation

If you find this work useful, please cite:

```bibtex
@misc{larchenko2025behavior,
  title={Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge},
  author={Ilia Larchenko and Gleb Zarin and Akash Karnatak},
  year={2025},
  eprint={2512.06951},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2512.06951},
}
```