---
datasets:
  - behavior-1k/2025-challenge-demos
  - IliaLarchenko/behavior_224_rgb
license: apache-2.0
tags:
  - robotics
pipeline_tag: robotics
---

This is an intermediate checkpoint from our 1st-place solution to the 2025 BEHAVIOR Challenge.

It was obtained by training the policy on 50 tasks simultaneously for ~2 weeks.

It is not part of our final submission, and we did not run the full evaluation on it, but we expect it to achieve a q-score of roughly 15-20%.

- Paper: Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge (arXiv:2512.06951, https://arxiv.org/abs/2512.06951)
- Project page: https://behavior.stanford.edu/challenge/
- Code: https://github.com/ilialarchenko/behavior-1k-solution

The final submission checkpoints

## Sample Usage

This section provides a quick overview of how to get started with the model, adapted from the GitHub repository.

### Installation

```bash
# Clone with submodules (includes openpi and BEHAVIOR-1K)
git clone --recurse-submodules https://github.com/ilialarchenko/behavior-1k-solution.git
cd behavior-1k-solution

# Run setup script (installs uv, dependencies, and sets up environment)
bash setup_remote.sh
```

### Dataset Preparation

Download the official BEHAVIOR-1K dataset from HuggingFace:

```bash
# Log in to Hugging Face (needed to avoid request rate-limit errors)
uv run huggingface-cli login

# Download the full dataset (~2TB)
uv run python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="behavior-1k/2025-challenge-demos",
    repo_type="dataset",
    local_dir="./data/behavior_dataset",
    local_dir_use_symlinks=False,
)
PY
```

Alternative: Use the resized RGB-only dataset (224×224, ~260GB) for faster training:

```bash
uv run python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="IliaLarchenko/behavior_224_rgb",
    repo_type="dataset",
    local_dir="./data/behavior_224_rgb",
    local_dir_use_symlinks=False,
)
PY
```
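If you only need demos for a few tasks, `snapshot_download` also accepts `allow_patterns` to restrict what gets fetched. Below is a sketch: the `task_patterns` helper and its glob shapes are assumptions about the dataset's file layout (check the repo's file listing first), and the actual download is guarded behind an environment variable so the snippet is safe to run as-is.

```python
import os

def task_patterns(tasks):
    """Hypothetical helper: build allow_patterns globs for the given tasks.

    The '*<task>*' shape assumes task names appear in the dataset's file
    paths; verify against the repo's actual layout before relying on it.
    """
    return [f"*{t}*" for t in tasks]

patterns = task_patterns(["make_microwave_popcorn"])
print(patterns)  # ['*make_microwave_popcorn*']

# Guarded so running this file does not kick off a large download;
# huggingface_hub is imported lazily for the same reason.
if os.environ.get("RUN_DOWNLOAD"):
    from huggingface_hub import snapshot_download
    snapshot_download(
        repo_id="behavior-1k/2025-challenge-demos",
        repo_type="dataset",
        local_dir="./data/behavior_dataset",
        allow_patterns=patterns,
    )
```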

### Pre-training Setup

Compute dataset statistics and train the FAST tokenizer:

```bash
# Compute normalization statistics with correlation matrix
uv run scripts/compute_norm_stats.py --config-name pi_behavior_b1k_fast --correlation

# Train FAST tokenizer for action discretization
uv run scripts/train_fast_tokenizer.py \
  --config-name pi_behavior_b1k_fast \
  --encoded-dims="0:6,7:23" \
  --vocab-size=1024
```
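For intuition about what this step buys you: the tokenizer maps continuous action chunks to discrete tokens so the policy can predict them autoregressively. The sketch below is *not* the FAST algorithm (FAST compresses chunks via a DCT plus BPE); it is a much simpler uniform-binning stand-in that only illustrates the continuous → token → continuous round trip and its quantization error. The vocabulary size mirrors the `--vocab-size` flag above; the normalized action range is an assumption.

```python
import numpy as np

VOCAB_SIZE = 1024        # mirrors --vocab-size above
LOW, HIGH = -1.0, 1.0    # assumed normalized action range

def encode(actions: np.ndarray) -> np.ndarray:
    """Map continuous actions in [LOW, HIGH] to integer token ids."""
    clipped = np.clip(actions, LOW, HIGH)
    bins = (clipped - LOW) / (HIGH - LOW) * (VOCAB_SIZE - 1)
    return np.round(bins).astype(np.int64)

def decode(tokens: np.ndarray) -> np.ndarray:
    """Map token ids back to continuous actions (inverse of encode, up to
    quantization error of at most half a bin width)."""
    return tokens.astype(np.float64) / (VOCAB_SIZE - 1) * (HIGH - LOW) + LOW

actions = np.array([-1.0, 0.0, 0.5, 1.0])
tokens = encode(actions)
recon = decode(tokens)
print(tokens)  # ids in [0, 1023]
print(np.max(np.abs(recon - actions)))  # small quantization error
```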

### Training

Single GPU Training:

```bash
uv run scripts/train.py pi_behavior_b1k_fast \
  --batch_size=16 \
  --num_train_steps=200000 \
  --save_interval=2000 \
  --keep_period=10000 \
  --log_interval=100
```

Multi-GPU Training:

```bash
uv run scripts/train.py pi_behavior_b1k_fast \
  --batch_size=2048 \
  --num_train_steps=200000 \
  --fsdp_devices=8 \
  --save_interval=250 \
  --keep_period=4000 \
  --log_interval=25
```
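A note on the flags: assuming `batch_size` here is the *global* batch that gets sharded evenly across the `fsdp_devices` (check openpi's training configuration for the authoritative semantics), the per-device load works out as follows.

```python
# Assumption: batch_size is global and split evenly across FSDP devices.
global_batch = 2048
fsdp_devices = 8

assert global_batch % fsdp_devices == 0, "global batch must divide evenly"
per_device_batch = global_batch // fsdp_devices
print(per_device_batch)  # 256 samples per device per step
```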

### Evaluation

Start the policy server:

```bash
uv run scripts/serve_b1k.py policy:checkpoint \
  --policy.config pi_behavior_b1k_fast \
  --policy.dir /path/to/checkpoint
```

In a separate terminal, run evaluation (requires BEHAVIOR-1K environment):

```bash
python BEHAVIOR-1K/omnigibson/learning/eval.py \
  log_path=./eval_logs \
  policy=websocket \
  task.name=make_microwave_popcorn \
  model.host=localhost \
  eval_instance_ids="[0,1,2,3]"
```

## Citation

If you find this work useful, please cite:

```bibtex
@misc{larchenko2025behavior,
      title={Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge},
      author={Ilia Larchenko and Gleb Zarin and Akash Karnatak},
      year={2025},
      eprint={2512.06951},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2512.06951},
}
```