SmolVLA RoboTwin stack_bowls_two (50 ep, MULTI-instruction)
SmolVLA policy fine-tuned on 50 demonstration episodes of the stack_bowls_two task from RoboTwin 2.0 (demo_clean config). Each episode is paired with one language instruction sampled at random from RoboTwin's 100 instruction variations (seed=42 for reproducibility).
This is the multi-instruction counterpart to arrow-hf/smolvla-robotwin-stack-bowls-two-50ep (which uses a single fixed instruction).
Task
- Robot: Agilex dual-arm, end-effector control (16D state, 16D action)
- Cameras: 3 RGB streams: dual_cam_global, cam_wrist_65, cam_wrist_75 (240×320, D435)
- Control rate: ~30 Hz (the LeRobot dataset metadata lists 10 Hz, but the underlying RoboTwin simulation runs at ~30 Hz; the same setting is used for both training and evaluation)
- Instructions: 50 unique sentences (one per episode), examples:
- "Use the left arm to place the object into the basket"
- "Pick the item up and drop it into the woven basket"
- "Move the object from the table into the basket"
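The card states that each episode gets its own instruction, drawn with seed=42 from RoboTwin's instruction variations. Below is a minimal sketch of one way to do that, assuming sampling without replacement via Python's random module. The helper assign_instructions and the placeholder pool are hypothetical, and the sketch will not reproduce the actual instruction-to-episode mapping used for training.

```python
import random


def assign_instructions(variations: list[str], num_episodes: int, seed: int = 42) -> dict[int, str]:
    """Give each episode its own instruction, drawn without replacement.

    Hypothetical helper: the card does not say which sampler was used, so this
    does not reproduce the real instruction-to-episode mapping.
    """
    if num_episodes > len(variations):
        raise ValueError("not enough distinct instructions for one per episode")
    rng = random.Random(seed)  # local RNG so the draw is reproducible
    chosen = rng.sample(variations, k=num_episodes)  # distinct items, no repeats
    return {episode: text for episode, text in enumerate(chosen)}


# Placeholder pool standing in for RoboTwin's 100 instruction variations
pool = [f"instruction variant {i}" for i in range(100)]
mapping = assign_instructions(pool, num_episodes=50)
print(len(mapping), len(set(mapping.values())))  # 50 50
```

Because rng.sample never repeats an element, 50 episodes drawn from a pool of 100 always receive 50 distinct instructions, consistent with the "50 unique sentences" listed above.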
Training
| Config | Value |
|---|---|
| Base checkpoint | lerobot/smolvla_robotwin |
| Training data | 50 RoboTwin demonstrations, 50 unique instructions |
| Batch size | 32 |
| Steps | 6000 (~10-25 epochs) |
| Optimizer | AdamW, lr=1e-4 |
| Scheduler | Cosine, warmup=300, decay=6000 |
| Chunk size | 50 |
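The scheduler row (cosine, warmup=300, decay=6000, peak lr=1e-4) can be read as a linear warmup followed by cosine decay. The sketch below is one common formulation of that schedule, not necessarily LeRobot's exact scheduler code. The card does not state the final learning-rate floor, so min_lr=0 is an assumption, and lr_at is a hypothetical helper.

```python
import math


def lr_at(step: int, peak_lr: float = 1e-4, warmup_steps: int = 300,
          decay_steps: int = 6000, min_lr: float = 0.0) -> float:
    """Linear warmup followed by cosine decay (one common formulation).

    min_lr is an assumption: the model card does not state the final LR.
    """
    if step < warmup_steps:
        # Ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, decay_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


for s in (0, 150, 299, 300, 3150, 6000):
    print(s, f"{lr_at(s):.2e}")
```

Under these assumptions the learning rate peaks at 1e-4 at step 300, is halved midway through the decay phase, and reaches the floor at step 6000, the final training step.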
Evaluation: Single vs Multi-Instruction Comparison
Both variants were evaluated in the RoboTwin 2.0 simulator (demo_clean config) on 10 episodes, with max_steps=400 and action_chunk_exec=50. For a fair comparison, both use the same fixed evaluation instruction, "stack the bowls".
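The evaluation limits can be translated into policy queries and simulated time. This sketch assumes that max_steps counts executed low-level actions and that each policy query executes action_chunk_exec=50 of them; neither is spelled out in the card, and eval_budget is a hypothetical helper. It also shows why the control-rate note in the Task section matters: the same 400-step budget corresponds to very different episode lengths at 30 Hz and at 10 Hz.

```python
import math


def eval_budget(max_steps: int = 400, chunk_exec: int = 50,
                control_hz: float = 30.0) -> tuple[int, float]:
    """Return (policy queries per episode, episode length in simulated seconds).

    Assumes max_steps counts executed low-level actions and that each policy
    query executes chunk_exec actions; neither is stated in the model card.
    """
    queries = math.ceil(max_steps / chunk_exec)  # chunks needed to use the step budget
    seconds = max_steps / control_hz  # episode length in simulated seconds
    return queries, seconds


for hz in (30.0, 10.0):  # ~30 Hz sim rate vs. 10 Hz in the LeRobot metadata
    q, s = eval_budget(control_hz=hz)
    print(f"{hz:.0f} Hz: {q} policy queries, {s:.1f} s per episode")
```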
| Variant | Eval instruction | Success rate |
|---|---|---|
| Single-instruction training | Fixed "stack the bowls" | 7/10 (70%) |
| Multi-instruction training (this model) | Fixed "stack the bowls" | 7/10 (70%) |
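On the fixed instruction, both variants succeed in 7/10 episodes, so this evaluation shows no measurable cost in single-instruction performance from multi-instruction training. The expected benefit of multi-instruction training is the ability to follow varied language commands; for tasks where instruction diversity matters (e.g., held-out, unseen instructions), it may pay off, but that setting is not evaluated here.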
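Ten evaluation episodes leave wide statistical uncertainty on each success rate. As an illustration that is not part of the original evaluation, the standard Wilson score interval for a binomial proportion can be computed for 7 successes out of 10; wilson_interval is a hypothetical helper, and z=1.96 gives an approximately 95% interval.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 -> ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


lo, hi = wilson_interval(7, 10)
print(f"7/10 -> 95% CI [{lo:.2f}, {hi:.2f}]")  # 7/10 -> 95% CI [0.40, 0.89]
```

Both variants share this interval, which spans roughly 40% to 89%, so differences of a few episodes between variants would fall well within the noise of a 10-episode evaluation.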
Usage
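```python
# Older LeRobot releases use lerobot.common.policies.smolvla.modeling_smolvla
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("arrow-hf/smolvla-robotwin-stack-bowls-two-50ep-multi")
policy.eval()  # inference mode (disables dropout)
```
See the LeRobot documentation for inference setup.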
Citation
Built on SmolVLA and the SmolVLA-RoboTwin pretrained base (lerobot/smolvla_robotwin), fine-tuned on demonstration data collected in the RoboTwin 2.0 simulator.