tags:
- VLA
- SmolVLA
- PhysicalAI
---

### SmolVLA Fine-Tuned for Food Stacking

**Summary**: This is a fine-tuned version of `lerobot/smolvla_base` for stacking food objects (e.g., burgers, sandwiches). It was fine-tuned on the `GetSoloTech/FoodStack` dataset using the LeRobot framework.

### Model details

- **Base model**: `lerobot/smolvla_base`
- **Task**: Vision-Language-Action control for manipulation (stacking)
- **Domain**: Food item stacking (burger, sandwich, etc.)
- **Params**: ~450M (SmolVLA)
- **Library**: LeRobot (`lerobot`)

### Quick start

Install LeRobot with SmolVLA extras:

```bash
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
```

Load the policy from this repo and run inference:

```python
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Replace with your actual model ID on the Hub
model_id = "GetSoloTech/SmolVLA-FoodStack"

policy = SmolVLAPolicy.from_pretrained(model_id)

# Example placeholders for observation and instruction
observation = {
    "image": ...,  # BGR/RGB frame or processed observation, per your setup
    "state": ...,  # optional proprio/scene state, if used
}
instruction = "Stack the burger: bun, patty, cheese, lettuce, bun."

# Depending on your pipeline, you may wrap this in your control loop
actions = policy(observation, instruction)

# Send actions to your robot controller
# send_actions_to_robot(actions)
```
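
The comments above leave the control loop to you; one minimal fixed-rate loop might look like the following sketch. The `get_observation` and `send_actions` callables, the loop rate, and `max_steps` are placeholders for your robot IO, not part of LeRobot's API:

```python
import time

def run_control_loop(policy, get_observation, send_actions, instruction,
                     hz=10, max_steps=100):
    """Run the policy at a fixed rate (a sketch; adapt to your robot IO)."""
    period = 1.0 / hz
    for _ in range(max_steps):
        start = time.monotonic()
        obs = get_observation()             # camera frame + proprio state
        actions = policy(obs, instruction)  # query the policy
        send_actions(actions)               # forward to the robot controller
        # Sleep off the remainder of the control period to hold the rate
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```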

For end-to-end examples (policy loops, camera/robot IO), see the LeRobot docs and examples.

Notes:

- Tune batch size, steps, and augmentation to your hardware and dataset split.
- Ensure your train-time observation preprocessing matches what you do at inference.
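
As one illustration of the preprocessing note, here is a sketch that converts a raw camera frame into the CHW float layout vision policies typically expect. The 224x224 resolution and [0, 1] scaling are assumptions for illustration; use the exact transforms from your training config:

```python
import numpy as np

def preprocess_frame(frame_hwc_uint8, size=224):
    """Convert an HxWx3 uint8 frame to CHW float32 in [0, 1].

    Sketch only: the target size and scaling here are assumptions --
    mirror whatever transforms were used at train time.
    """
    h, w, _ = frame_hwc_uint8.shape
    # Naive nearest-neighbor resize via index sampling (numpy only)
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    resized = frame_hwc_uint8[ys][:, xs]
    # HWC uint8 [0, 255] -> CHW float32 [0, 1]
    return resized.transpose(2, 0, 1).astype(np.float32) / 255.0
```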

### Limitations

- Specializes in food stacking; may not generalize to unseen objects/layouts.
- Sensitive to perception domain shift (lighting, textures, camera intrinsics).
- Requires observation normalization consistent with training.

### Dataset

- **Training data**: `GetSoloTech/FoodStack`

### Resources and references

- SmolVLA base: `https://huggingface.co/lerobot/smolvla_base`
- SmolVLA overview: `https://smolvla.net/index_en.html`
- LeRobot: `https://github.com/huggingface/lerobot`