---
datasets:
- GetSoloTech/FoodStack
language:
- en
base_model:
- lerobot/smolvla_base
library_name: transformers
tags:
- Robotics
- Lerobot
- Food
- PickPlace
- VLA
- SmolVLA
- PhysicalAI
---

### SmolVLA Fine-Tuned for Food Stacking

**Summary**: This is a fine-tuned version of `lerobot/smolvla_base` for stacking food objects (e.g., burgers, sandwiches). It was fine-tuned on the `GetSoloTech/FoodStack` dataset using the LeRobot framework.

### Model details
- **Base model**: `lerobot/smolvla_base`
- **Task**: Vision-Language-Action control for manipulation (stacking)
- **Domain**: Food item stacking (burger, sandwich, etc.)
- **Params**: ~450M (SmolVLA)
- **Library**: LeRobot (`lerobot`)

### Quick start
Install LeRobot with SmolVLA extras:

```bash
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
```

Load the policy from this repo and run inference:

```python
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Replace with your actual model ID on the Hub
model_id = "GetSoloTech/SmolVLA-FoodStack"

policy = SmolVLAPolicy.from_pretrained(model_id)
policy.eval()

# Example placeholders: build a batch whose keys, shapes, and preprocessing
# match what the policy saw at train time (camera frames, optional proprio
# state, and the language instruction under the "task" key).
batch = {
    "observation.image": ... ,  # camera frame, preprocessed per your setup
    "observation.state": ... ,  # optional proprioceptive state, if used
    "task": "Stack the burger: bun, patty, cheese, lettuce, bun.",
}

# Inside your control loop, query the policy for the next action
action = policy.select_action(batch)

# Send the action to your robot controller
# send_action_to_robot(action)
```

For end-to-end examples (policy loops, camera/robot IO), see the LeRobot docs and examples.
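A policy like this is normally driven from a fixed-rate closed loop: read an observation, query the policy, send the action, and sleep off the rest of the control period. A minimal sketch of such a loop, where `get_observation`, `send_action`, and the policy callable are hypothetical placeholders for your actual camera/robot IO and LeRobot policy (not part of any specific API):

```python
import time

def run_control_loop(policy, get_observation, send_action,
                     task: str, hz: float = 10.0, max_steps: int = 100):
    """Query the policy at a fixed rate and forward actions to the robot.

    `policy`, `get_observation`, and `send_action` are placeholders: wire
    them to your real policy object (e.g. policy.select_action) and to your
    camera/robot hardware.
    """
    period = 1.0 / hz
    actions = []
    for _ in range(max_steps):
        start = time.monotonic()
        batch = get_observation()  # camera frames + proprio state
        batch["task"] = task       # language instruction for the VLA
        action = policy(batch)     # e.g. lambda b: policy.select_action(b)
        send_action(action)
        actions.append(action)
        # Sleep off the remainder of the control period to hold the rate
        time.sleep(max(0.0, period - (time.monotonic() - start)))
    return actions
```

Running at a steady rate matters because the policy was trained on trajectories recorded at a particular control frequency; querying it much faster or slower is itself a distribution shift.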


### Notes
- Tune batch size/steps and augmentation to your hardware and dataset split.
- Ensure your observation preprocessing at train-time matches inference.
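The preprocessing note can be made concrete: define one transform and reuse it verbatim in both the training and inference pipelines. A minimal NumPy sketch, where the target size and normalization statistics are illustrative assumptions rather than values from this model's config:

```python
import numpy as np

# Illustrative values; substitute the statistics your training config used.
IMG_SIZE = (224, 224)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB frame to a normalized CxHxW float32 array.

    Call this exact function at train time and at inference so the policy
    never sees a distribution shift introduced by the pipeline itself.
    """
    # Nearest-neighbor resize via index sampling (stand-in for a real resizer)
    h, w = frame.shape[:2]
    ys = np.linspace(0, h - 1, IMG_SIZE[0]).astype(int)
    xs = np.linspace(0, w - 1, IMG_SIZE[1]).astype(int)
    resized = frame[ys][:, xs]
    # Scale to [0, 1], normalize per channel, move channels first
    img = resized.astype(np.float32) / 255.0
    img = (img - MEAN) / STD
    return np.transpose(img, (2, 0, 1))
```

Keeping the transform in one shared function (rather than duplicating it in the training and deployment code) is the simplest way to guarantee the two pipelines never drift apart.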


### Limitations
- Specializes in food stacking; may not generalize to unseen objects/layouts.
- Sensitive to perception domain shift (lighting, textures, camera intrinsics).
- Requires correct observation normalization consistent with training.

### Dataset
- **Training data**: `GetSoloTech/FoodStack` 

### Resources and references
- SmolVLA base: `https://huggingface.co/lerobot/smolvla_base`
- SmolVLA overview: `https://smolvla.net/index_en.html`
- LeRobot: `https://github.com/huggingface/lerobot`