zeeshaan-ai committed a23b17a (verified; parent: 1864e85): Update README.md
Files changed (1): README.md (+66, −1)
  - VLA
  - SmolVLA
  - PhysicalAI
---

### SmolVLA Fine-Tuned for Food Stacking

**Summary**: This model is a fine-tuned version of `lerobot/smolvla_base` for stacking food objects (e.g., burgers, sandwiches). It was trained on the `GetSoloTech/FoodStack` dataset using the LeRobot framework.

### Model details
- **Base model**: `lerobot/smolvla_base`
- **Task**: Vision-Language-Action control for manipulation (stacking)
- **Domain**: Food item stacking (burger, sandwich, etc.)
- **Params**: ~450M (SmolVLA)
- **Library**: LeRobot (`lerobot`)

### Quick start
Install LeRobot with the SmolVLA extras:

```bash
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
```

Load the policy from this repo and run inference:

```python
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Replace with your actual model ID on the Hub
model_id = "GetSoloTech/SmolVLA-FoodStack"

policy = SmolVLAPolicy.from_pretrained(model_id)

# Example placeholders; observation keys must match your training config
batch = {
    "observation.images.top": ...,  # camera frame, preprocessed as at train time
    "observation.state": ...,       # proprioceptive state, if used
    "task": "Stack the burger: bun, patty, cheese, lettuce, bun.",
}

# Depending on your pipeline, wrap this in your control loop;
# select_action returns one action per call
action = policy.select_action(batch)

# Send the action to your robot controller
# send_action_to_robot(action)
```

For end-to-end examples (policy loops, camera/robot IO), see the LeRobot docs and examples.
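
The shape of such a control loop can be sketched as below. `StubPolicy` stands in for the loaded `SmolVLAPolicy`, and the observation/actuation helpers (`get_observation`, `send_action`) are hypothetical placeholders for your camera and robot IO:

```python
import numpy as np

class StubPolicy:
    """Stand-in for a loaded SmolVLAPolicy; returns a fixed-size action vector."""
    def select_action(self, batch):
        # A real policy would run vision-language-action inference here.
        return np.zeros(6, dtype=np.float32)  # e.g. a 6-DoF arm command

def run_episode(policy, get_observation, send_action, task, max_steps=200):
    """Query the policy once per control step and forward each action to the robot."""
    for step in range(max_steps):
        obs = get_observation()   # camera frame(s) + proprioceptive state
        obs["task"] = task        # the language instruction rides along in the batch
        action = policy.select_action(obs)
        send_action(action)
    return step + 1

# Hypothetical IO helpers, for illustration only
actions_sent = []
steps = run_episode(
    StubPolicy(),
    get_observation=lambda: {"observation.state": np.zeros(6, dtype=np.float32)},
    send_action=actions_sent.append,
    task="Stack the burger: bun, patty, cheese, lettuce, bun.",
    max_steps=10,
)
```

In a real deployment you would also handle episode termination, timeouts, and safety stops inside the loop.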

Notes:
- Tune batch size, training steps, and augmentation to your hardware and dataset split.
- Ensure your observation preprocessing at train time matches inference.
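
One way to enforce the preprocessing note is to route both training and inference frames through a single function. The resolution and normalization constants below are illustrative only, not the values this model was trained with:

```python
import numpy as np

# Illustrative values; use the resolution and stats from your training config.
TARGET_HW = (224, 224)
MEAN, STD = 0.5, 0.25

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Resize (nearest-neighbor, for brevity) and normalize an HxWx3 uint8 frame."""
    h, w = frame.shape[:2]
    rows = np.arange(TARGET_HW[0]) * h // TARGET_HW[0]
    cols = np.arange(TARGET_HW[1]) * w // TARGET_HW[1]
    resized = frame[rows][:, cols].astype(np.float32) / 255.0
    return (resized - MEAN) / STD

# Call the SAME function when building the training set and in the control loop.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
out = preprocess_frame(frame)
```

Keeping one shared function (rather than duplicating resize/normalize logic in two scripts) removes the most common source of train/inference drift.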

### Limitations
- Specializes in food stacking; may not generalize to unseen objects/layouts.
- Sensitive to perception domain shift (lighting, textures, camera intrinsics).
- Requires correct observation normalization consistent with training.

### Dataset
- **Training data**: `GetSoloTech/FoodStack`
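
A fine-tune like this one is typically launched with LeRobot's training script against the dataset above. Treat the invocation below as a sketch: flag names vary across LeRobot versions, and the batch size and step count are placeholders to adjust for your hardware:

```shell
# Run from the cloned lerobot repo root; flags follow recent LeRobot versions
python lerobot/scripts/train.py \
  --policy.path=lerobot/smolvla_base \
  --dataset.repo_id=GetSoloTech/FoodStack \
  --batch_size=64 \
  --steps=20000 \
  --output_dir=outputs/smolvla_foodstack
```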

### Resources and references
- SmolVLA base: `https://huggingface.co/lerobot/smolvla_base`
- SmolVLA overview: `https://smolvla.net/index_en.html`
- LeRobot: `https://github.com/huggingface/lerobot`