tags:
- VLA
- SmolVLA
- PhysicalAI
---

### SmolVLA Fine-Tuned for Food Stacking

**Summary**: This is a fine-tuned version of `lerobot/smolvla_base` for stacking food objects (e.g., burgers, sandwiches). It was fine-tuned on the `GetSoloTech/FoodStack` dataset using the LeRobot framework.

### Model details

- **Base model**: `lerobot/smolvla_base`
- **Task**: Vision-Language-Action control for manipulation (stacking)
- **Domain**: Food item stacking (burger, sandwich, etc.)
- **Params**: ~450M (SmolVLA)
- **Library**: LeRobot (`lerobot`)

### Quick start

Install LeRobot with SmolVLA extras:

```bash
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
```

Load the policy from this repo and run inference:

```python
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Replace with your actual model ID on the Hub
model_id = "GetSoloTech/SmolVLA-FoodStack"

policy = SmolVLAPolicy.from_pretrained(model_id)

# Example placeholders for observation and instruction
observation = {
    "image": ...,  # BGR/RGB frame or processed observation, per your setup
    "state": ...,  # optional proprio/scene state, if used
}
instruction = "Stack the burger: bun, patty, cheese, lettuce, bun."

# Depending on your pipeline, you may wrap this in your control loop
actions = policy(observation, instruction)

# Send actions to your robot controller
# send_actions_to_robot(actions)
```
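
The comments above leave the control loop to you; one minimal fixed-rate loop might look like the following sketch. The `get_observation` and `send_actions` callables, the loop rate, and `max_steps` are placeholders for your robot IO, not part of LeRobot's API:

```python
import time

def run_control_loop(policy, get_observation, send_actions, instruction,
                     hz=10, max_steps=100):
    """Run the policy at a fixed rate (a sketch; adapt to your robot IO)."""
    period = 1.0 / hz
    for _ in range(max_steps):
        start = time.monotonic()
        obs = get_observation()             # camera frame + proprio state
        actions = policy(obs, instruction)  # query the policy
        send_actions(actions)               # forward to the robot controller
        # Sleep off the remainder of the control period to hold the rate
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```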

For end-to-end examples (policy loops, camera/robot IO), see the LeRobot docs and examples.

Notes:

- Tune batch size, steps, and augmentation to your hardware and dataset split.
- Ensure your train-time observation preprocessing matches what you do at inference.
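
As one illustration of the preprocessing note, here is a sketch that converts a raw camera frame into the CHW float layout vision policies typically expect. The 224x224 resolution and [0, 1] scaling are assumptions for illustration; use the exact transforms from your training config:

```python
import numpy as np

def preprocess_frame(frame_hwc_uint8, size=224):
    """Convert an HxWx3 uint8 frame to CHW float32 in [0, 1].

    Sketch only: the target size and scaling here are assumptions --
    mirror whatever transforms were used at train time.
    """
    h, w, _ = frame_hwc_uint8.shape
    # Naive nearest-neighbor resize via index sampling (numpy only)
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    resized = frame_hwc_uint8[ys][:, xs]
    # HWC uint8 [0, 255] -> CHW float32 [0, 1]
    return resized.transpose(2, 0, 1).astype(np.float32) / 255.0
```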

### Limitations

- Specializes in food stacking; may not generalize to unseen objects/layouts.
- Sensitive to perception domain shift (lighting, textures, camera intrinsics).
- Requires observation normalization consistent with training.

### Dataset

- **Training data**: `GetSoloTech/FoodStack`

### Resources and references

- SmolVLA base: `https://huggingface.co/lerobot/smolvla_base`
- SmolVLA overview: `https://smolvla.net/index_en.html`
- LeRobot: `https://github.com/huggingface/lerobot`