SmolVLA-SewerBot-v0 (Phase 0 Stub)
First Vision-Language-Action (VLA) model for work in confined spaces, such as cleaning sewer tanks in India.
This is NOT a production model. It is a work in progress; more details are coming soon.
Architecture
- Base: Stub 4-layer MLP (state -> action), NOT SmolVLA
- Input: 18-dim proprioceptive state (joint_pos + joint_vel + ee_pos + ee_quat + base_pos)
- Output: 7-DOF action
- Parameters: ~105K total
- Training: 20 steps, MSE loss, Adam optimizer
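For readers who want to picture the stub, the following is a minimal sketch of a 4-layer MLP with an 18-dim input and a 7-dim output, trained for 20 steps with MSE loss and a hand-written Adam update. It is not the repository's code. The hidden widths (256, 256, 128) are hypothetical, chosen only because they give 104,455 parameters, close to the ~105K stated above. The ReLU activations, initialization, learning rate, batch size, and the random stand-in data are likewise assumptions, and NumPy is used only to keep the example self-contained.

```python
import numpy as np

STATE_DIM, ACTION_DIM = 18, 7   # from the model card
HIDDEN = (256, 256, 128)        # ASSUMPTION: hypothetical widths, not from the repo
STEPS, LR = 20, 1e-3            # 20 steps from the card; learning rate is an assumption

rng = np.random.default_rng(0)
sizes = (STATE_DIM, *HIDDEN, ACTION_DIM)

# Four linear layers stored as [W1, b1, ..., W4, b4] (He-style init, an assumption).
params = []
for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
    params.append(rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out)))
    params.append(np.zeros(fan_out))
n_params = sum(p.size for p in params)


def forward(x):
    """Return the input plus every layer's output (ReLU on hidden layers only)."""
    acts = [x]
    n_layers = len(params) // 2
    for i in range(n_layers):
        x = x @ params[2 * i] + params[2 * i + 1]
        if i < n_layers - 1:
            x = np.maximum(x, 0.0)
        acts.append(x)
    return acts


def mse_and_grads(x, y):
    """Mean-squared error and its gradient for each parameter array."""
    acts = forward(x)
    diff = acts[-1] - y
    loss = float(np.mean(diff ** 2))
    grad = 2.0 * diff / diff.size
    grads = [None] * len(params)
    for i in reversed(range(len(params) // 2)):
        grads[2 * i] = acts[i].T @ grad
        grads[2 * i + 1] = grad.sum(axis=0)
        if i > 0:
            # Back through the linear layer, then through the previous ReLU.
            grad = (grad @ params[2 * i].T) * (acts[i] > 0)
    return loss, grads


# Random stand-in batch; the real data would come from the dataset listed under Source.
states = rng.normal(size=(64, STATE_DIM))
actions = rng.normal(size=(64, ACTION_DIM))

# Adam state: first and second moment estimates per parameter array.
m = [np.zeros_like(p) for p in params]
v = [np.zeros_like(p) for p in params]
beta1, beta2, eps = 0.9, 0.999, 1e-8
losses = []
for t in range(1, STEPS + 1):
    loss, grads = mse_and_grads(states, actions)
    losses.append(loss)
    for p, g, m_k, v_k in zip(params, grads, m, v):
        m_k *= beta1
        m_k += (1 - beta1) * g
        v_k *= beta2
        v_k += (1 - beta2) * g ** 2
        m_hat = m_k / (1 - beta1 ** t)   # bias-corrected moments
        v_hat = v_k / (1 - beta2 ** t)
        p -= LR * m_hat / (np.sqrt(v_hat) + eps)

print(f"parameters: {n_params}")
print(f"loss at step 1: {losses[0]:.4f}, at step {STEPS}: {losses[-1]:.4f}")
```

With only 20 optimizer steps, a model like this can confirm that data loading, the forward pass, and the update step run end to end, but it cannot be expected to learn a usable policy.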
Intended Use
Pipeline validation only. Do not deploy on a real robot.
Source
- GitHub: saralsystems/sewer-vla
- Dataset: sewer-vla-mujoco-v0
- License: Apache 2.0