Qwen3-VL-4B Robotics Subtask Prediction (v0)

Fine-tuned Qwen/Qwen3-VL-4B-Instruct for embodied robotics next-subtask prediction.

Task

Given an observation image from a robot workspace and a high-level task instruction, predict the specific subtask the robot should perform next.

Input: Image + "Task: Put the radish in the yellow plate\nCompleted subtasks: approach the radish\nWhat specific subtask should the robot perform next?"

Output: "grasp the radish"

Data

Source images: lucanunz/alldata_14tasks (492 episodes, 14 tasks, LeRobot format)
Annotations: shivakanthsujit/alldata14_annotations (stage06: subtask decomposition, stage07: steering commands)
~6,352 training samples (3 frames per subtask range, ~492 annotated episodes)
Images are 256×256 RGB from the main camera

Training

Setting	Value
Base model	Qwen/Qwen3-VL-4B-Instruct
Method	SFT with LoRA (r=32, alpha=16)
Epochs	3
Effective batch	16 (2 × 8 grad accum)
Learning rate	2e-4 (cosine, 5% warmup)
Precision	bf16
Optimizations	LIGER kernel, fused AdamW, gradient checkpointing
Hardware	A10G (24GB VRAM) or A100

Launch Training

pip install trl transformers datasets peft accelerate bitsandbytes torch torchvision \
            trackio huggingface_hub av qwen-vl-utils Pillow liger-kernel

# Login to HF Hub
huggingface-cli login

# Run training
python train_vlm_subtask.py

Or via HF Jobs:

from huggingface_hub import HfApi
api = HfApi()
# Submit as a job on A10G hardware

Versions

v0: Direct subtask prediction (this version) — no reasoning traces
v1 (planned): <think>reasoning</think><answer>subtask</answer> format using stage08 rationales

Architecture Notes

Uses Qwen3-VL (Oct 2025) which has explicit 3D grounding and spatial reasoning capabilities — ideal for embodied robotics
LoRA targets: q/k/v/o projection + gate/up/down MLP layers
System prompt frames the task as embodied robot assistant

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support