hunarbatra/SpatialThinker-30B

Image-Text-to-Text

spatial-reasoning

vision-language

reinforcement-learning

mixture-of-experts


hunarbatra committed on May 22

Commit ec805d4 · verified · 1 Parent(s): 6a2ec11

Update README.md

Files changed (1)

README.md +0 -1


@@ -104,7 +104,6 @@ print(output)
 - **Batch size**: 16 prompts × 8 rollouts = 128 generations/step
 - **Optimizer**: AdamW, lr=1e-6, KL coefficient=1e-2 (low_var_kl)
 - **LoRA**: rank=64 on the language tower
-- **Total cost**: ~$27 on Tinker
 The model was trained with several rollout-side fixes that lift the Qwen3-VL-Instruct base's format-pass rate from ~78% to ~96% during training:
 - Forced `<observe>\n` assistant prefix (matches the four-tag schema the model is trained to produce)

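
To illustrate the forced-prefix fix described in the README text, here is a minimal sketch of how a rollout might be reassembled and checked for format compliance. It is not the author's training code. Only the `<observe>` tag and the `<observe>\n` prefix come from the README; the other three tag names, and the helpers `with_forced_prefix`, `passes_format`, and `format_pass_rate`, are hypothetical placeholders chosen for illustration.

```python
import re

# Only "observe" is named in the README; the remaining three tag names are
# assumed placeholders for the four-tag schema.
TAGS = ("observe", "scene", "think", "answer")
FORCED_PREFIX = "<observe>\n"


def with_forced_prefix(continuation: str) -> str:
    """Rebuild the full assistant turn: forced prefix plus sampled continuation."""
    return FORCED_PREFIX + continuation


def passes_format(text: str, tags=TAGS) -> bool:
    """Return True if every tag appears exactly once, closed, and in order."""
    pos = 0
    for tag in tags:
        # Reject missing or repeated opening/closing tags.
        if text.count(f"<{tag}>") != 1 or text.count(f"</{tag}>") != 1:
            return False
        # Each block must start after the previous block has ended.
        match = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL).search(text, pos)
        if match is None:
            return False
        pos = match.end()
    return True


def format_pass_rate(texts) -> float:
    """Fraction of rollouts that satisfy the schema (0.0 for an empty batch)."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(passes_format(t) for t in texts) / len(texts)
```

Because the opening tag is supplied by the prefix rather than sampled, the model only has to close `<observe>` and emit the remaining blocks, which is one plausible reason such a fix would raise the format-pass rate.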