Update README.md
README.md
@@ -104,7 +104,6 @@ print(output)
 - **Batch size**: 16 prompts × 8 rollouts = 128 generations/step
 - **Optimizer**: AdamW, lr=1e-6, KL coefficient=1e-2 (low_var_kl)
 - **LoRA**: rank=64 on the language tower
-- **Total cost**: ~$27 on Tinker
 
 The model was trained with several rollout-side fixes that lift the Qwen3-VL-Instruct base's format-pass rate from ~78% to ~96% during training:
 - Forced `<observe>\n` assistant prefix (matches the four-tag schema the model is trained to produce)
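For readers unfamiliar with the `low_var_kl` setting in the optimizer line, here is a minimal sketch. It assumes `low_var_kl` denotes the "k3" KL estimator (the name used for it in RL fine-tuning frameworks such as verl); the README does not spell this out. The function names `low_var_kl` and `kl_penalty` are hypothetical, and the per-token log-probabilities are plain floats rather than tensors. Only the coefficient 1e-2 comes from the config above.

```python
import math

KL_COEF = 1e-2  # KL coefficient from the README's optimizer line


def low_var_kl(logp: float, ref_logp: float) -> float:
    """k3 estimate of KL(pi || pi_ref) for one token sampled from pi.

    With r = pi_ref(x) / pi(x), k3 = r - log(r) - 1. It is never negative,
    and its expectation under pi equals the true KL divergence.
    """
    log_ratio = ref_logp - logp  # log r
    return math.exp(log_ratio) - log_ratio - 1.0


def kl_penalty(logps: list[float], ref_logps: list[float]) -> float:
    """Hypothetical helper: mean per-token k3 estimate, scaled by KL_COEF."""
    per_token = [low_var_kl(lp, rlp) for lp, rlp in zip(logps, ref_logps)]
    return KL_COEF * sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Policy identical to the reference model: no penalty.
    print(kl_penalty([-0.5, -1.0], [-0.5, -1.0]))
    # Policy drifted from the reference model: small positive penalty.
    print(kl_penalty([-0.5, -1.0], [-0.9, -1.6]))
```

Compared with the naive `logp - ref_logp` estimator, k3 cannot go negative on individual samples, which is why it is usually described as the lower-variance option.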