| --- |
| license: apache-2.0 |
| base_model: Qwen/Qwen3.5-2B |
| tags: |
| - gui-grounding |
| - sdft |
| - self-distillation |
| - qwen3.5 |
| --- |
| |
| # WebOS SDFT 2B |
|
|
| Full-parameter SDFT (Self-Distillation Fine-Tuning, paper-faithful Shenfeld et al. 2601.19897 Algorithm 1) of Qwen3.5-2B on the WebOS GUI-grounding dataset. |
|
|
| ## Layout |
|
|
| | Subfolder | Description | Click-acc on full test set | |
| |---|---|---| |
| | `checkpoint-200/` | Step 200 / 616 (32% of training) | 18.41% | |
| | `checkpoint-400/` | Step 400 / 616 (65%) | 18.75% | |
| | `checkpoint-600/` | Step 600 / 616 (97%) | **19.95%** | |
| | `final/` | Step 616 / 616 (100%) | ~20% | |
|
|
| For comparison: base Qwen3.5-2B on the same eval is 18.84%, and Qwen3.5-2B + privileged GT-demo teacher prompt is 26.46%. |
|
|
| ## Training config (16b_train_sdft_full_ft.py) |
| - bf16, full-param FT (vision frozen) |
| - bs=1, ga=32 (effective batch 32) |
| - lr=5e-6, warmup=10 steps, cosine |
| - AdamW, wd=0, max_grad_norm=1.0 |
| - ema-α=0.01, kl-temperature=2.0, reverse KL |
| - 2 epochs × 9,864 train samples = 616 optimizer steps |
| - on-policy max_new_tokens=96, temperature=1.0 |
|
|
| ## Loading |
|
|
| ```python |
| from transformers import AutoModelForImageTextToText, AutoProcessor |
| model = AutoModelForImageTextToText.from_pretrained( |
| "Chengheng/webos-sdft-2b", subfolder="checkpoint-600", |
| torch_dtype="bfloat16", device_map="cuda:0", |
| ) |
| processor = AutoProcessor.from_pretrained( |
| "Chengheng/webos-sdft-2b", subfolder="checkpoint-600", |
| ) |
| ``` |
|
|
| ## Code |
|
|
| Training and eval scripts: https://github.com/ChenghengLi/WebOS |
|
|