---
license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
  - gui-grounding
  - sdft
  - self-distillation
  - qwen3.5
---

# WebOS SDFT 2B

Full-parameter SDFT (Self-Distillation Fine-Tuning) of Qwen3.5-2B on the WebOS GUI-grounding dataset. Training is a paper-faithful implementation of Algorithm 1 in Shenfeld et al. (arXiv:2601.19897).

## Layout

| Subfolder | Description | Click accuracy (full test set) |
|---|---|---|
| `checkpoint-200/` | Step 200 / 616 (32% of training) | 18.41% |
| `checkpoint-400/` | Step 400 / 616 (65%) | 18.75% |
| `checkpoint-600/` | Step 600 / 616 (97%) | **19.95%** |
| `final/` | Step 616 / 616 (100%) | ~20% |

For comparison, on the same eval the base Qwen3.5-2B scores 18.84%, and Qwen3.5-2B given the privileged GT-demo teacher prompt scores 26.46%.
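This card does not define click accuracy. A common convention in GUI-grounding evaluation counts a prediction as correct when the predicted click point lands inside the ground-truth element's bounding box. The sketch below assumes that convention, which may differ from the eval script in the linked repository; `is_hit` and `click_accuracy` are hypothetical names chosen for illustration.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def is_hit(click: Point, box: Box) -> bool:
    """True when the click lies inside the box (edges inclusive, by assumption)."""
    x, y = click
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def click_accuracy(clicks: Iterable[Point], boxes: Iterable[Box]) -> float:
    """Percentage of predicted clicks that fall inside their target box."""
    pairs = list(zip(clicks, boxes))
    if not pairs:
        return 0.0
    hits = sum(is_hit(click, box) for click, box in pairs)
    return 100.0 * hits / len(pairs)
```

Under this assumed definition, the percentages in the table would be `click_accuracy` computed over every example in the test set.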

## Training config (`16b_train_sdft_full_ft.py`)

- bf16, full-param FT (vision frozen)
- bs=1, ga=32 (effective batch 32)
- lr=5e-6, warmup=10 steps, cosine
- AdamW, wd=0, max_grad_norm=1.0
- ema-α=0.01, kl-temperature=2.0, reverse KL
- 2 epochs × 9,864 train samples ÷ 32 (effective batch) ≈ 616 optimizer steps
- on-policy max_new_tokens=96, temperature=1.0
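
To make the numbers in the list above concrete, here is a minimal pure-Python sketch of how they might fit together. It is not the training script: it assumes the step count drops a partial final accumulation window, that the schedule is linear warmup followed by cosine decay to zero, that the EMA teacher update is `teacher <- (1 - α)·teacher + α·student`, and that the temperature divides both sets of logits before the reverse KL. All function names are hypothetical, and a real implementation would operate on tensors rather than lists.

```python
import math

# Values copied from the config list above.
TRAIN_SAMPLES = 9_864
EPOCHS = 2
EFFECTIVE_BATCH = 1 * 32  # bs * ga
BASE_LR = 5e-6
WARMUP_STEPS = 10
EMA_ALPHA = 0.01
KL_TEMPERATURE = 2.0

# Assumption: a partial final accumulation window is dropped (19,728 // 32).
TOTAL_STEPS = (TRAIN_SAMPLES * EPOCHS) // EFFECTIVE_BATCH


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (assumed: linear warmup, cosine to 0)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def softmax(logits, temperature):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_logits, teacher_logits, temperature=KL_TEMPERATURE):
    """KL(student || teacher) for one token's distribution over the vocabulary."""
    p_s = softmax(student_logits, temperature)
    p_t = softmax(teacher_logits, temperature)
    return sum(s * (math.log(s) - math.log(t)) for s, t in zip(p_s, p_t) if s > 0.0)


def ema_update(teacher, student, alpha=EMA_ALPHA):
    """Move each teacher weight a fraction alpha toward the student weight."""
    return [(1.0 - alpha) * t + alpha * s for t, s in zip(teacher, student)]
```

With α=0.01 under the assumed convention, the teacher keeps 99% of its previous weights at each update, so it trails the student slowly. The reverse direction of the KL takes the expectation under the student's own distribution, which matches the on-policy sampling noted in the list.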

## Loading

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# `subfolder` selects one of the checkpoints listed under Layout.
model = AutoModelForImageTextToText.from_pretrained(
    "Chengheng/webos-sdft-2b", subfolder="checkpoint-600",
    torch_dtype="bfloat16", device_map="cuda:0",
)
# Load the processor from the same subfolder so it matches the checkpoint.
processor = AutoProcessor.from_pretrained(
    "Chengheng/webos-sdft-2b", subfolder="checkpoint-600",
)
```

## Code

Training and eval scripts: https://github.com/ChenghengLi/WebOS