H3 Init Ablation — Qwen3-0.6B + 4-level Semantic IDs
Artifacts from a 4-arm × 3-seed embedding-initialization ablation on Qwen3-0.6B, whose vocabulary is extended with 1027 new Semantic ID tokens (4-level hierarchical IDs from an RQ-VAE with 256 codes per level, i.e. a 256^4 code space).
Arms
| Arm | Init strategy |
|---|---|
| A | Gaussian covariance-preserving (fit N(μ, Σ) on old embeddings) |
| B | Variance-only control (diag Σ, no cross-dim covariance) |
| C | Title-mean (cluster items by hier-path, use cluster-mean title embeddings) |
| D | RQ-VAE codebook vectors (trained on title embeddings) |
Seeds: 42, 43, 44.
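The difference between arms A and B can be illustrated with a minimal NumPy sketch. This is not the study's code: the function name `sample_new_embeddings`, the toy matrix `old`, and all shapes are assumptions made for illustration. Arm A draws new rows from N(μ, Σ) using the full empirical covariance of the existing embeddings, while arm B keeps only the per-dimension variances.

```python
import numpy as np


def sample_new_embeddings(old_emb, n_new, seed, full_cov=True):
    """Draw n_new rows from a Gaussian fitted to the rows of old_emb.

    full_cov=True  -> arm A style: full covariance Sigma.
    full_cov=False -> arm B style: diagonal of Sigma only.
    """
    rng = np.random.default_rng(seed)
    mu = old_emb.mean(axis=0)
    cov = np.cov(old_emb, rowvar=False)  # (d, d) empirical covariance
    if not full_cov:
        cov = np.diag(np.diag(cov))  # drop cross-dimension terms
    return rng.multivariate_normal(mu, cov, size=n_new)


# Toy stand-in for the pretrained embedding matrix (hypothetical data):
# dimension 1 is strongly correlated with dimension 0.
toy_rng = np.random.default_rng(0)
z = toy_rng.standard_normal((5000, 4))
old = z.copy()
old[:, 1] = 0.9 * z[:, 0] + 0.1 * z[:, 1]

arm_a = sample_new_embeddings(old, n_new=1027, seed=42, full_cov=True)
arm_b = sample_new_embeddings(old, n_new=1027, seed=42, full_cov=False)

# Correlation between dimensions 0 and 1 in the sampled rows.
corr_a = np.corrcoef(arm_a, rowvar=False)[0, 1]
corr_b = np.corrcoef(arm_b, rowvar=False)[0, 1]
print(f"corr(dim0, dim1): arm A = {corr_a:.2f}, arm B = {corr_b:.2f}")
```

In this toy setting the arm A samples retain the strong correlation between the two dimensions, whereas the arm B samples match only the marginal variances, which is the contrast the variance-only control is meant to isolate.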
Layout
Each subfolder is self-contained and can be loaded independently.
Usage
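A minimal sketch of loading one arm/seed checkpoint, assuming each subfolder is a standard Hugging Face `transformers` checkpoint. The helper `load_arm` is hypothetical, and the repository id and subfolder name are placeholders rather than the real names; only the `subfolder` argument of `from_pretrained` is an existing library feature.

```python
def load_arm(repo_id, subfolder):
    """Load tokenizer and model from one subfolder of a Hub repository."""
    # Imported lazily so the helper can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subfolder)
    return tokenizer, model


# Placeholders: substitute the actual repository id and subfolder name.
# tokenizer, model = load_arm("<user>/<repo>", "<arm-seed-subfolder>")
```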
Training
- Base: Qwen/Qwen3-0.6B
- Stage 1: vocab expansion + target-only init (2000 steps, frozen body)
- Stage 2: full fine-tune on conversational SID tasks (3152 steps, packing seq_len=512)
- Optimizer: adamw_8bit (Unsloth), lr=2e-5, cosine decay
- Data: Amazon Pet Supplies conversations (title, description, features, copurchase, sequences)
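The learning-rate schedule can be sketched as a plain cosine decay starting from 2e-5. Warmup and the final learning rate are not stated above, so this sketch assumes no warmup, decay to zero, and that the schedule spans the 3152 Stage 2 steps; `cosine_lr` is a hypothetical helper, not the trainer's implementation.

```python
import math


def cosine_lr(step, total_steps=3152, base_lr=2e-5, min_lr=0.0):
    """Cosine decay from base_lr at step 0 to min_lr at total_steps."""
    # Clamp so out-of-range steps map to the schedule endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, midpoint, and end of the run.
for s in (0, 1576, 3152):
    print(s, cosine_lr(s))
```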
License
Apache-2.0 (follows Qwen3-0.6B base license).
Citation
TBD — MIPT master's thesis, 2026.