H3 Init Ablation — Qwen3-0.6B + 4-level Semantic IDs

Artifacts from a 4-arm × 3-seed embedding-initialization ablation on Qwen3-0.6B, whose vocabulary is extended with 1027 new Semantic ID tokens (4-level hierarchical IDs, RQ-VAE codebook 256^4).
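To make the token layout concrete, here is a minimal sketch that reads "256^4" as four levels with 256 codes each, which gives 4 × 256 = 1024 per-level tokens. The token format `<sid_{level}_{code}>` and the helper names are hypothetical, not the ones used in this repo, and the card does not state what the remaining 3 of the 1027 new tokens are.

```python
NUM_LEVELS = 4        # hierarchy depth stated in the card
CODEBOOK_SIZE = 256   # codes per level, assuming "256^4" means 256 codes at each of 4 levels


def sid_token(level: int, code: int) -> str:
    """Hypothetical surface form for one Semantic ID token."""
    return f"<sid_{level}_{code}>"


def build_sid_vocab() -> list[str]:
    """All per-level tokens: one token per (level, code) pair."""
    return [
        sid_token(level, code)
        for level in range(NUM_LEVELS)
        for code in range(CODEBOOK_SIZE)
    ]


def encode_item(codes: tuple[int, ...]) -> list[str]:
    """Turn an item's 4-level code path into its token sequence."""
    if len(codes) != NUM_LEVELS:
        raise ValueError(f"expected {NUM_LEVELS} codes, got {len(codes)}")
    for code in codes:
        if not 0 <= code < CODEBOOK_SIZE:
            raise ValueError(f"code {code} outside [0, {CODEBOOK_SIZE})")
    return [sid_token(level, code) for level, code in enumerate(codes)]


vocab = build_sid_vocab()
example = encode_item((3, 17, 255, 0))
```

Under this reading, each item is addressed by a path of four tokens, so 1024 tokens cover 256^4 possible paths.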

Arms

Arm  Init strategy
A    Gaussian, covariance-preserving (fit N(μ, Σ) on old embeddings)
B    Variance-only control (diagonal Σ, no cross-dimension covariance)
C    Title-mean (cluster items by hierarchical path, use cluster-mean title embeddings)
D    RQ-VAE codebook vectors (trained on title embeddings)
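The difference between arms A and B can be sketched as follows. This is an illustration on toy data, not the study's code: it assumes new embedding rows are sampled from the fitted Gaussian, and the function names, the 8-dimensional toy matrix, and the RNG seed are all hypothetical.

```python
import numpy as np


def fit_gaussian(old_emb: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fit N(mu, Sigma) to the rows of an existing embedding matrix."""
    mu = old_emb.mean(axis=0)
    sigma = np.cov(old_emb, rowvar=False)  # (dim, dim) sample covariance
    return mu, sigma


def sample_new_rows(mu, sigma, n_new, rng, diagonal_only=False):
    """Draw n_new embedding rows.

    diagonal_only=False: full covariance (arm A style).
    diagonal_only=True:  per-dimension variance only (arm B style).
    """
    if diagonal_only:
        std = np.sqrt(np.diag(sigma))
        return mu + rng.standard_normal((n_new, mu.shape[0])) * std
    return rng.multivariate_normal(mu, sigma, size=n_new)


# Toy stand-in for the pretrained embedding matrix: dims 0 and 1 are correlated.
rng = np.random.default_rng(0)
mix = np.eye(8)
mix[0, 1] = 0.8
old_emb = rng.standard_normal((5000, 8)) @ mix

mu, sigma = fit_gaussian(old_emb)
new_a = sample_new_rows(mu, sigma, 1027, rng)                      # arm A style
new_b = sample_new_rows(mu, sigma, 1027, rng, diagonal_only=True)  # arm B style

corr_a = np.corrcoef(new_a[:, 0], new_a[:, 1])[0, 1]
corr_b = np.corrcoef(new_b[:, 0], new_b[:, 1])[0, 1]
```

In this toy setup the arm A samples keep the correlation between dimensions 0 and 1, while the arm B samples match only the per-dimension variances, which is what makes B a control for the covariance structure.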

Seeds: 42, 43, 44.

Layout

Each subfolder is self-contained and loadable via .

Usage
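A minimal loading sketch, assuming each arm/seed subfolder can be loaded with the `subfolder` argument of `from_pretrained` in Hugging Face Transformers. The subfolder naming scheme (`arm_A_seed42`) and the repo id placeholder are hypothetical; check the repo's file listing for the real names.

```python
from itertools import product

ARMS = ("A", "B", "C", "D")
SEEDS = (42, 43, 44)


def subfolder_name(arm: str, seed: int) -> str:
    """Hypothetical naming scheme; the actual folder names may differ."""
    return f"arm_{arm}_seed{seed}"


def load_checkpoint(repo_id: str, arm: str, seed: int):
    """Load tokenizer and model for one arm/seed run from a repo subfolder."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    sub = subfolder_name(arm, seed)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=sub)
    model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=sub)
    return tokenizer, model


# One entry per run in the 4-arm x 3-seed grid.
ALL_RUNS = [subfolder_name(arm, seed) for arm, seed in product(ARMS, SEEDS)]

# Example call (needs network access and the real repo id):
# tokenizer, model = load_checkpoint("<this-repo-id>", arm="A", seed=42)
```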

Training

  • Base: Qwen/Qwen3-0.6B
  • Stage 1: vocab expansion + target-only init (2000 steps, frozen body)
  • Stage 2: full fine-tune on conversational SID tasks (3152 steps, packing seq_len=512)
  • Optimizer: adamw_8bit (Unsloth), lr=2e-5, cosine decay
  • Data: Amazon Pet Supplies conversations (title, description, features, copurchase, sequences)
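The learning-rate settings above can be sketched as a plain cosine schedule. This assumes no warmup, decay to zero, and that the schedule spans the 3152 Stage 2 steps; none of these details are stated in the card, and the function name is hypothetical.

```python
import math


def cosine_lr(step: int, total_steps: int = 3152,
              peak_lr: float = 2e-5, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


lr_start = cosine_lr(0)
lr_mid = cosine_lr(1576)
lr_end = cosine_lr(3152)
```

With these assumptions the rate starts at 2e-5, passes through half of that at the midpoint, and reaches zero at the final step.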

License

Apache-2.0 (follows Qwen3-0.6B base license).

Citation

TBD — MIPT master's thesis, 2026.
