metadata
license: other
tags:
  - glm
  - debug
  - synthetic

GLM-5.2 Debug

A tiny, randomly initialized, GLM-5.2-shaped glm_moe_dsa model for fast trainer and kernel iteration.

Preserves the fused DSA dimensions needed by FlashMLA (kv_lora_rank=512, qk_rope_head_dim=64, v_head_dim=512), plus 64 attention heads, DSA IndexShare, MoE/shared experts, and model_type=glm_moe_dsa.
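As a rough sanity check, the preserved dimensions can be compared against a loaded config dictionary. The sketch below is illustrative only: `check_debug_config` and `EXPECTED` are hypothetical names, and the key `num_attention_heads` is an assumption about how the 64 attention heads are stored; the other keys and values are the ones listed above.

```python
# Expected values taken from the description above.
# ASSUMPTION: "num_attention_heads" is a guessed key name for the 64 heads;
# adjust it if the actual config uses a different field.
EXPECTED = {
    "model_type": "glm_moe_dsa",
    "kv_lora_rank": 512,
    "qk_rope_head_dim": 64,
    "v_head_dim": 512,
    "num_attention_heads": 64,
}


def check_debug_config(config: dict) -> list[str]:
    """Return human-readable mismatches; an empty list means all keys match."""
    problems = []
    for key, want in EXPECTED.items():
        got = config.get(key)  # None if the key is missing
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    return problems
```

One possible use is to read the model's `config.json` with `json.load` and pass the resulting dictionary to `check_debug_config` before launching a kernel test, so that a dimension mismatch shows up as a readable message rather than a shape error deep inside the kernel.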

It intentionally uses 8 layers, tiny hidden, MLP, and MoE sizes, vocab_size=2048, and random weights. Use synthetic token IDs in the range [0, 2047]. Because the weights are random, the model is not intended for natural-language quality or sampling.
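A minimal sketch of generating synthetic token IDs in the valid range, using only the Python standard library. The helper name `synthetic_batch` and its parameters are hypothetical; only the vocabulary size of 2048 comes from the description above.

```python
import random

VOCAB_SIZE = 2048  # from the model description: valid IDs are 0..2047


def synthetic_batch(batch_size: int, seq_len: int, seed: int = 0) -> list[list[int]]:
    """Build a batch of random token IDs, each in [0, VOCAB_SIZE - 1]."""
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    return [
        [rng.randrange(VOCAB_SIZE) for _ in range(seq_len)]
        for _ in range(batch_size)
    ]
```

The nested list can be converted to whatever tensor type the trainer expects. Fixing the seed keeps runs comparable when iterating on a kernel.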