ELutris
2-layer Qwen2-MoE from hyper-accel/ci-random-qwen2-moe-a3b, knowledge-distilled (KD) against a Qwen1.5-MoE-A2.7B teacher (alpaca-cleaned, 1500 steps, temperature T=2.0)
ef12344 verified
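
The commit message gives the distillation temperature (T=2.0) but not the loss itself. As a rough sketch only, assuming the common soft-target formulation (KL divergence between temperature-softened teacher and student distributions, scaled by T^2), the per-token objective might look like the following. The function names `log_softmax` and `kd_loss` are hypothetical and are not taken from this repository's training code.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: standard soft-target distillation; the actual run may
    have used a different loss or mixed in a hard-label term.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [0.5, 0.5, 0.5]
    print(kd_loss(student, teacher, temperature=2.0))
```

The T^2 factor keeps gradient magnitudes roughly comparable across temperatures; whether this run applied it is not stated in the commit.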