KD-distilled 2-layer student against Llama-2-7B teacher (alpaca-cleaned, 1500 steps, T=2.0, KL loss) f0597fe verified ELutris committed 3 days ago
Add first 2-layer slice of NousResearch/Llama-2-7b-hf f256511 verified ELutris committed 4 days ago
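
The f0597fe commit message names a KL loss at temperature T=2.0 but does not show how it is computed. The following is a minimal pure-Python sketch of one common form of that loss for a single token position, not the repository's actual training code. The function names `log_softmax` and `kd_kl_loss` are hypothetical, and the `T**2` scaling factor is an assumption (a widespread convention in knowledge distillation) that the commit message does not confirm.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def kd_kl_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Multiplying by temperature**2 is an assumed convention, not something
    stated in the commit message.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Hypothetical logits over a 3-token vocabulary, for illustration only.
teacher = [1.0, 2.0, 3.0]
matched = kd_kl_loss([1.0, 2.0, 3.0], teacher)     # student agrees with teacher
mismatched = kd_kl_loss([0.0, 0.0, 0.0], teacher)  # student is uniform
```

When the student's logits equal the teacher's, the loss is zero; any disagreement gives a positive value. A real training loop would average this over all token positions in a batch.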