V2: continued KD fine-tune at seq_len 1024, 500 steps, lr 1e-4, from V1 (alpaca-cleaned) 683b9c3 verified ELutris committed 3 days ago
KD-distilled 2-layer student against the Llama-2-7B teacher (alpaca-cleaned, 1500 steps, T=2.0, KL loss) f0597fe verified ELutris committed 3 days ago
Keep original layers [0, 31] of NousResearch/Llama-2-7b-hf 054af56 verified ELutris committed 4 days ago
Add first 2-layer slice of NousResearch/Llama-2-7b-hf f256511 verified ELutris committed 4 days ago
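The commits above distill a 2-layer student against a Llama-2-7B teacher using a KL loss at temperature T=2.0. A minimal NumPy sketch of that softened-KL distillation objective follows; the function name, shapes, and the T^2 scaling convention are illustrative assumptions, not taken from the repo's training code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kd_kl_loss(student_logits, teacher_logits, T=2.0):
    """Forward KL(teacher || student) on temperature-softened
    distributions, scaled by T^2 (a common KD convention so
    gradient magnitude is roughly independent of T)."""
    p_t = softmax(teacher_logits / T)          # soft teacher targets
    log_p_t = np.log(p_t)
    log_p_s = np.log(softmax(student_logits / T))
    kl = (p_t * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl.mean() * T * T)

# Identical logits give zero loss; mismatched logits, a positive one.
teacher = np.array([[1.0, 2.0, 3.0]])
student = np.array([[3.0, 2.0, 1.0]])
print(kd_kl_loss(teacher, teacher, T=2.0))      # → 0.0
print(kd_kl_loss(student, teacher, T=2.0) > 0)  # → True
```

In an actual training loop these logits would come from the teacher (frozen, no gradient) and the 2-layer student sliced out of the base checkpoint, computed over the alpaca-cleaned batches named in the commit messages.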