Checkpoint Manifest

Checkpoints were selected by matching the ROUGE-L scores reported by parse_results.py (read from eval_results/*/eval.log) against the reported results table. This repo produces only the MultiLevelOT and w/ MTA rows; DSKDv2, DWA-KD, and ResidualKD live in other repos and are not included here.

Format note: the smaller students (GPT-2 120M/340M) are saved as full fine-tunes (config.json + model.safetensors); the larger students (GPT-2 1.5B, OPT-2.7B, TinyLLaMA-1.1B) are saved as LoRA adapters (adapter_config.json + adapter_model.safetensors).
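
Because the two formats load differently, it can help to detect which one a checkpoint directory contains before loading it. The sketch below is a minimal illustration, not part of this repo: the function name detect_checkpoint_format is hypothetical, and it relies only on the file names listed in the format note.

```python
from pathlib import Path

# File pairs named in the format note; nothing else is assumed about the layout.
FULL_FILES = ("config.json", "model.safetensors")
LORA_FILES = ("adapter_config.json", "adapter_model.safetensors")


def detect_checkpoint_format(ckpt_dir):
    """Return "lora", "full", or "unknown" based on which files are present.

    Hypothetical helper: the name and return values are illustrative only.
    """
    ckpt_dir = Path(ckpt_dir)
    # Check the adapter files first, in case an adapter directory also
    # happens to contain a config.json.
    if all((ckpt_dir / name).is_file() for name in LORA_FILES):
        return "lora"
    if all((ckpt_dir / name).is_file() for name in FULL_FILES):
        return "full"
    return "unknown"
```

A directory detected as "full" can typically be loaded with transformers' AutoModelForCausalLM.from_pretrained, while a "lora" directory is typically applied to the base student model with peft's PeftModel.from_pretrained. Whether this repo's own scripts load checkpoints that way is an assumption.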

Included checkpoints

Path (ckpts/)	Method	Source (output/)	Avg ROUGE-L	Dolly / SelfInst / Vicuna / S-NI
qwen1.5-1.8B-to-gpt2-120M/multilevelot	MultiLevelOT	paper/12861	18.67	23.92 / 13.04 / 15.28 / 22.43
qwen1.5-1.8B-to-gpt2-120M/multilevelot_mta	MultiLevelOT w/ MTA (= ablation Full-level)	mta/5716	18.92	24.37 / 12.97 / 15.47 / 22.86
qwen1.5-1.8B-to-gpt2-120M/ablation_word	ablation w/ Word-level	mta_all_word/14290	18.97	24.48 / 13.34 / 15.84 / 22.22
qwen1.5-1.8B-to-gpt2-340M/multilevelot	MultiLevelOT	paper/12861	19.26	25.53 / 13.23 / 15.72 / 22.56
qwen2.5-7B-to-gpt2-1.5B/multilevelot	MultiLevelOT	paper/28580	21.74	26.24 / 17.31 / 17.28 / 26.14
qwen2.5-7B-to-opt-2.7B/multilevelot	MultiLevelOT	paper/22864	23.19	28.30 / 17.81 / 17.28 / 29.36
qwen2.5-7B-to-opt-2.7B/multilevelot_mta	MultiLevelOT w/ MTA	mta/28580	23.27	28.39 / 18.44 / 17.68 / 28.58
mistral-7B-to-tinyllama-1.1B/multilevelot	MultiLevelOT	paper/28580	21.29	26.41 / 15.16 / 16.94 / 26.66
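
In every row above, the Avg ROUGE-L column is consistent with the unweighted mean of the four per-dataset scores, rounded to two decimals. The following sketch shows one way to re-check a row; the helper name check_row and the tolerance are illustrative assumptions, and the row is assumed to be tab-separated as in this manifest.

```python
def check_row(row, tol=0.005):
    """Compare the reported average with the mean of the per-dataset scores.

    Hypothetical helper. Expects a tab-separated manifest row with the
    average in the 4th column and "a / b / c / d" scores in the 5th.
    Returns (matches, mean).
    """
    cols = row.split("\t")
    reported = float(cols[3])
    scores = [float(s) for s in cols[4].split("/")]
    mean = sum(scores) / len(scores)
    # Allow for rounding to two decimals plus a little floating-point slack.
    return abs(mean - reported) <= tol + 1e-9, mean


row = (
    "qwen1.5-1.8B-to-gpt2-120M/multilevelot\tMultiLevelOT\tpaper/12861\t"
    "18.67\t23.92 / 13.04 / 15.28 / 22.43"
)
ok, mean = check_row(row)
print(ok, mean)
```

For this row the four scores sum to 74.67, giving a mean of 18.6675, which rounds to the reported 18.67.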

Missing (reported in table but no checkpoint found in output/)

qwen1.5-1.8B-to-gpt2-340M — w/ MTA (Avg 19.68)
qwen2.5-7B-to-gpt2-1.5B — w/ MTA (Avg 21.98)
mistral-7B-to-tinyllama-1.1B — w/ MTA (Avg 22.85)
qwen1.5-1.8B-to-gpt2-120M — ablation w/ Phrase-level (Avg 18.70): eval.log exists (mta_all_phrase/) but no checkpoint in output/.