Checkpoint Manifest
Checkpoints were selected by matching the ROUGE-L scores that parse_results.py
extracts from eval_results/*/eval.log against the reported results table. Only
the MultiLevelOT and w/ MTA rows are produced by this repo; DSKDv2, DWA-KD, and
ResidualKD live in other repos and are not included here.
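
The matching step can be sketched roughly as follows. This is a minimal illustration and not the repo's parse_results.py: the `candidates` dict layout, the function name `match_checkpoint`, the 0.005 tolerance, and the non-matching `paper/9999` entry are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical layout: step directory -> (Dolly, SelfInst, Vicuna, S-NI) ROUGE-L.
Scores = tuple[float, float, float, float]


def match_checkpoint(
    candidates: dict[str, Scores],
    reported: Scores,
    tol: float = 0.005,
) -> Optional[str]:
    """Return the first candidate whose four scores all lie within tol of the reported row."""
    for name, scores in candidates.items():
        if all(abs(s - r) <= tol for s, r in zip(scores, reported)):
            return name
    return None  # nothing matched, so the row would be listed as missing


# Reported MultiLevelOT scores for qwen1.5-1.8B-to-gpt2-120M.
reported_row = (23.92, 13.04, 15.28, 22.43)
candidates = {
    "paper/12861": (23.92, 13.04, 15.28, 22.43),
    "paper/9999": (23.10, 12.80, 15.01, 21.95),  # made-up run that should not match
}
print(match_checkpoint(candidates, reported_row))
```

Comparing all four per-dataset scores, rather than only the average, makes it less likely that two different runs with similar averages are confused.
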
Format note: smaller students (GPT-2 120M/340M) are saved as full fine-tunes
(config.json + model.safetensors); larger students (GPT-2 1.5B, OPT-2.7B,
TinyLLaMA-1.1B) are saved as LoRA adapters (adapter_config.json +
adapter_model.safetensors).
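
Since the two formats are distinguished only by which files are present, a loader can check the directory before deciding how to load it. The helper below is a sketch under that assumption; the function name `checkpoint_format` and its return labels are hypothetical and not part of this repo.

```python
from pathlib import Path


def checkpoint_format(ckpt_dir: str) -> str:
    """Classify a checkpoint directory by the files named in the format note."""
    d = Path(ckpt_dir)
    # LoRA adapters ship an adapter config plus adapter weights.
    if (d / "adapter_config.json").is_file() and (d / "adapter_model.safetensors").is_file():
        return "lora"
    # Full fine-tunes ship a model config plus full weights.
    if (d / "config.json").is_file() and (d / "model.safetensors").is_file():
        return "full"
    return "unknown"
```

A directory classified as "lora" contains only adapter weights, so the matching base student model has to be loaded first and the adapter applied on top of it.
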
Included checkpoints
| Path (ckpts/) | Method | Source (output/) | Avg ROUGE-L | Dolly / SelfInst / Vicuna / S-NI |
|---|---|---|---|---|
| qwen1.5-1.8B-to-gpt2-120M/multilevelot | MultiLevelOT | paper/12861 | 18.67 | 23.92 / 13.04 / 15.28 / 22.43 |
| qwen1.5-1.8B-to-gpt2-120M/multilevelot_mta | MultiLevelOT w/ MTA (= ablation Full-level) | mta/5716 | 18.92 | 24.37 / 12.97 / 15.47 / 22.86 |
| qwen1.5-1.8B-to-gpt2-120M/ablation_word | ablation w/ Word-level | mta_all_word/14290 | 18.97 | 24.48 / 13.34 / 15.84 / 22.22 |
| qwen1.5-1.8B-to-gpt2-340M/multilevelot | MultiLevelOT | paper/12861 | 19.26 | 25.53 / 13.23 / 15.72 / 22.56 |
| qwen2.5-7B-to-gpt2-1.5B/multilevelot | MultiLevelOT | paper/28580 | 21.74 | 26.24 / 17.31 / 17.28 / 26.14 |
| qwen2.5-7B-to-opt-2.7B/multilevelot | MultiLevelOT | paper/22864 | 23.19 | 28.30 / 17.81 / 17.28 / 29.36 |
| qwen2.5-7B-to-opt-2.7B/multilevelot_mta | MultiLevelOT w/ MTA | mta/28580 | 23.27 | 28.39 / 18.44 / 17.68 / 28.58 |
| mistral-7B-to-tinyllama-1.1B/multilevelot | MultiLevelOT | paper/28580 | 21.29 | 26.41 / 15.16 / 16.94 / 26.66 |
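
The Avg ROUGE-L column appears to be the unweighted mean of the four per-dataset scores: for the first row, (23.92 + 13.04 + 15.28 + 22.43) / 4 = 18.6675, which rounds to 18.67. The sketch below parses a table row and checks that relationship; the helper names `parse_row` and `avg_is_consistent` and the 0.006 tolerance (half a rounding unit plus slack) are assumptions for illustration.

```python
def parse_row(row: str) -> tuple[str, float, list[float]]:
    """Split one manifest table row into (path, reported average, per-dataset scores)."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    path, avg = cells[0], float(cells[3])
    scores = [float(x) for x in cells[4].split("/")]
    return path, avg, scores


def avg_is_consistent(avg: float, scores: list[float], tol: float = 0.006) -> bool:
    """True if the reported average matches the mean of the scores up to rounding."""
    return abs(sum(scores) / len(scores) - avg) <= tol


row = (
    "| qwen1.5-1.8B-to-gpt2-120M/multilevelot | MultiLevelOT | paper/12861 "
    "| 18.67 | 23.92 / 13.04 / 15.28 / 22.43 |"
)
path, avg, scores = parse_row(row)
print(path, avg_is_consistent(avg, scores))
```

Running the same check over every row is a cheap way to catch copy-paste errors when the manifest is updated.
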
Missing (reported in table but no checkpoint found in output/)
- qwen1.5-1.8B-to-gpt2-340M — w/ MTA (Avg 19.68)
- qwen2.5-7B-to-gpt2-1.5B — w/ MTA (Avg 21.98)
- mistral-7B-to-tinyllama-1.1B — w/ MTA (Avg 22.85)
- qwen1.5-1.8B-to-gpt2-120M — ablation w/ Phrase-level (Avg 18.70):
  eval.log exists (mta_all_phrase/) but no checkpoint in output/.