CoVT-Phase2-3expert-Full (post-cleanup)
Status (2026-06-29): This repo previously hosted the 3-expert CoVT Phase2 LoRA from our non-strict reproduction run. The non-strict weights were trained with a real bug (`model.global_steps` was not restored on resume, which left the anchor losses ON for ~1000 extra optimizer steps) and are not paper-faithful. They have been removed.
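The effect of the resume bug can be sketched with a step-gated schedule. This is a minimal illustration, not the repo's trainer code. `anchor_losses_enabled`, `restore_global_step` and `extra_anchor_steps` are hypothetical helpers. The assumption that anchor losses switch off at a fixed `anchor_off_step` is also ours, and so are the step counts in the example. The one real detail is that a Hugging Face `Trainer` checkpoint records its step count under `global_step` in `trainer_state.json`.

```python
import json
import os


def anchor_losses_enabled(step: int, anchor_off_step: int) -> bool:
    # Hypothetical schedule: anchor losses are active only before anchor_off_step.
    return step < anchor_off_step


def restore_global_step(checkpoint_dir: str) -> int:
    # HF Trainer checkpoints store the optimizer step count as "global_step" in
    # trainer_state.json; the fix is to copy it back into the model-side counter
    # (model.global_steps in this repo) when resuming.
    with open(os.path.join(checkpoint_dir, "trainer_state.json")) as f:
        return int(json.load(f)["global_step"])


def extra_anchor_steps(resume_step: int, total_steps: int,
                       anchor_off_step: int, restored: bool) -> int:
    """Count optimizer steps where anchor losses run although the schedule says off."""
    extra = 0
    for true_step in range(resume_step, total_steps):
        # Without restoration, the model-side counter restarts at 0 on resume.
        counter = true_step if restored else true_step - resume_step
        intended = anchor_losses_enabled(true_step, anchor_off_step)
        actual = anchor_losses_enabled(counter, anchor_off_step)
        if actual and not intended:
            extra += 1
    return extra


if __name__ == "__main__":
    # Illustrative numbers only, not the real run's schedule.
    print(extra_anchor_steps(5000, 10000, anchor_off_step=1000, restored=False))  # 1000
    print(extra_anchor_steps(5000, 10000, anchor_off_step=1000, restored=True))   # 0
```

With the counter reset, every step after resume replays the start of the schedule, so the anchor losses run for up to `anchor_off_step` extra steps. Restoring the counter from the checkpoint removes the discrepancy.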
What this repo contains now
| Path | Description |
|---|---|
| `stage1_merged/` | Phase1 LoRA merged into the Qwen2.5-VL-7B base. Clean artifact built from the (clean) Stage1 6K-step LoRA. Safe to use as the starting point for any Phase2 reproduction. |
| `scripts/` | Training, merge, and eval scripts used in the reproduction run. |
| `training_src/` | CoVT modelling and trainer source (Qwen2.5-VL fork). |
| `runs/` | TensorBoard event files for the original non-strict run. The loss curves remain useful for diagnostics even though the final weights were buggy. |
| `phase2.log`, `sft_phase2.log` | Stdout/stderr from the original Phase2 run. |
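The following minimal sketch downloads only `stage1_merged/` and checks it before a Phase2 run. It makes three assumptions: `huggingface_hub` is installed, `stage1_merged/` is a standard Hugging Face model directory (a `config.json` plus `*.safetensors` or `*.bin` weight files), and you supply this repo's id yourself. `fetch_stage1_merged` and `looks_like_hf_checkpoint` are hypothetical helpers, not scripts from `scripts/`.

```python
import glob
import os
import tempfile


def looks_like_hf_checkpoint(path: str) -> bool:
    # Assumption: a usable model directory has config.json and at least one weight file.
    if not os.path.isfile(os.path.join(path, "config.json")):
        return False
    weights = glob.glob(os.path.join(path, "*.safetensors")) + glob.glob(os.path.join(path, "*.bin"))
    return len(weights) > 0


def fetch_stage1_merged(repo_id: str, local_dir: str) -> str:
    # Download only the stage1_merged/ subtree (requires network access).
    from huggingface_hub import snapshot_download

    root = snapshot_download(repo_id=repo_id, allow_patterns=["stage1_merged/*"], local_dir=local_dir)
    path = os.path.join(root, "stage1_merged")
    if not looks_like_hf_checkpoint(path):
        raise RuntimeError(f"{path} does not look like a model directory")
    return path


if __name__ == "__main__":
    # Offline self-check of the layout test on a throwaway directory.
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "config.json"), "w").close()
        print(looks_like_hf_checkpoint(d))  # False: no weight files yet
        open(os.path.join(d, "model.safetensors"), "w").close()
        print(looks_like_hf_checkpoint(d))  # True
```

Usage would look like `fetch_stage1_merged("<this-repo-id>", "./covt_full")`, with the placeholder replaced by the actual repo id.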
What was deleted (2026-06-29)
The entire archive_nonstrict/ subtree (42 files, ~25 GB) was deleted in commit 974103e:
- `archive_nonstrict/adapter_*`: final non-strict adapter (buggy)
- `archive_nonstrict/non_lora_state_dict.bin`, `trainer_state.json`, `config.json`: final non-strict state (buggy)
- `archive_nonstrict/checkpoint-9000/` and `archive_nonstrict/checkpoint-10000/`: intermediate non-strict checkpoints (buggy)
Reason: the bug above means anchor losses were active far past their intended schedule. The weights drift from the paper-described training recipe and should not be used or compared against the paper's numbers.
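Older local clones or caches may still contain the deleted files. A small guard can keep them out of evaluation. This is a minimal sketch: `find_nonstrict_files` and `assert_not_nonstrict` are hypothetical helpers, and the example file names are made up. Only the `archive_nonstrict/` prefix comes from this card. The input list could come from a local directory walk or from `huggingface_hub.list_repo_files(repo_id, revision=...)`.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

NONSTRICT_PREFIX = "archive_nonstrict"


def find_nonstrict_files(paths: Iterable[str]) -> List[str]:
    # Repo-relative paths whose first component is the deleted archive_nonstrict/ subtree.
    return [p for p in paths if PurePosixPath(p).parts[:1] == (NONSTRICT_PREFIX,)]


def assert_not_nonstrict(checkpoint_path: str) -> None:
    # Refuse any checkpoint path that passes through archive_nonstrict/ at any depth.
    if NONSTRICT_PREFIX in PurePosixPath(checkpoint_path).parts:
        raise ValueError(
            f"{checkpoint_path} is a buggy non-strict artifact; "
            "use Steven668866/CoVT-Phase2-3expert-Strict instead"
        )


if __name__ == "__main__":
    # Example paths are illustrative, not an actual listing of this repo.
    files = ["stage1_merged/config.json", "archive_nonstrict/checkpoint-9000/adapter.bin"]
    print(find_nonstrict_files(files))  # ['archive_nonstrict/checkpoint-9000/adapter.bin']
```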
Where to get paper-faithful 3-expert Phase2 weights
Use Steven668866/CoVT-Phase2-3expert-Strict, which was trained after the `global_steps` fix.
Companion repos
- Steven668866/CoVT-3expert-Stage1-LoRA: Phase1 6K-step LoRA adapter (clean; consumed by `stage1_merged/` above).
- Steven668866/CoVT-Phase2-3expert-Strict: canonical strict 3-expert Phase2 artifacts.
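Rebuilding `stage1_merged/` from the Stage1 adapter comes down to folding each low-rank update into its base weight, W' = W + (alpha / r) B A. This NumPy sketch checks, on random data, that a merged layer reproduces the base-plus-adapter output. It shows standard LoRA scaling only. The rank, alpha and shapes are arbitrary, `merge_lora` and `lora_forward` are hypothetical helpers, and the scripts in `scripts/` may perform the merge differently.

```python
import numpy as np


def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray, alpha: float) -> np.ndarray:
    # W: (out, in) base weight; A: (r, in) down-projection; B: (out, r) up-projection.
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)


def lora_forward(x: np.ndarray, W: np.ndarray, A: np.ndarray, B: np.ndarray, alpha: float) -> np.ndarray:
    # Unmerged path: base output plus the scaled low-rank branch.
    r = A.shape[0]
    return x @ W.T + (alpha / r) * ((x @ A.T) @ B.T)


rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8.0  # arbitrary illustrative sizes
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))
x = rng.standard_normal((3, d_in))

merged = merge_lora(W, A, B, alpha)
print(np.allclose(x @ merged.T, lora_forward(x, W, A, B, alpha)))  # True
```

After merging, the update lives inside W', so inference needs only a standard model directory and no adapter.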
Caveat
This is a 3-expert variant. The CoVT paper's main configuration uses 4 experts. Numbers from these weights should not be compared 1-to-1 with the paper's headline table.