CoVT-Phase2-3expert-Full (post-cleanup)

Status (2026-06-29): This repo previously hosted the 3-expert CoVT Phase2 LoRA from our non-strict reproduction run. Those weights were trained with a real bug: model.global_steps was not restored on resume, which left the anchor losses ON for ~1000 extra optimizer steps. They are not paper-faithful and have been removed.
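For readers who want to guard against the same class of bug, here is a minimal sketch of restoring a step counter on resume. It assumes the checkpoint directory contains a Hugging Face Trainer-style trainer_state.json with a "global_step" field and that the model exposes a plain global_steps attribute. The helper name restore_global_steps is hypothetical and this is not the actual fix applied in the strict run.

```python
import json
from pathlib import Path


def restore_global_steps(model, checkpoint_dir):
    """Copy the saved optimizer-step count onto the model after a resume.

    Assumption: `trainer_state.json` stores the count under "global_step"
    (as Hugging Face Trainer checkpoints do) and the model reads its
    schedule position from `model.global_steps`.
    """
    state_path = Path(checkpoint_dir) / "trainer_state.json"
    state = json.loads(state_path.read_text())
    model.global_steps = int(state["global_step"])
    return model.global_steps
```

Calling a helper like this right after loading the checkpoint, and before the first resumed training step, keeps any step-gated loss schedule aligned with the true optimizer step.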

What this repo contains now

Path	Description
`stage1_merged/`	Phase1 LoRA merged into the Qwen2.5-VL-7B base. Built from the clean Stage1 6K-step LoRA, so it is safe to use as the starting point for any Phase2 reproduction.
`scripts/`	Training / merge / eval scripts used in the reproduction run.
`training_src/`	CoVT modelling and trainer source (Qwen2.5-VL fork).
`runs/`	TensorBoard event files for the original non-strict run. Loss curves remain useful for diagnostics even though the final weights were buggy.
`phase2.log`, `sft_phase2.log`	Stdout/stderr from the original Phase2 run.

What was deleted (2026-06-29)

The entire archive_nonstrict/ subtree (42 files, ~25 GB) was deleted in commit 974103e:

archive_nonstrict/adapter_* — final non-strict adapter (buggy)
archive_nonstrict/non_lora_state_dict.bin, trainer_state.json, config.json — final non-strict state (buggy)
archive_nonstrict/checkpoint-9000/ and archive_nonstrict/checkpoint-10000/ — intermediate non-strict checkpoints (buggy)

Reason: the bug above means anchor losses were active far past their intended schedule. The weights drift from the paper-described training recipe and should not be used or compared against the paper's numbers.
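The effect can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the CoVT trainer: it assumes the anchor loss is gated by a simple "counter below cutoff" rule, and the function name, the gating rule, and all numbers used with it are hypothetical placeholders.

```python
def anchor_active_steps(total_steps, cutoff, resume_at=None, restore_counter=True):
    """Count optimizer steps on which a step-gated loss is active.

    Hypothetical gate: the loss is on while the internal counter is
    below `cutoff`. If the counter is not restored at `resume_at`,
    it restarts from zero and the gate opens again.
    """
    counter = 0
    active = 0
    for true_step in range(total_steps):
        if resume_at is not None and true_step == resume_at and not restore_counter:
            counter = 0  # the bug: counter restarts from zero after resume
        if counter < cutoff:
            active += 1
        counter += 1
    return active


# Placeholder numbers only: a 100-step run, cutoff at 20, resume at step 50.
correct = anchor_active_steps(100, 20, resume_at=50, restore_counter=True)
buggy = anchor_active_steps(100, 20, resume_at=50, restore_counter=False)
print(correct, buggy)
```

With these placeholder values the buggy run keeps the gated loss on for twice as many steps as the correct run, which is the same kind of schedule overrun described above.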

Where to get paper-faithful 3-expert Phase2 weights

Use Steven668866/CoVT-Phase2-3expert-Strict — trained after the global_steps fix.

Companion repos

Steven668866/CoVT-3expert-Stage1-LoRA — Phase1 6K-step LoRA adapter (clean; consumed by stage1_merged/ above).
Steven668866/CoVT-Phase2-3expert-Strict — Canonical strict 3-expert Phase2 artifacts.

Caveat

This is a 3-expert variant. The CoVT paper's main configuration uses 4 experts. Numbers from these weights should not be compared 1-to-1 with the paper's headline table.
