best_ckpt/

Current best single-layer L10 retriever (P110, May 2026) — committed to git for direct use by collaborators.

Source experiment: P110

experiments/expP_110_R10_lr1e4_wt_combfull_l10/ckpts/
Config: Round 10 (lr=1e-4, weighted-loss, combined_full data), single layer L10
Best metric (val full-set L10):
- precision=0.7171, recall=0.5357, val_f1=0.6133, recall@512=0.5256

| File | Type | Format |
|---|---|---|
| `l10.pt` | real file (~83 MB, committed) | Single-layer state_dict (latest "best" snapshot, recommended for inference) |
| `l10_best_f1.pt` | local symlink → `experiments/...` (untracked) | Single-layer state_dict (best F1 epoch) |
| `l10_best_recall_k.pt` | local symlink → `experiments/...` (untracked) | Single-layer state_dict (best recall@K epoch) |

l10.pt is the only file committed to git. The other two are convenience symlinks for local experimentation; pull experiments/ separately if you need them.

All ckpts are bare-key single-layer format:

wq_a.weight       — [1024, 4096]
wq_b.weight       — [N_HEADS*128, 1024]   (N_HEADS=64)
q_norm_weight     — [1024]
weights_proj.weight — [N_HEADS, 4096]
freqs_cis         — RoPE precomputed (optional)
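
A quick way to confirm that a file really is in this bare-key format is to compare its keys and shapes against the listing above. The sketch below is an illustration, not part of the repository: `check_state_dict`, `EXPECTED_SHAPES`, and `OPTIONAL_KEYS` are hypothetical names, and the only assumption about the loaded object is that each value exposes a `.shape` attribute (as tensors do). The shape of `freqs_cis` is not documented here, so it is accepted but not checked.

```python
from typing import Any, Dict, List, Mapping, Tuple

N_HEADS = 64  # value stated in the key listing

# Shapes copied from the listing. freqs_cis is optional and its shape is
# not documented, so it is allowed but never validated.
EXPECTED_SHAPES: Dict[str, Tuple[int, ...]] = {
    "wq_a.weight": (1024, 4096),
    "wq_b.weight": (N_HEADS * 128, 1024),
    "q_norm_weight": (1024,),
    "weights_proj.weight": (N_HEADS, 4096),
}
OPTIONAL_KEYS = {"freqs_cis"}


def check_state_dict(state_dict: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the format matches."""
    problems: List[str] = []
    for key, shape in EXPECTED_SHAPES.items():
        if key not in state_dict:
            problems.append(f"missing key: {key}")
            continue
        actual = tuple(state_dict[key].shape)
        if actual != shape:
            problems.append(f"{key}: expected {shape}, got {actual}")
    for key in state_dict:
        if key not in EXPECTED_SHAPES and key not in OPTIONAL_KEYS:
            problems.append(f"unexpected key: {key}")
    return problems
```

With PyTorch installed, this could be called as `check_state_dict(torch.load("best_ckpt/l10.pt", map_location="cpu"))`. A multi-layer checkpoint whose keys carry a `retrievers.l10.` style prefix would show up as a mix of missing and unexpected keys.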

Usage

# Single-layer inference (default path):
python inference.py --ckpt best_ckpt/l10.pt --layer 10 --data-path ./data/doc_00030.pkl

inference.py returns raw logits (sigmoid is not applied). For 0-1 probabilities, call torch.sigmoid(logits) yourself; for top-K selection, use the logits directly, since sigmoid is monotonic and does not change the ranking.
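
The point about top-K can be checked with a small, dependency-free sketch. The logit values below are made up for illustration, and `sigmoid` and `top_k_indices` are hypothetical helpers rather than functions from inference.py.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores, highest first (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


logits = [-3.2, 0.4, -0.7, 2.1, -5.0]  # made-up values, not model output
probs = [sigmoid(x) for x in logits]

# Ranking by raw logits and ranking by probabilities select the same items.
assert top_k_indices(logits, 2) == top_k_indices(probs, 2) == [3, 1]
```

Because the ordering is identical, applying sigmoid before top-K only adds work.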

Updating

When a better single-layer L10 ckpt emerges, replace l10.pt with the new real file (and re-commit):

cp experiments/expP_NEW/ckpts/ckpt_best.pt best_ckpt/l10.pt
git add best_ckpt/l10.pt && git commit -m "Update best_ckpt/l10.pt to expP_NEW"
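
If the update is scripted, it may be worth guarding against `best_ckpt/l10.pt` having been turned into a symlink locally, since a plain copy would then write through the link into its target instead of producing a real file. The following is a minimal sketch under that assumption; `install_checkpoint` is a hypothetical helper, not something shipped in this repository.

```python
import shutil
from pathlib import Path


def install_checkpoint(src: Path, dst: Path) -> Path:
    """Copy src to dst so that dst ends up as a regular file, not a symlink."""
    src = Path(src)
    dst = Path(dst)
    if not src.exists():
        raise FileNotFoundError(src)
    # If dst is currently a symlink, copying would overwrite the link's
    # target, so remove the link first.
    if dst.is_symlink():
        dst.unlink()
    # copyfile follows a symlinked src by default, so real bytes are copied.
    shutil.copyfile(src, dst)
    if dst.is_symlink() or not dst.is_file():
        raise RuntimeError(f"{dst} is not a regular file after copy")
    return dst
```

For example, `install_checkpoint(Path("experiments/expP_NEW/ckpts/ckpt_best.pt"), Path("best_ckpt/l10.pt"))` would precede the `git add` and `git commit` step.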

For local-only convenience symlinks:

ln -sfn ../experiments/expP_NEW/ckpts/ckpt_best_f1.pt     best_ckpt/l10_best_f1.pt
ln -sfn ../experiments/expP_NEW/ckpts/ckpt_best_recall_k.pt best_ckpt/l10_best_recall_k.pt

Previous Joint Format (R601, archived)

Earlier this folder linked to the R601 joint chain ckpt (Pair → PW noweight, val F1=0.7927). That ckpt has the multi-layer format (retrievers.l{10,12,20}.* keys) and lives at:

experiments/expR_601_stage2_pw_from_R462_ddp/ckpts/ckpt_joint_best_ens_f1.pt

⚠️ R-series joint ckpts have a negative logit bias: their logits skew negative (sigmoid > 0.5 hit rate ~0.13% on test data, vs P110's ~1.0%). They have great recall@K, but a sigmoid threshold of 0.5 cannot be applied to them directly. For deployment, prefer P110 (l10.pt) unless you specifically need the joint 3-layer format.
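
To see why a negative shift hurts a fixed threshold but not recall@K, note that sigmoid(x) > 0.5 is the same test as x > 0. The sketch below uses synthetic logits (not measurements from either checkpoint) and hypothetical helpers `hit_rate` and `top_k_indices`: shifting every logit down empties the set above the threshold while leaving the ranking untouched.

```python
import math
from typing import List, Sequence


def hit_rate(logits: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of logits whose sigmoid exceeds `threshold`."""
    # sigmoid(x) > t  <=>  x > log(t / (1 - t)); for t = 0.5 the cutoff is 0.
    cutoff = math.log(threshold / (1.0 - threshold))
    return sum(1 for x in logits if x > cutoff) / len(logits)


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest scores, highest first (ties broken by index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


base = [1.5, 0.2, -0.4, -2.0]       # synthetic logits for illustration
shifted = [x - 3.0 for x in base]   # same scores with a negative bias

assert hit_rate(base) == 0.5        # two of four exceed sigmoid 0.5
assert hit_rate(shifted) == 0.0     # none do after the shift
assert top_k_indices(base, 2) == top_k_indices(shifted, 2) == [0, 1]
```

In practice this suggests either selecting by top-K or choosing a threshold on validation data when working with a checkpoint whose logits are biased this way.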
