| # best_ckpt/ |
| |
| **Current best single-layer L10 retriever** (P110, May 2026), committed to git for direct use by collaborators. |
| |
| Source experiment: **P110** |
| - `experiments/expP_110_R10_lr1e4_wt_combfull_l10/ckpts/` |
| - Config: Round 10 (lr=1e-4, weighted-loss, combined_full data), single layer L10 |
| - Best metric (val full-set L10): |
|   - precision=0.7171, recall=0.5357, **val_f1=0.6133**, recall@512=0.5256 |
| |
| | File | Type | Format | |
| |------|------|--------| |
| | **`l10.pt`** | **real file (~83 MB, committed)** | Single-layer state_dict (latest "best" snapshot, recommended for inference) | |
| `l10_best_f1.pt` | local symlink → `experiments/...` (untracked) | Single-layer state_dict (best F1 epoch) | |
| `l10_best_recall_k.pt` | local symlink → `experiments/...` (untracked) | Single-layer state_dict (best recall@K epoch) | |
| |
| `l10.pt` is the only file committed to git. The other two are convenience symlinks for local |
| experimentation; pull `experiments/` separately if you need them. |
| |
| All ckpts use the bare-key single-layer format (keys carry no `retrievers.l10.` prefix): |
| ``` |
| wq_a.weight → [1024, 4096] |
| wq_b.weight → [N_HEADS*128, 1024] (N_HEADS=64) |
| q_norm_weight → [1024] |
| weights_proj.weight → [N_HEADS, 4096] |
| freqs_cis → RoPE precomputed (optional) |
| ``` |
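A quick shape check against the layout above can catch a joint-format or mismatched checkpoint before inference. The following is a minimal sketch, not part of the repository: `EXPECTED_SHAPES`, `check_state_dict`, and `load_and_check` are hypothetical names, and it assumes `l10.pt` deserializes to a plain mapping from key to tensor.

```python
N_HEADS = 64

# Expected shapes, copied from the layout listed in this README.
EXPECTED_SHAPES = {
    "wq_a.weight": (1024, 4096),
    "wq_b.weight": (N_HEADS * 128, 1024),
    "q_norm_weight": (1024,),
    "weights_proj.weight": (N_HEADS, 4096),
}
# freqs_cis is optional, so its absence is not reported and its shape is not checked.
OPTIONAL_KEYS = {"freqs_cis"}


def check_state_dict(state_dict):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    for key, expected in EXPECTED_SHAPES.items():
        if key not in state_dict:
            problems.append(f"missing key: {key}")
            continue
        actual = tuple(state_dict[key].shape)
        if actual != expected:
            problems.append(f"{key}: expected {expected}, got {actual}")
    for key in state_dict:
        if key not in EXPECTED_SHAPES and key not in OPTIONAL_KEYS:
            problems.append(f"unexpected key: {key}")
    return problems


def load_and_check(path="best_ckpt/l10.pt"):
    """Load a checkpoint on CPU and check its layout (requires PyTorch)."""
    import torch  # imported lazily so check_state_dict works without PyTorch

    state_dict = torch.load(path, map_location="cpu")
    return check_state_dict(state_dict)
```

Calling `load_and_check()` from the repository root should return an empty list for a checkpoint in the bare-key format; a joint ckpt would instead be reported with missing and unexpected keys.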
| |
| ## Usage |
| |
| ```bash |
| # Single-layer inference (default path): |
| python inference.py --ckpt best_ckpt/l10.pt --layer 10 --data-path ./data/doc_00030.pkl |
| ``` |
| |
| `inference.py` returns **raw logits** (no sigmoid applied). For 0-1 probabilities, call `torch.sigmoid(logits)` |
| on the output yourself; for top-K selection, use the logits directly (sigmoid is monotonic, so it does not change the ranking). |
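The point about top-K selection can be checked numerically. This is a plain-Python sketch with made-up logit values; `sigmoid` and `top_k_indices` are illustrative helpers, not functions from `inference.py`. With tensors, `torch.sigmoid` and `torch.topk` would play the same roles.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw logit to a 0-1 probability."""
    return 1.0 / (1.0 + math.exp(-x))


def top_k_indices(scores, k):
    """Indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


logits = [-3.2, 0.7, -0.1, 2.4, -5.0]  # made-up values for illustration
probs = [sigmoid(x) for x in logits]

# Sigmoid is strictly increasing, so the ranking (and hence top-K) is unchanged.
assert top_k_indices(logits, 2) == top_k_indices(probs, 2)

# Thresholding probabilities at 0.5 is the same as thresholding logits at 0.
assert [p > 0.5 for p in probs] == [x > 0 for x in logits]
```

The second assertion is why a checkpoint whose logits skew negative produces few hits at a 0.5 sigmoid threshold even when its ranking is good.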
| |
| ## Updating |
| |
| When a better single-layer L10 ckpt emerges, replace `l10.pt` with the new real file and re-commit: |
| |
| ```bash |
| cp experiments/expP_NEW/ckpts/ckpt_best.pt best_ckpt/l10.pt |
| git add best_ckpt/l10.pt && git commit -m "Update best_ckpt/l10.pt to expP_NEW" |
| ``` |
| |
| For local-only convenience symlinks: |
| |
| ```bash |
| ln -sfn ../experiments/expP_NEW/ckpts/ckpt_best_f1.pt best_ckpt/l10_best_f1.pt |
| ln -sfn ../experiments/expP_NEW/ckpts/ckpt_best_recall_k.pt best_ckpt/l10_best_recall_k.pt |
| ``` |
| |
| ## Previous Joint Format (R601, archived) |
| |
| Earlier, this folder linked to the R601 joint chain ckpt (Pair → PW noweight, val F1=0.7927). |
| That ckpt has the multi-layer format (`retrievers.l{10,12,20}.*` keys) and lives at: |
| |
| - `experiments/expR_601_stage2_pw_from_R462_ddp/ckpts/ckpt_joint_best_ens_f1.pt` |
| |
| ⚠️ R-series joint ckpts have a negative logit bias (sigmoid > 0.5 hit rate ~0.13% on test data, |
| vs P110's ~1.0%): they have great `recall@K` but cannot use a sigmoid threshold of 0.5 directly. |
| **For deployment, prefer P110 (`l10.pt`)** unless you specifically need the joint 3-layer format. |
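If a joint ckpt needs to be inspected one layer at a time, the key prefix can be stripped. The sketch below rests on assumptions this README does not confirm: `extract_layer` is a hypothetical helper, the state_dict is assumed to be a flat mapping, and the keys under `retrievers.l10.` are assumed to otherwise match the bare-key names used by `l10.pt`. It does nothing about the negative logit bias.

```python
def extract_layer(joint_state_dict, layer):
    """Return the entries for one layer with the `retrievers.l{layer}.` prefix removed."""
    prefix = f"retrievers.l{layer}."
    return {
        key[len(prefix):]: value
        for key, value in joint_state_dict.items()
        if key.startswith(prefix)
    }


# Toy example with placeholder values standing in for tensors.
joint = {
    "retrievers.l10.wq_a.weight": "A10",
    "retrievers.l12.wq_a.weight": "A12",
    "retrievers.l10.q_norm_weight": "N10",
}
layer10 = extract_layer(joint, 10)
assert layer10 == {"wq_a.weight": "A10", "q_norm_weight": "N10"}
```

The trailing dot in the prefix keeps a shorter layer number from matching a longer one (for example, `l1` against `l10`).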
| |