LordofMonarchs's picture
Upload folder using huggingface_hub
3751f09 verified
|
Raw
History Blame Contribute Delete
853 Bytes
# experiments/pairwise_llm_check/
## What This Experiment Does
This is an **offline experiment** that generates better LightGBM training labels
by replacing the heuristic weak_label = hard_req_coverage × consistency_score × jd_penalty
with LLM pairwise judgments on sampled Stage 1 candidates.
### Pipeline Summary
1. Load Stage 1 BM25 retrieval pool.
2. Stratified sample of candidates weighted toward the current model's top and boundary regions.
3. Generate pairwise matchups; annotate with quantized LLaMA via Ollama.
4. Convert pairwise verdicts → Elo ratings → 0–3 integer relevance labels.
5. Retrain LightGBM on these labels using identical hyperparameters to precompute.py.
6. Save the new model as precomputed/lgbm_model_llm.pkl.
7. Print a comparison report: top-10 overlap, Spearman correlation, honeypot audit.