experiments/pairwise_llm_check/
What This Experiment Does
This offline experiment aims to produce better LightGBM training labels. It replaces the heuristic weak label, weak_label = hard_req_coverage × consistency_score × jd_penalty, with LLM pairwise judgments on candidates sampled from Stage 1.
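For reference, the heuristic being replaced can be written as a one-line function. This is a minimal sketch: the function signature is hypothetical, and it assumes each factor is a score in [0, 1], which the formula itself does not state.

```python
def weak_label(hard_req_coverage: float,
               consistency_score: float,
               jd_penalty: float) -> float:
    """Heuristic weak label: the product of the three factors.

    Assumption: each factor lies in [0, 1], so the product does too.
    """
    return hard_req_coverage * consistency_score * jd_penalty


# Because the label is a product, a single zero factor zeroes it out.
example = weak_label(0.8, 0.5, 1.0)
zeroed = weak_label(0.0, 0.9, 1.0)
```

Under this reading, one weak factor can dominate the label regardless of the other two, which is one way a purely multiplicative heuristic can misrank candidates.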
Pipeline Summary
- Load Stage 1 BM25 retrieval pool.
- Draw a stratified sample of candidates, weighted toward the current model's top and boundary regions.
- Generate pairwise matchups and annotate each one with a quantized LLaMA model served through Ollama.
- Convert pairwise verdicts → Elo ratings → 0–3 integer relevance labels.
- Retrain LightGBM on these labels using identical hyperparameters to precompute.py.
- Save the new model as precomputed/lgbm_model_llm.pkl.
- Print a comparison report: top-10 overlap, Spearman correlation, honeypot audit.
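
The verdicts → Elo → label step can be sketched as follows. This is an illustration under stated assumptions, not the experiment's actual code: the function names, the K-factor of 32, the starting rating of 1500, the (winner, loser) verdict format without ties, and the equal-sized rank bins are all hypothetical choices.

```python
from collections import defaultdict


def elo_ratings(verdicts, k=32.0, initial=1500.0):
    """Run standard Elo updates over (winner_id, loser_id) pairs.

    Assumptions: no ties, fixed K-factor, a single pass in the given order.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in verdicts:
        rw, rl = ratings[winner], ratings[loser]
        # Probability the winner was expected to win before this matchup.
        expected_win = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        delta = k * (1.0 - expected_win)
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    return dict(ratings)


def ratings_to_labels(ratings, n_levels=4):
    """Map ratings to integer labels 0..n_levels-1 by rank.

    Assumption: equal-sized rank bins; the real cut points may differ.
    """
    ordered = sorted(ratings, key=ratings.get)  # lowest rating first
    n = len(ordered)
    return {cid: min(n_levels - 1, i * n_levels // n)
            for i, cid in enumerate(ordered)}


# Toy round-robin in which "a" beats everyone and "d" loses to everyone.
verdicts = [("a", "b"), ("a", "c"), ("a", "d"),
            ("b", "c"), ("b", "d"), ("c", "d")]
ratings = elo_ratings(verdicts)
labels = ratings_to_labels(ratings)
```

Sequential Elo depends on the order in which matchups are processed, so shuffling the verdicts or averaging over several passes may be worth considering when the number of comparisons per candidate is small.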