| ---
|
| license: mit
|
| library_name: lightgbm
|
| tags:
|
| - learning-to-rank
|
| - lightgbm
|
| - lambdarank
|
| - recruitment
|
| - candidate-ranking
|
| ---
|
|
|
| # Intelligent Candidate Ranker (LightGBM LambdaRank)
|
|
|
| **The ranking model from the Redrob Intelligent Candidate Discovery and Ranking Challenge submission.**
|
|
|
| Given a 22-feature vector describing a candidate's fit against a job description, this model outputs a relevance score. It is trained on labels generated by 2,500 pairwise judgments from a local LLM (Gemma3) rather than hand-coded heuristics, specifically to avoid label circularity.
|
|
|
| 
|
|
|
| Full pipeline (retrieval, feature engineering, consistency scoring, reasoning generation) lives in the GitHub repo:
|
| https://github.com/Pranjal1342/Intelligent-Candidate-Discovery-Ranking-System
|
|
|
| ---
|
|
|
| ## This model's role in the pipeline
|
|
|
| This model is one stage inside a larger offline candidate-ranking pipeline. It does not do retrieval, does not compute the input features itself, and does not produce the final rank on its own.
|
|
|
| ```
|
| raw_score = this_model.predict(feature_vector)
|
| final_score = raw_score * consistency_score # applied by the host pipeline, not this model
|
| ```
|
|
|
| `consistency_score` is a separate, multiplicative honeypot/data-integrity check computed by the host application; it is not part of this model's output.
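
A minimal sketch of how a host pipeline might combine the two quantities and sort candidates. The function name `rank_candidates` and the example IDs and values are illustrative assumptions, not taken from the repository.

```python
def rank_candidates(candidate_ids, raw_scores, consistency_scores):
    """Return (candidate_id, final_score) pairs sorted best-first.

    final_score = raw_score * consistency_score, as described above.
    """
    if not (len(candidate_ids) == len(raw_scores) == len(consistency_scores)):
        raise ValueError("all inputs must have the same length")
    scored = [
        (cid, raw * cons)
        for cid, raw, cons in zip(candidate_ids, raw_scores, consistency_scores)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical values: "c2" has the highest raw score but a low consistency score.
ranking = rank_candidates(["c1", "c2", "c3"], [1.0, 2.0, 0.5], [1.0, 0.2, 1.0])
print(ranking)
```

One thing to keep in mind with a multiplicative penalty: if raw scores can be negative, multiplying by a factor below 1 moves them toward zero rather than down, so a pipeline may need to shift scores to be non-negative first.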
|
|
|
| ---
|
|
|
| ## How to load
|
|
|
| ```python
|
| from huggingface_hub import hf_hub_download
|
| import lightgbm as lgb
|
|
|
| model_path = hf_hub_download(
|
| repo_id="<your-username>/intelligent-candidate-ranker",
|
| filename="lgbm_model.txt"
|
| )
|
| model = lgb.Booster(model_file=model_path)
|
|
|
| raw_score = model.predict(feature_vector) # feature_vector: 2-D float32 array, shape (n_candidates, 22)
|
| ```
|
|
|
| ---
|
|
|
| ## Input / Output
|
|
|
| - **Input:** a 22-feature float32 vector per candidate (exact feature order below)
|
| - **Output:** a single raw relevance score per candidate (higher = more relevant). Not yet penalized for data-integrity issues; combine with a `consistency_score` downstream before final ranking.
|
|
|
| ### Feature vector (in order)
|
|
|
| | # | Feature | Formula / Source |
|
| |---|---|---|
|
| | 1 | `bm25_score` | Stage 1 BM25 retrieval score (normalised) |
|
| | 2 | `yoe` | `profile.years_of_experience` |
|
| | 3 | `Param_A_Systems_Depth` | Fraction of career months in roles whose descriptions contain retrieval, search, or ranking keywords |
|
| | 4 | `Param_B_Availability` | `(recruiter_response_rate + exp(-days_inactive / 90)) / 2` |
|
| | 5 | `Param_C_Tenure` | `min(avg_tenure_months, 48) / 48`; rewards longer average tenures, saturating at 48 months |
|
| | 6 | `Param_D_Notice_Exp` | `exp(-max(0, days - 30) / 30)`: 30d → 1.0, 60d → 0.37, 90d → 0.14, 150d → 0.018 |
|
| | 7 | `Param_E_Credibility` | `advanced_claimed_count / max(1, assessed_count)`, higher means less credible |
|
| | 8 | `Param_F_Consulting` | Fraction of career at IT-services consulting firms (`industry == "IT Services" AND size == "10001+"`) |
|
| | 9 | `Param_G_Location` | Noida/Pune = 1.0, other India = 0.7, outside and willing to relocate = 0.3, outside and unwilling = 0.0 |
|
| | 10 | `Param_H_GitHub` | `github_activity_score / 100`; 0.3 imputed when the field equals -1 (absent) |
|
| | 11 | `title_ai_fraction` | Career-weighted fraction in AI, ML, or data roles via a static title taxonomy |
|
| | 12 | `prod_signal_log` | Log-compressed production keyword count, -1.0 if academic-only |
|
| | 13 | `consistency_score` | Multiplicative honeypot penalty, c1 × c2 × c3 × c4 × c5 (included as a training feature; also reapplied post-inference, see below) |
|
| | 14 | `hard_req_coverage` | Fraction of JD hard requirements satisfied by the candidate's skill list |
|
| | 15 | `flag_consulting_only` | `consulting_fraction > 0.95` |
|
| | 16 | `flag_title_chaser` | `avg_tenure < 18 months` across 3+ jobs |
|
| | 17 | `flag_langchain_dabbler` | LLM-era months > 12 and pre-LLM months == 0 |
|
| | 18 | `flag_cv_specialist` | CV/speech months > 24 and NLP/IR months == 0 |
|
| | 19 | `flag_title_desc_mismatch` | Domain-category mismatch fraction across career history |
|
| | 20 | `flag_template_desc` | Max SequenceMatcher ratio against the template registry |
|
| | 21 | `interaction_req_x_consistency` | `hard_req_coverage * consistency_score` |
|
| | 22 | `interaction_yoe_x_prod` | `yoe * prod_signal_log` |
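
The closed-form rows of the table can be sketched directly in Python. This is an illustrative reading of the formulas as written above, not the code in `src/features.py`; the function and argument names are assumptions.

```python
import math


def availability(recruiter_response_rate, days_inactive):
    """Param_B: mean of response rate and a 90-day recency decay."""
    return (recruiter_response_rate + math.exp(-days_inactive / 90)) / 2


def tenure(avg_tenure_months):
    """Param_C: average tenure, capped at 48 months and scaled to [0, 1]."""
    return min(avg_tenure_months, 48) / 48


def notice_exp(notice_days):
    """Param_D: no penalty up to 30 days, exponential decay afterwards."""
    return math.exp(-max(0, notice_days - 30) / 30)


def credibility(advanced_claimed_count, assessed_count):
    """Param_E: claimed-advanced skills per assessed skill (higher = less credible)."""
    return advanced_claimed_count / max(1, assessed_count)


def github(github_activity_score):
    """Param_H: scaled activity score; -1 marks a missing field, imputed as 0.3."""
    return 0.3 if github_activity_score == -1 else github_activity_score / 100


# Print the notice-period decay at a few notice lengths (in days).
for days in (30, 60, 90, 150):
    print(days, round(notice_exp(days), 3))
```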
|
|
|
| ---
|
|
|
| ## Training
|
|
|
| **Model configuration:**
|
| - `objective: lambdarank`
|
| - `eval_at: [5, 10, 50]`: NDCG is evaluated at cut-offs 5, 10, and 50, with top-5 quality as the priority
|
| - Early stopping monitors NDCG@5, patience 30
|
| - Up to 200 boosting rounds (fewer if early stopping triggers)
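
A minimal sketch of how the configuration above could be passed to LightGBM. Only the objective, `eval_at`, the round count, and the patience come from this card. The explicit `metric`, the train/validation split, the query-group arguments, and the `train_ranker` helper are assumptions made for illustration.

```python
# Settings stated in this model card.
PARAMS = {
    "objective": "lambdarank",
    "metric": "ndcg",          # assumption: written out explicitly for clarity
    "eval_at": [5, 10, 50],    # NDCG cut-offs reported during training
}
NUM_BOOST_ROUND = 200
EARLY_STOPPING_PATIENCE = 30


def train_ranker(X_train, y_train, group_train, X_valid, y_valid, group_valid):
    """Hypothetical helper: train a LambdaRank booster with early stopping.

    group_* are lists of query sizes, which LightGBM ranking objectives require.
    """
    import lightgbm as lgb  # imported lazily so the constants are usable without it

    train_set = lgb.Dataset(X_train, label=y_train, group=group_train)
    valid_set = lgb.Dataset(X_valid, label=y_valid, group=group_valid,
                            reference=train_set)
    return lgb.train(
        PARAMS,
        train_set,
        num_boost_round=NUM_BOOST_ROUND,
        valid_sets=[valid_set],
        callbacks=[
            # first_metric_only is intended to restrict the stopping check
            # to the first reported metric, which is NDCG@5 here.
            lgb.early_stopping(stopping_rounds=EARLY_STOPPING_PATIENCE,
                               first_metric_only=True),
        ],
    )
```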
|
|
|
| **Training labels (Gemma3 pairwise annotation, the key differentiator):**
|
|
|
| Rather than a pure heuristic label, training labels are generated via 2,500 pairwise LLM comparisons using Gemma3:4b-it-q4_K_M running locally on Ollama, with zero external API calls and full reproducibility. A stratified sample of 500 candidates is drawn across three strata (top-100, boundary 101–300, and a broader pool with guaranteed low-consistency coverage), and each candidate receives roughly five matchups against random opponents.
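
The sampling and pairing step might look roughly like the sketch below. The per-stratum sizes (100 / 200 / 200), the seed, and the placeholder IDs are assumptions chosen so the totals match the figures above, and the guaranteed low-consistency coverage is not modelled. Pairing each sampled candidate with five opponents yields 2,500 comparisons, and each candidate can also appear as an opponent in other matchups.

```python
import random


def sample_candidates(ranked_ids, rng, n_top=100, n_boundary=200, n_pool=200):
    """Draw a stratified sample from a best-first list of candidate IDs.

    Strata follow the text: ranks 1-100, ranks 101-300, and everything after.
    The per-stratum sizes are assumptions chosen to total 500.
    """
    top = ranked_ids[:100]
    boundary = ranked_ids[100:300]
    pool = ranked_ids[300:]
    return (
        rng.sample(top, min(n_top, len(top)))
        + rng.sample(boundary, min(n_boundary, len(boundary)))
        + rng.sample(pool, min(n_pool, len(pool)))
    )


def make_matchups(sample, rng, per_candidate=5):
    """Pair every sampled candidate with `per_candidate` random opponents."""
    pairs = []
    for cand in sample:
        opponents = rng.sample([c for c in sample if c != cand], per_candidate)
        pairs.extend((cand, opp) for opp in opponents)
    return pairs


rng = random.Random(0)  # fixed seed so the sketch is repeatable
ranked_ids = [f"cand_{i:04d}" for i in range(8500)]  # placeholder IDs
sample = sample_candidates(ranked_ids, rng)
pairs = make_matchups(sample, rng)
print(len(sample), len(pairs))
```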
|
|
|
| For each pair, Gemma3 reads both candidates' full structured profiles alongside the JD requirements and disqualifiers, then produces a single verdict: `CANDIDATE_A`, `CANDIDATE_B`, or `TIE`. Win and loss tallies are converted to Elo-style ratings via additively smoothed (Laplace-style) win rates:
|
|
|
| ```python
|
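| from math import log10
|
| # wins and total are one candidate's tallies over its pairwise matchups
|
| win_rate = (wins + 0.5) / (total + 1)  # smoothing keeps win_rate strictly inside (0, 1)
|
| elo = 400 * log10(win_rate / (1 - win_rate)) + 1500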
|
| ```
|
|
|
| Elo ratings are thresholded to 0–3 relevance labels by quartile, producing a balanced training set with roughly 125 candidates per label.
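
The tally-to-label conversion can be sketched end to end as follows. The helper names and the example tallies are hypothetical. How `TIE` verdicts enter the tallies is not specified here, so the sketch takes `wins` and `total` as given, and it assumes label 3 marks the highest Elo quartile. Equal ratings are split by sort order.

```python
from math import log10


def elo_from_tally(wins, total):
    """Smoothed win rate mapped onto an Elo-style scale centred at 1500."""
    win_rate = (wins + 0.5) / (total + 1)
    return 400 * log10(win_rate / (1 - win_rate)) + 1500


def quartile_labels(elo_by_candidate):
    """Assign labels 0-3 by Elo quartile (0 = lowest quartile, 3 = highest)."""
    ordered = sorted(elo_by_candidate, key=elo_by_candidate.get)
    n = len(ordered)
    return {cand: (4 * rank) // n for rank, cand in enumerate(ordered)}


# Hypothetical tallies: (wins, total matchups) per candidate.
tallies = {"a": (5, 5), "b": (4, 5), "c": (3, 5), "d": (2, 4),
           "e": (2, 5), "f": (1, 5), "g": (1, 6), "h": (0, 5)}
elos = {cand: elo_from_tally(w, t) for cand, (w, t) in tallies.items()}
labels = quartile_labels(elos)
print(labels)
```

With 500 candidates the same integer split gives 125 candidates per label.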
|
|
|
| **Why this breaks circularity:** Gemma had no knowledge of the 22 engineered features, the BM25 scores, or the penalty weights. It learned independently that IR-specific skills (FAISS, BM25, Qdrant, Sentence Transformers) outrank generic ML skills, and that production-company backgrounds outrank consulting-only careers. LightGBM then learns how the 22 features correlate with these independent judgments, surfacing interactions that were never explicitly encoded.
|
|
|
| ---
|
|
|
| ## Model Comparison: Heuristic vs. Gemma-Trained
|
|
|
| The competition provides no ground-truth relevance labels, so a standard NDCG@10 ablation against a labeled holdout set isn't possible to compute honestly. What is available, and what is reported here, is a direct head-to-head comparison between a LightGBM model trained on the original heuristic weak label and this model (trained on Gemma3 pairwise labels), run on the same candidate pool with the same feature vectors.
|
|
|
| **Method:** both trained models score the full ~8,500-candidate retrieval pool. The same post-inference consistency multiplier is applied to both before ranking, so the comparison isolates the effect of the training label, not the honeypot suppression layer.
|
|
|
| | Metric | Result |
|
| |---|---|
|
| | Top-10 overlap between the two models | 0 of 10 candidates in common |
|
| | Spearman rank correlation (top-100) | 0.001, i.e. essentially uncorrelated rankings |
|
| | Honeypot leakage, heuristic-trained model | Required a hand-coded post-processing suppression list to keep keyword-stuffed non-technical profiles out of the top 100 |
|
| | Honeypot leakage, Gemma-trained model (this model) | 0 of 100 candidates with `consistency_score < 0.25`, achieved with no post-processing suppression list |
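
The three kinds of metric in the table can be computed along these lines. This is a sketch under assumptions: the function names are illustrative, scores are held in `{candidate_id: score}` dicts, the Spearman formula assumes no tied scores, and which 100 candidates the reported correlation was computed over is not specified here.

```python
def top_k(scores, k):
    """IDs of the k highest-scoring candidates, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def top_k_overlap(scores_a, scores_b, k):
    """Number of candidates that appear in both models' top-k."""
    return len(set(top_k(scores_a, k)) & set(top_k(scores_b, k)))


def spearman(scores_a, scores_b, ids):
    """Spearman rank correlation over `ids`, assuming no tied scores."""
    def ranks(scores):
        ordered = sorted(ids, key=scores.get, reverse=True)
        return {cid: r for r, cid in enumerate(ordered)}

    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ids)
    d2 = sum((ra[c] - rb[c]) ** 2 for c in ids)
    return 1 - 6 * d2 / (n * (n * n - 1))


def leakage(final_scores, consistency, k=100, threshold=0.25):
    """Count top-k candidates whose consistency_score is below the threshold."""
    return sum(consistency[c] < threshold for c in top_k(final_scores, k))


# Toy example: two models that rank four candidates in opposite order.
model_a = {"c1": 0.9, "c2": 0.7, "c3": 0.4, "c4": 0.1}
model_b = {"c1": 0.1, "c2": 0.4, "c3": 0.7, "c4": 0.9}
print(top_k_overlap(model_a, model_b, k=2))
print(spearman(model_a, model_b, list(model_a)))
```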
|
|
|
| **Qualitative before/after:** prior to the Gemma retrain, the heuristic-trained model's unsuppressed top-10 surfaced profiles such as Content Writer, Project Manager, and Sales Executive, each with AI-sounding skills listed but no underlying technical career history, because the heuristic label rewarded keyword coverage directly. After the Gemma retrain, the same pool's top-10 surfaced candidates with FAISS, BM25, Qdrant, Sentence Transformers, and Hugging Face Transformers in their skill history. The labels behind that ordering came from a model that never saw `bm25_score` or `hard_req_coverage` during label generation and arrived at the IR-relevance ordering independently by reading full candidate profiles.
|
|
|
| The two models disagreeing almost completely (Spearman 0.001) is itself evidence of non-circularity: a model trained on labels derived from the same 22 features it predicts on would be expected to correlate strongly with a heuristic built from those same features, not diverge from it entirely.
|
|
|
| ---
|
|
|
| ## Intended Use & Limitations
|
|
|
| - Built for a hackathon submission (Redrob Intelligent Candidate Discovery and Ranking Challenge); not validated for production hiring decisions.
|
| - Expects the exact 22-feature schema above, computed by the host pipeline's `src/features.py`. Feeding hand-built or differently-ordered features will produce meaningless scores.
|
| - Raw model output is **not** the final ranking score; it must be multiplied by a separately computed `consistency_score` before use.
|
| - Trained on a synthetic/competition candidate dataset; label distribution and feature semantics may not generalize to other candidate pools without retraining.
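
Because feature order matters, a small guard around vector assembly can catch schema drift early. The sketch below is an assumption about how a caller might do this, not the code in `src/features.py`; the feature names are copied from the table in this card, and `to_feature_row` is a hypothetical helper.

```python
FEATURE_ORDER = [
    "bm25_score", "yoe", "Param_A_Systems_Depth", "Param_B_Availability",
    "Param_C_Tenure", "Param_D_Notice_Exp", "Param_E_Credibility",
    "Param_F_Consulting", "Param_G_Location", "Param_H_GitHub",
    "title_ai_fraction", "prod_signal_log", "consistency_score",
    "hard_req_coverage", "flag_consulting_only", "flag_title_chaser",
    "flag_langchain_dabbler", "flag_cv_specialist", "flag_title_desc_mismatch",
    "flag_template_desc", "interaction_req_x_consistency",
    "interaction_yoe_x_prod",
]


def to_feature_row(features):
    """Order a {name: value} mapping into the documented 22-feature layout."""
    missing = [name for name in FEATURE_ORDER if name not in features]
    extra = [name for name in features if name not in FEATURE_ORDER]
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, unexpected={extra}")
    return [float(features[name]) for name in FEATURE_ORDER]


example = {name: 0.0 for name in FEATURE_ORDER}  # placeholder values only
example["yoe"] = 6
row = to_feature_row(example)
print(len(row), row[1])
```

Wrapping one or more such rows in a list gives the 2-D `(n_candidates, 22)` input that a LightGBM `Booster.predict` call expects.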
|
|
|
| ---
|
|
|
| ## AI Tool Disclosure
|
|
|
| Gemma3:4b-it-q4_K_M (Google DeepMind, running locally via Ollama) was used offline to generate 2,500 pairwise relevance judgments on a stratified sample of 500 candidates. These judgments served as independent, non-circular training labels for this model. No candidate data was transmitted to any external service at any point. |