---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- text-classification
- cross-encoder
- information-retrieval
- job-skill-matching
- esco
- talentclef
- reranking
- bert
base_model: cross-encoder/ms-marco-MiniLM-L-12-v2
pipeline_tag: text-classification
model-index:
- name: skillscout-reranker
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval (re-ranking)
    dataset:
      name: TalentCLEF 2026 Task B Validation
      type: talentclef-2026-taskb-validation
    metrics:
    - type: ndcg_at_10_graded
      value: 0.6896
      name: nDCG@10 Graded (pipeline, server)
    - type: ndcg_at_10_binary
      value: 0.7330
      name: nDCG@10 Binary (pipeline, server)
---

# SkillScout Reranker - Job-Skill Cross-Encoder

**SkillScout Reranker** is a cross-encoder that re-ranks candidate skills for a given job title, predicting **graded relevance** (0 = irrelevant, 1 = contextual, 2 = core). This is **Stage 2** of the TalentGuide two-stage job-skill matching pipeline, trained for [TalentCLEF 2026 Task B](https://talentclef.github.io/).

> **Best pipeline result (TalentCLEF 2026 validation set, server-side):**
> nDCG@10 graded = **0.6896** | nDCG@10 binary = **0.7330**
> SkillScout Large (bi-encoder) + SkillScout Reranker at blend alpha=0.7.
---

## Model Summary

| Property | Value |
|---|---|
| Base model | [cross-encoder/ms-marco-MiniLM-L-12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2) |
| Architecture | BERT (MiniLM-L12) + 3-class classification head |
| Hidden size | 384 |
| Max seq length | 128 tokens |
| Output classes | 0 = non-relevant, 1 = contextual, 2 = core |
| Training triples | ~130k (job_title, skill, label) |
| Hard negatives | 5 per job, mined from bi-encoder top-K |
| Epochs | 3 |
| Batch size | 32 |
| Hardware | NVIDIA RTX 3070 8GB, fp16 AMP |

---

## Usage

### Installation

```bash
pip install transformers torch
```

### Score a single (job, skill) pair

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("talentguide/skillscout-reranker")
model = AutoModelForSequenceClassification.from_pretrained("talentguide/skillscout-reranker")
model.eval()

job = "Data Scientist"
skill = "data science"

enc = tokenizer(job, skill, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**enc).logits            # shape [1, 3]

probs = logits.softmax(-1)[0].tolist()      # [P(irrelevant), P(contextual), P(core)]
relevance = logits.argmax(-1).item()        # 0, 1, or 2

print(f"Relevance class: {relevance} (0=none, 1=contextual, 2=core)")
print(f"Probs: none={probs[0]:.3f} contextual={probs[1]:.3f} core={probs[2]:.3f}")
# Relevance class: 2 (0=none, 1=contextual, 2=core)
# Probs: none=0.031 contextual=0.142 core=0.827
```

### Re-rank a candidate list

```python
# candidates: list of skill texts from the bi-encoder (e.g. top-200)
pairs = [(job, skill) for skill in candidates]

encs = tokenizer(pairs, return_tensors="pt", truncation=True, padding=True, max_length=128)
with torch.no_grad():
    logits = model(**encs).logits           # [N, 3]

# Use the class-2 (core) logit as the ranking score
scores = logits[:, 2].tolist()
ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])

for rank, (skill, score) in enumerate(ranked[:10], 1):
    print(f"{rank:3d}. [{score:.3f}] {skill}")
```

### Blend with bi-encoder (recommended, alpha=0.7)

```python
# bi_scores: cosine scores from SkillScout Large, normalised to [0, 1]
# ce_scores: class-2 logits from this model, normalised to [0, 1]
alpha = 0.7
final_scores = [alpha * b + (1 - alpha) * c for b, c in zip(bi_scores, ce_scores)]
```

---

## Two-Stage Pipeline Integration

```
Job title
    |
    v
[SkillScout Large]      <- talentguide/skillscout-large
    |  top-200 candidates via FAISS ANN
    v
[SkillScout Reranker]   <- this model
    |  3-class graded scoring (core=2, contextual=1, irrelevant=0)
    v
Final ranked list
```

---

## Training Details

### Data

| Data | Count |
|---|---|
| Positive triples (essential, label=2) | ~57,500 |
| Positive triples (optional, label=1) | ~28,600 |
| Hard negatives (label=0, from bi-encoder top-K) | ~15,200 |
| Random negatives (label=0) | ~30,000 |
| Total training triples | ~130,000 |
| Validation queries | 304 |
| Validation corpus | 9,052 skills |

**Hard negatives** are mined by running the fine-tuned bi-encoder (SkillScout Large) over all training jobs and collecting the top-K retrieved skills that are NOT in the positive set. This teaches the cross-encoder to distinguish near-miss retrievals from true positives.
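The mining step can be sketched as follows. This is a minimal illustration of the filtering logic only; `mine_hard_negatives`, `retrieved`, and the toy skill lists are hypothetical stand-ins for the actual bi-encoder/FAISS retrieval output, which is not part of this repository.

```python
def mine_hard_negatives(retrieved, positives, per_job=5):
    """For each job, keep the top-ranked retrieved skills that are NOT
    annotated positives - these become label-0 hard negative triples."""
    triples = []
    for job, ranked_skills in retrieved.items():
        pos = positives.get(job, set())
        hard = [s for s in ranked_skills if s not in pos][:per_job]
        triples.extend((job, skill, 0) for skill in hard)
    return triples

# Toy stand-in for the bi-encoder's ranked top-K output on one training job
retrieved = {"Data Scientist": ["data science", "machine learning",
                                "data entry", "typing", "filing", "welding"]}
positives = {"Data Scientist": {"data science", "machine learning"}}

print(mine_hard_negatives(retrieved, positives, per_job=3))
# [('Data Scientist', 'data entry', 0), ('Data Scientist', 'typing', 0),
#  ('Data Scientist', 'filing', 0)]
```

Because the negatives are drawn from the retriever's own top-K, they are exactly the near-misses the cross-encoder must learn to push down, which is what distinguishes them from random negatives.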
### Hyperparameters

```
Base model    : cross-encoder/ms-marco-MiniLM-L-12-v2
Task          : 3-class sequence classification (BERT + linear head)
Loss          : CrossEntropyLoss
Batch size    : 32
Epochs        : 3
Learning rate : 2e-5, linear warmup 10%
Optimizer     : AdamW
Precision     : fp16 AMP
Max seq len   : 128 tokens
Input format  : [CLS] job_title [SEP] skill_name [SEP]
```

### Pipeline Results (graded relevance, full 9,052-skill ranking)

| Run | nDCG@10 graded | nDCG@10 binary | MAP |
|---|---|---|---|
| Bi-encoder only (SkillScout Large) | 0.3621 | 0.4830 | 0.4545 |
| + CE bad negatives (v1) | 0.3226 | 0.4025 | 0.4195 |
| + CE fixed negatives (v2) | 0.3315 | 0.4075 | 0.4228 |
| + CE blend alpha=0.7 (local, top-100) | 0.3816 | 0.4973 | 0.4632 |
| **+ CE blend alpha=0.7 (server, full ranking)** | **0.6896** | **0.7330** | 0.2481 |

*Local metrics use a top-100 retrieval cutoff; server metrics use the full 9,052-skill ranking.*

---

## Limitations

- **Must be paired with a retriever** - scores (job, skill) pairs, not a full corpus ranking. Use with SkillScout Large for efficient retrieval.
- **English only** - trained on ESCO EN labels.
- **ESCO-domain optimised** - transfer to other taxonomies may require fine-tuning.
- **Speed** - re-ranks top-200 candidates in ~1-2 s per query on GPU. Not suitable for full-corpus scoring at inference time.

---

## Citation

```bibtex
@misc{talentguide-skillscout-reranker-2026,
  title  = {SkillScout Reranker: Graded Job-Skill Cross-Encoder for TalentCLEF 2026},
  author = {TalentGuide},
  year   = {2026},
  url    = {https://huggingface.co/talentguide/skillscout-reranker}
}

@misc{talentclef2026taskb,
  title  = {TalentCLEF 2026 Task B: Job-Skill Matching},
  author = {TalentCLEF Organizers},
  year   = {2026},
  url    = {https://talentclef.github.io/}
}
```

---

## Framework Versions

- Python 3.12.10 | Transformers 5.5.0 | PyTorch 2.11.0+cu128