---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense-retrieval
- information-retrieval
- job-skill-matching
- esco
- talentclef
- xlm-roberta
base_model: jjzha/esco-xlm-roberta-large
pipeline_tag: sentence-similarity
model-index:
- name: skillscout-large
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: TalentCLEF 2026 Task B — Validation (304 queries, 9052 skills)
      type: talentclef-2026-taskb-validation
    metrics:
    - type: cosine_ndcg_at_10
      value: 0.4830
      name: nDCG@10
    - type: cosine_map_at_100
      value: 0.1825
      name: MAP@100
    - type: cosine_mrr_at_10
      value: 0.6657
      name: MRR@10
    - type: cosine_accuracy_at_1
      value: 0.5099
      name: Accuracy@1
    - type: cosine_accuracy_at_10
      value: 0.9474
      name: Accuracy@10
---
# SkillScout Large: Job-to-Skill Dense Retriever

**SkillScout Large** is a dense bi-encoder that retrieves relevant skills for a job title.
Given a job title (e.g., *"Data Scientist"*), it encodes the title into a 1024-dimensional embedding and retrieves the most semantically relevant skills from the [ESCO](https://esco.ec.europa.eu/) skill gazetteer (9,052 skills) by cosine similarity.

This is **Stage 1** of the TalentGuide two-stage job-skill matching pipeline, trained for [TalentCLEF 2026 Task B](https://talentclef.github.io/).

> **Best pipeline result (TalentCLEF 2026 validation set):**
> nDCG@10 graded = **0.6896** · nDCG@10 binary = **0.7330**
> when combined with a fine-tuned cross-encoder re-ranker at blend α = 0.7.
> Bi-encoder alone: nDCG@10 graded = **0.3621** · MAP = **0.4545**

---

## Model Summary

| Property | Value |
|---|---|
| Base model | [`jjzha/esco-xlm-roberta-large`](https://huggingface.co/jjzha/esco-xlm-roberta-large) |
| Architecture | XLM-RoBERTa-large + mean pooling |
| Embedding dimension | 1024 |
| Max sequence length | 64 tokens |
| Training loss | Multiple Negatives Ranking (MNR) |
| Training pairs | 93,720 (ESCO job–skill pairs, essential + optional) |
| Epochs | 3 |
| Best checkpoint | Step 3500 (saved by validation nDCG@10) |
| Hardware | NVIDIA RTX 3070 8 GB · fp16 AMP |

---

## What is TalentCLEF Task B?

**TalentCLEF 2026 Task B** is a graded information-retrieval shared task:

- **Query**: a job title (e.g., *"Electrician"*)
- **Corpus**: 9,052 ESCO skills (e.g., *"install electric switches"*, *"comply with electrical safety regulations"*)
- **Relevance levels**:
  - `2`: core skill (essential regardless of context)
  - `1`: contextual skill (depends on employer / industry)
  - `0`: non-relevant

**Primary metric**: nDCG with graded relevance (core = 2, contextual = 1)
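
As a rough illustration of the primary metric, the sketch below computes nDCG@k for a single query from graded labels. It assumes linear gain (gain equals the relevance label) and a log2 position discount; the official TalentCLEF scorer may define gain differently. The function names and the sample labels are hypothetical and not part of the task tooling.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k labels (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """DCG of the system ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical labels for one query, in the order the system ranked the skills:
# 2 = core, 1 = contextual, 0 = non-relevant.
system_ranking = [2, 0, 1, 2, 0]
score = ndcg_at_k(system_ranking, all_relevances=system_ranking, k=5)
print(f"nDCG@5 = {score:.4f}")
```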

---

## Usage

### Installation

```bash
pip install sentence-transformers faiss-cpu  # or faiss-gpu
```

### Encode & Compare

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("talentguide/skillscout-large")

job = "Data Scientist"
skills = ["data science", "machine learning", "install electric switches"]

# Embeddings are L2-normalised, so the dot product equals cosine similarity.
embs = model.encode([job] + skills, normalize_embeddings=True)
scores = embs[0] @ embs[1:].T

for skill, score in zip(skills, scores):
    print(f"{score:.3f}  {skill}")
# 0.872  data science
# 0.731  machine learning
# 0.112  install electric switches
```

### Full Retrieval with FAISS (Recommended)

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("talentguide/skillscout-large")

# --- Build index once over your skill corpus ---
skill_texts = [...]  # placeholder: your list of skill names / descriptions

embs = model.encode(skill_texts, batch_size=128,
                    normalize_embeddings=True,
                    show_progress_bar=True).astype(np.float32)

# Exact inner-product search; on L2-normalised vectors this is cosine similarity.
index = faiss.IndexFlatIP(embs.shape[1])
index.add(embs)

# --- Query at inference time ---
job_title = "Software Engineer"
q = model.encode([job_title], normalize_embeddings=True).astype(np.float32)

scores, idxs = index.search(q, k=50)
for rank, (idx, score) in enumerate(zip(idxs[0], scores[0]), 1):
    print(f"{rank:3d}. [{score:.4f}] {skill_texts[idx]}")
```

### Demo Output

```
Software Engineer
  1. [0.942] define software architecture
  2. [0.938] software frameworks
  3. [0.935] create software design

Data Scientist
  1. [0.951] data science
  2. [0.921] establish data processes
  3. [0.919] create data models

Electrician
  1. [0.944] install electric switches
  2. [0.938] install electricity sockets
  3. [0.930] use electrical wire tools
```

---

## Two-Stage Pipeline Integration

SkillScout Large is designed as **Stage 1**: fast ANN retrieval of candidate skills.
For maximum ranking quality, pair it with a cross-encoder re-ranker:

```
Job title
    │
    ▼
[SkillScout Large]   ← this model
    │  top-200 candidates (FAISS ANN, ~40 ms)
    ▼
[Cross-encoder re-ranker]
    │  fine-grained re-scoring of top-200
    ▼
Final ranked list (graded: core > contextual > irrelevant)
```

**Score blending** (best result at α = 0.7, where α is the weight on the bi-encoder score):

```python
final_score = alpha * biencoder_score + (1 - alpha) * crossencoder_score
```
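
The blending formula can be sketched end to end as follows. Cosine similarities and cross-encoder outputs usually live on different scales, so this sketch min-max normalises each score list per query before mixing; that normalisation step is an assumption and is not described in this card. The helper names and the toy scores are hypothetical.

```python
def min_max(scores):
    """Rescale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def blend(biencoder_scores, crossencoder_scores, alpha=0.7):
    """Weighted sum of per-query normalised scores; alpha weights the bi-encoder."""
    bi = min_max(biencoder_scores)
    ce = min_max(crossencoder_scores)
    return [alpha * b + (1 - alpha) * c for b, c in zip(bi, ce)]


# Hypothetical scores for three candidate skills of one query.
candidates = ["skill A", "skill B", "skill C"]
bi_scores = [0.90, 0.80, 0.70]   # bi-encoder cosine similarities
ce_scores = [1.0, 4.0, -2.0]     # raw cross-encoder outputs

final = blend(bi_scores, ce_scores, alpha=0.7)
reranked = sorted(zip(candidates, final), key=lambda pair: pair[1], reverse=True)
for name, score in reranked:
    print(f"{score:.3f}  {name}")
```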

---

## Training Details

### Data

Source: [ESCO occupational ontology](https://esco.ec.europa.eu/), TalentCLEF 2026 training split.

| Statistic | Count |
|---|---|
| Raw job–skill pairs (essential + optional) | 114,699 |
| ESCO jobs with aliases | 3,039 |
| ESCO skills with aliases | 13,939 |
| Training InputExamples (after canonical-pair inclusion) | **93,720** |
| Validation queries | 304 |
| Validation corpus (skills) | 9,052 |
| Validation relevance judgments | 56,417 |

Essential pairs are included in full; optional skill pairs are downsampled to 50% of the essential count to maintain class balance.
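
A minimal sketch of this balancing step is shown below. It assumes pairs are stored as `(job, skill, relation)` tuples and that the optional pairs are drawn uniformly at random with a fixed seed; the card does not document the actual sampling procedure, so the `balance_pairs` helper and the toy data are hypothetical.

```python
import random


def balance_pairs(pairs, optional_ratio=0.5, seed=42):
    """Keep every essential pair; sample optional pairs up to ratio * n_essential."""
    essential = [p for p in pairs if p[2] == "essential"]
    optional = [p for p in pairs if p[2] == "optional"]
    n_optional = min(len(optional), int(optional_ratio * len(essential)))
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return essential + rng.sample(optional, n_optional)


# Hypothetical toy data: 4 essential and 6 optional pairs.
pairs = [(f"job{i}", f"skill{i}", "essential") for i in range(4)]
pairs += [(f"job{i}", f"opt{i}", "optional") for i in range(6)]

balanced = balance_pairs(pairs)
print(len(balanced), "pairs after balancing")
```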

### Hyperparameters

```
Loss             : MultipleNegativesRankingLoss (scale=20, cos_sim)
Batch size       : 64 → 63 in-batch negatives per anchor
Epochs           : 3
Warmup           : 10% of total steps (~440 steps)
Optimizer        : AdamW (fused), lr=5e-5, linear decay
Precision        : fp16 (AMP)
Max seq length   : 64 tokens
Best model saved : by cosine-nDCG@10 on validation (eval every 500 steps)
Seed             : 42
```
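
To make the loss line concrete, here is a small pure-Python sketch of what a multiple-negatives ranking objective computes: for each anchor, the scaled cosine similarities to every positive in the batch are treated as logits, and the matching positive is the cross-entropy target, so the other positives act as in-batch negatives. This is an illustration on assumed toy 2-D vectors, not the training code; `cos_sim` and `mnr_loss` are hypothetical helper names, and the actual run used `MultipleNegativesRankingLoss` from sentence-transformers.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch serves as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, pos) for pos in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings for a batch of three (job title, skill) pairs.
jobs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
skills_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
skills_shuffled = [skills_aligned[1], skills_aligned[2], skills_aligned[0]]

loss_aligned = mnr_loss(jobs, skills_aligned)    # each job matches its own skill
loss_shuffled = mnr_loss(jobs, skills_shuffled)  # pairs deliberately mismatched
print(f"aligned: {loss_aligned:.4f}  shuffled: {loss_shuffled:.4f}")
```

With a batch size of 64, each anchor sees its own positive plus the 63 positives belonging to the other anchors, which is where the "63 in-batch negatives" figure comes from.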

### Training Curve

| Epoch | Step | Train Loss | nDCG@10 (val) | MAP@100 (val) |
|:---:|:---:|:---:|:---:|:---:|
| 0.34 | 500 | 2.9232 | 0.3430 | — |
| 0.68 | 1000 | 2.1179 | 0.3424 | — |
| 1.00 | 1465 | — | 0.3676 | 0.1758 |
| 1.37 | 2000 | 1.7070 | 0.3692 | — |
| 1.71 | 2500 | 1.6366 | 0.3744 | — |
| 2.00 | 2930 | — | 0.3717 | 0.1780 |
| 2.39 | **3500** ✓ | **1.4540** | **0.3769** | **0.1808** |

*Best checkpoint saved at step 3500.*
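
The step counts in the table and the warmup figure in the hyperparameters can be reproduced with a few lines of arithmetic. The sketch assumes the final partial batch of each epoch is kept rather than dropped, which is consistent with the epoch boundaries at steps 1465 and 2930; the variable names are illustrative only.

```python
import math

train_pairs = 93_720
batch_size = 64
epochs = 3
warmup_fraction = 0.10

# 93,720 / 64 = 1464.375, so the last batch of each epoch is partial.
steps_per_epoch = math.ceil(train_pairs / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = warmup_fraction * total_steps

# Fractional epoch at which the best checkpoint (step 3500) was saved.
best_step_epoch = 3500 / steps_per_epoch

print(steps_per_epoch, total_steps, warmup_steps, round(best_step_epoch, 2))
```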

### Validation Metrics (best checkpoint, binary relevance)

| Metric | Value |
|---|---|
| **nDCG@10** | **0.4830** |
| nDCG@50 | 0.4240 |
| nDCG@100 | 0.3769 |
| **MAP@100** | **0.1825** |
| **MRR@10** | **0.6657** |
| Accuracy@1 | 0.5099 |
| Accuracy@3 | 0.7993 |
| Accuracy@5 | 0.8914 |
| Accuracy@10 | **0.9474** |

*Evaluated with `sentence_transformers.evaluation.InformationRetrievalEvaluator` (binary: any qrel > 0 = relevant).*
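
For readers unfamiliar with these metrics, the sketch below shows how graded judgments can be binarised (any grade above 0 counts as relevant) and how MRR@k and Accuracy@k are then computed per query and averaged. It is a plain-Python illustration of the standard definitions, not the evaluator's own code; the query IDs, skill IDs, and helper names are hypothetical.

```python
def reciprocal_rank_at_k(ranked_ids, relevant_ids, k=10):
    """1 / rank of the first relevant item within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def accuracy_at_k(ranked_ids, relevant_ids, k=10):
    """1 if at least one relevant item appears in the top k, else 0."""
    return 1.0 if any(d in relevant_ids for d in ranked_ids[:k]) else 0.0


# Hypothetical graded qrels and system rankings for two queries.
qrels = {"q1": {"s1": 2, "s2": 1, "s3": 0}, "q2": {"s4": 1}}
rankings = {"q1": ["s3", "s2", "s1"], "q2": ["s9", "s8", "s4"]}

# Binarise: any grade > 0 counts as relevant.
relevant = {q: {s for s, grade in grades.items() if grade > 0}
            for q, grades in qrels.items()}

mrr_at_10 = sum(reciprocal_rank_at_k(rankings[q], relevant[q])
                for q in qrels) / len(qrels)
acc_at_1 = sum(accuracy_at_k(rankings[q], relevant[q], k=1)
               for q in qrels) / len(qrels)
acc_at_3 = sum(accuracy_at_k(rankings[q], relevant[q], k=3)
               for q in qrels) / len(qrels)
print(f"MRR@10={mrr_at_10:.4f}  Acc@1={acc_at_1:.2f}  Acc@3={acc_at_3:.2f}")
```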

### Pipeline Results (graded nDCG, full 9,052-skill ranking, server-side)

| Run | nDCG@10 graded | nDCG@10 binary | MAP |
|---|---|---|---|
| Zero-shot `jjzha/esco-xlm-roberta-large` | 0.2039 | 0.2853 | 0.2663 |
| **SkillScout Large (bi-encoder only)** | **0.3621** | **0.4830** | **0.4545** |
| SkillScout Large + cross-encoder (α=0.7) | **0.6896** | **0.7330** | 0.2481 |

---

## Competitive Context (TalentCLEF 2025 Task B)

| Team | MAP (test) | Approach |
|---|---|---|
| pjmathematician (winner 2025) | 0.36 | GTE 7B + contrastive + LLM-augmented data |
| NLPnorth (3rd of 14, 2025) | 0.29 | 3-class discriminative classification |
| **SkillScout Large (2026 val)** | **0.4545** | MNR fine-tuned bi-encoder (Stage 1 only) |

---

## Limitations

- **English only**: trained on ESCO EN labels.
- **ESCO-domain**: optimised for the ESCO skill taxonomy; performance on other taxonomies (O*NET, custom) may vary without fine-tuning.
- **64-token cap**: long job descriptions should be reduced to a concise title before encoding.
- **Graded distinction**: the bi-encoder alone does not reliably separate core (2) from contextual (1) skills; a cross-encoder re-ranker is needed for strong graded nDCG.

---

## Citation

```bibtex
@misc{talentguide-skillscout-2026,
  title  = {SkillScout Large: Dense Job-to-Skill Retrieval for TalentCLEF 2026},
  author = {TalentGuide},
  year   = {2026},
  url    = {https://huggingface.co/talentguide/skillscout-large}
}

@misc{talentclef2026taskb,
  title  = {TalentCLEF 2026 Task B: Job-Skill Matching},
  author = {TalentCLEF Organizers},
  year   = {2026},
  url    = {https://talentclef.github.io/}
}
```

---

## Framework Versions

| Package | Version |
|---|---|
| Python | 3.12.10 |
| sentence-transformers | 5.3.0 |
| transformers | 5.5.0 |
| PyTorch | 2.11.0+cu128 |
| Accelerate | 1.13.0 |
| Tokenizers | 0.22.2 |

---

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)