---
license: mit
language: en
pipeline_tag: token-classification
tags:
- ner
- resume-parsing
- cv-parser
base_model: roberta-base
---

# CV Parser NER — roberta-base (v2)

Token-classification model that extracts **Job Titles**, **Skills**, and
**Education** from resumes/CVs using a BIO tag scheme.
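
A minimal usage sketch, assuming the checkpoint is loaded through the `transformers` token-classification pipeline. `MODEL_ID` is a hypothetical placeholder (this card does not state the repository id), and `load_parser` and `group_by_label` are illustrative helpers, not part of the project scripts.

```python
from collections import defaultdict

# Hypothetical placeholder: replace with the actual repository id or local path.
MODEL_ID = "your-org/cv-parser-ner"


def load_parser(model_id=MODEL_ID):
    """Build a token-classification pipeline that merges B-/I- pieces into entities."""
    from transformers import pipeline  # imported lazily so the helper below has no dependencies

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def group_by_label(entities):
    """Group aggregated pipeline output by entity type (JOB_TITLE, SKILL, EDUCATION)."""
    grouped = defaultdict(list)
    for ent in entities:
        # RoBERTa word pieces may carry a leading space, so strip it.
        grouped[ent["entity_group"]].append(ent["word"].strip())
    return dict(grouped)


# Example call (downloads the model, so it is left commented out):
# parser = load_parser()
# print(group_by_label(parser("Senior Data Engineer with Python and SQL experience.")))
```

This sketch does not handle resumes longer than the model's 512-token limit; those need to be split into overlapping windows first.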

## Provenance
- **Fine-tuned in a fresh run from the `roberta-base` checkpoint on dataset 4**
  (`resume_bio_annotated_full.csv`, 2,483 resumes: 1,739 train / 372 val /
  372 test), the team's finalized AI-Studio/Vertex-relabelled dataset.
- Reproduced end-to-end with the project notebooks/scripts
  (`retokenize.py` + `train_bert_run.py`).
- Base model: `roberta-base` · epochs: 5 · learning rate: 3e-5 ·
  max_length: 512 · stride: 128 · seed: 42.
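
To illustrate how `max_length 512` and `stride 128` interact, here is a small sketch of the window arithmetic. It assumes `stride` means the number of tokens shared by consecutive windows, as in Hugging Face tokenizers; the project's `retokenize.py` may define it differently. `window_starts` is a hypothetical helper, and it ignores the two special tokens (`<s>`, `</s>`) that reduce the usable content per window.

```python
def window_starts(n_tokens, max_length=512, stride=128):
    """Return the start offset of each overlapping window over n_tokens tokens.

    Assumes `stride` is the overlap between consecutive windows, so each
    window advances by (max_length - stride) tokens.
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must be non-negative and smaller than max_length")
    step = max_length - stride
    starts = [0]
    # Keep adding windows until the last one reaches the end of the sequence.
    while starts[-1] + max_length < n_tokens:
        starts.append(starts[-1] + step)
    return starts


# A 1,000-token resume needs three windows starting at tokens 0, 384, and 768.
print(window_starts(1000))
```

Under this assumption, tokens in an overlap region are predicted twice, so a merging rule (for example, keeping the prediction from the window where the token has more context) is needed when reassembling resume-level output.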

## Resume-level performance (dataset-4 splits)
| split | precision | recall | F1 |
|-------|-----------|--------|----|
| validation | — | — | 0.6397 |
| test       | — | — | 0.6563 |

## Labels
`O, B-JOB_TITLE, I-JOB_TITLE, B-SKILL, I-SKILL, B-EDUCATION, I-EDUCATION`
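
As an illustration of the BIO scheme, the sketch below collapses per-token tags into typed spans. `bio_to_spans` is a hypothetical helper, not the project's decoder, and dropping stray `I-` tags that do not continue an open span is an assumption (other conventions treat them as the start of a new span).

```python
LABELS = [
    "O",
    "B-JOB_TITLE", "I-JOB_TITLE",
    "B-SKILL", "I-SKILL",
    "B-EDUCATION", "I-EDUCATION",
]


def bio_to_spans(tokens, tags):
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the span collected so far, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span (assumption: dropped).
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


tokens = ["Senior", "Data", "Engineer", "with", "Python", "and", "SQL"]
tags = ["B-JOB_TITLE", "I-JOB_TITLE", "I-JOB_TITLE", "O", "B-SKILL", "O", "B-SKILL"]
print(bio_to_spans(tokens, tags))
```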