---
license: mit
language: en
pipeline_tag: token-classification
tags:
- ner
- resume-parsing
- cv-parser
base_model: roberta-base
---
# CV Parser NER — roberta-base (v2)
Token-classification model that extracts **Job Titles**, **Skills**, and
**Education** from resumes/CVs using a BIO tag scheme.
## Provenance
- **Fine-tuned from the pretrained `roberta-base` checkpoint on dataset 4**
(`resume_bio_annotated_full.csv`, 2,483 resumes: 1,739 train / 372 val /
372 test), the team's finalized AI-Studio/Vertex-relabelled dataset.
- Reproduced end-to-end with the project notebooks/scripts
(`retokenize.py` + `train_bert_run.py`).
- Base model: `roberta-base` · epochs: 5 · learning rate: 3e-5 ·
max_length: 512 · stride: 128 · seed: 42.
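
The `max_length` and `stride` settings above suggest that long resumes are split into overlapping windows. The sketch below illustrates one common reading, in which `stride` is the number of tokens shared by consecutive windows. That reading is an assumption, not something taken from `retokenize.py`. The helper `window_spans` is hypothetical, and it ignores the special tokens a real tokenizer would add.

```python
from typing import List, Tuple


def window_spans(
    n_tokens: int, max_length: int = 512, stride: int = 128
) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets of overlapping windows.

    Assumption: `stride` is the overlap between consecutive windows,
    so each new window starts `max_length - stride` tokens later.
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")

    step = max_length - stride
    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + max_length, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:  # the last window reached the end of the text
            break
        start += step
    return spans


if __name__ == "__main__":
    # A 1,000-token resume needs three windows under these settings.
    print(window_spans(1000))  # [(0, 512), (384, 896), (768, 1000)]
```

Under this reading, every token in the interior of a long resume appears in at least one window with context on both sides, at the cost of predicting the overlapping tokens twice.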
## Resume-level performance (dataset-4 splits)
| split | precision | recall | F1 |
|-------|-----------|--------|----|
| validation | — | — | 0.6397 |
| test | — | — | 0.6563 |
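
The card does not say how the resume-level F1 is computed. As a rough illustration only, the sketch below scores each resume by comparing its set of predicted entities with its set of gold entities, then averages F1 over resumes. The entity representation, the helper names `prf1` and `mean_f1`, and the averaging choice are all assumptions, not the project's evaluation code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (label, normalized text).
Entity = Tuple[str, str]


def prf1(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one resume, based on exact set overlap."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mean_f1(pairs: Iterable[Tuple[Set[Entity], Set[Entity]]]) -> float:
    """Average the per-resume F1 over (predicted, gold) pairs."""
    scores = [prf1(pred, gold)[2] for pred, gold in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    predicted = {("SKILL", "python"), ("SKILL", "sql"), ("JOB_TITLE", "data engineer")}
    gold = {("SKILL", "python"), ("JOB_TITLE", "data engineer"),
            ("EDUCATION", "bsc computer science")}
    precision, recall, f1 = prf1(predicted, gold)
    print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Averaging per-resume scores weights every resume equally. Pooling counts across resumes before computing F1 would instead weight resumes by how many entities they contain, and the two can differ noticeably.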
## Labels
`O, B-JOB_TITLE, I-JOB_TITLE, B-SKILL, I-SKILL, B-EDUCATION, I-EDUCATION`
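
To make the BIO scheme concrete, here is a minimal sketch of how a tagged token sequence could be collapsed into labelled spans. The function `bio_to_spans` is a hypothetical helper, not part of the project scripts, and the example tokens are whole words rather than RoBERTa subword pieces.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse BIO tags into (label, text) spans.

    A span opens at a B- tag and extends over following I- tags with the
    same label. An I- tag that does not continue an open span is dropped
    (a strict policy; other decoders treat it as the start of a new span).
    """
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []

    def flush() -> None:
        nonlocal label, words
        if label is not None:
            spans.append((label, " ".join(words)))
        label, words = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            # "O", or an I- tag that does not match the open span
            flush()
    flush()
    return spans


if __name__ == "__main__":
    tokens = ["Senior", "Data", "Engineer", "skilled", "in", "Python", "and", "SQL"]
    tags = ["B-JOB_TITLE", "I-JOB_TITLE", "I-JOB_TITLE", "O", "O",
            "B-SKILL", "O", "B-SKILL"]
    print(bio_to_spans(tokens, tags))
    # [('JOB_TITLE', 'Senior Data Engineer'), ('SKILL', 'Python'), ('SKILL', 'SQL')]
```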