metadata
license: mit
language: en
pipeline_tag: token-classification
tags:
- ner
- resume-parsing
- cv-parser
base_model: roberta-base
CV Parser NER — roberta-base (v2)
Token-classification model that extracts Job Titles, Skills, and Education from resumes/CVs using a BIO tag scheme.
Provenance
- Trained from scratch on dataset 4 (resume_bio_annotated_full.csv, 2,483 resumes: 1,739 train / 372 val / 372 test), the team's finalized AI-Studio/Vertex-relabelled dataset.
- Reproduced end-to-end with the project notebooks/scripts (retokenize.py + train_bert_run.py).
- Base model: roberta-base · epochs: 5 · learning rate: 3e-5 · max_length: 512 · stride: 128 · seed: 42.
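The max_length 512 and stride 128 settings suggest that long resumes are split into overlapping windows before classification. The following is a minimal sketch of that idea, not the contents of retokenize.py: the function name `sliding_windows` is hypothetical, it assumes that stride means the number of tokens shared by consecutive windows (the convention used by Hugging Face tokenizers), and it ignores the slots a real tokenizer reserves for special tokens.

```python
from typing import List


def sliding_windows(
    token_ids: List[int], max_length: int = 512, stride: int = 128
) -> List[List[int]]:
    """Split token_ids into windows of at most max_length tokens.

    Consecutive windows share `stride` tokens (assumed meaning of stride).
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")

    step = max_length - stride  # how far the window start advances each time
    windows: List[List[int]] = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        # Stop once the current window reaches the end of the sequence.
        if start + max_length >= len(token_ids):
            break
        start += step
    return windows


# Illustrative input: a fake 1,000-token resume.
example_ids = list(range(1000))
example_windows = sliding_windows(example_ids)
```

With these defaults each window starts 384 tokens after the previous one, so a token near a window boundary is also seen with more context in the neighbouring window. How predictions in the overlapping region are merged is not stated in this card.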
Resume-level performance (dataset-4 splits)
| split | precision | recall | F1 |
|---|---|---|---|
| validation | — | — | 0.6397 |
| test | — | — | 0.6563 |
Labels
O, B-JOB_TITLE, I-JOB_TITLE, B-SKILL, I-SKILL, B-EDUCATION, I-EDUCATION
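To turn per-token BIO tags into entities, consecutive B-/I- tags of the same type are grouped into spans. The sketch below illustrates that decoding step in plain Python. The helper name `bio_to_spans` is hypothetical, the label-to-index order in `ID2LABEL` is an assumption (the card lists the labels but not their ids), and dropping an I- tag that has no matching open span is one possible policy rather than the project's documented behaviour.

```python
from typing import Dict, List, Optional, Tuple

LABELS: List[str] = [
    "O",
    "B-JOB_TITLE", "I-JOB_TITLE",
    "B-SKILL", "I-SKILL",
    "B-EDUCATION", "I-EDUCATION",
]
# Assumed index order; check the model's config for the real mapping.
ID2LABEL: Dict[int, str] = dict(enumerate(LABELS))


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (entity_type, start, end) spans, end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A B- tag closes any open span and opens a new one.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = tag[2:], i
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open span.
            continue
        else:
            # "O", or an I- tag that does not continue the open span.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = None
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


tokens = ["Senior", "Data", "Engineer", "with", "Python", "and", "SQL"]
tags = ["B-JOB_TITLE", "I-JOB_TITLE", "I-JOB_TITLE", "O", "B-SKILL", "O", "B-SKILL"]
spans = bio_to_spans(tags)
entities = [(label, " ".join(tokens[s:e])) for label, s, e in spans]
```

Here `entities` pairs each entity type with its text, for example the job title "Senior Data Engineer" and the skills "Python" and "SQL". Real model output is tagged per subword, so word-level alignment would be needed before this step.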