---
license: mit
language: en
pipeline_tag: token-classification
tags:
- ner
- resume-parsing
- cv-parser
base_model: roberta-base
---

# CV Parser NER - roberta-base (v2)

Token-classification model that extracts **Job Titles**, **Skills**, and
**Education** from resumes/CVs using a BIO tag scheme.
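
A minimal usage sketch with the `transformers` token-classification pipeline. The repo id `MODEL_ID` is a hypothetical placeholder (this card does not state one), and the `group_by_label` and `extract_entities` helpers are illustrative names, not part of the project scripts.

```python
from collections import defaultdict

# Hypothetical placeholder: replace with the real Hub repo id or a local path.
MODEL_ID = "your-org/cv-parser-ner-roberta-base-v2"


def group_by_label(predictions):
    """Group aggregated pipeline predictions into {entity_type: [text, ...]}."""
    grouped = defaultdict(list)
    for pred in predictions:
        # With an aggregation strategy set, each prediction carries
        # "entity_group" (e.g. "SKILL") and "word" (the span text).
        grouped[pred["entity_group"]].append(pred["word"].strip())
    return dict(grouped)


def extract_entities(text, model_id=MODEL_ID):
    """Run the model on `text` and return the extracted spans per entity type."""
    # Imported lazily so group_by_label can be used without transformers installed.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge B-/I- pieces into whole spans
    )
    return group_by_label(ner(text))
```

Calling `extract_entities(resume_text)` should return a dictionary keyed by `JOB_TITLE`, `SKILL`, and `EDUCATION`, assuming the checkpoint's `id2label` mapping uses the labels listed in this card.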

## Provenance

- **Fine-tuned from the pretrained `roberta-base` checkpoint on dataset 4**
  (`resume_bio_annotated_full.csv`, 2,483 resumes: 1,739 train / 372 val /
  372 test), the team's finalized AI-Studio/Vertex-relabelled dataset.
- Reproduced end-to-end with the project notebooks/scripts
  (`retokenize.py` + `train_bert_run.py`).
- Hyperparameters: base model `roberta-base` · epochs 5 · learning rate 3e-5 ·
  `max_length` 512 · `stride` 128 · seed 42.
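
To illustrate how `max_length` 512 and `stride` 128 interact on long resumes, here is a simplified pure-Python sketch. It assumes `stride` has the Hugging Face tokenizer meaning (the number of tokens shared by consecutive windows) and ignores special tokens, which in a real tokenizer also count toward `max_length`. The function name `sliding_windows` is illustrative and is not taken from `retokenize.py`.

```python
def sliding_windows(token_ids, max_length=512, stride=128):
    """Split token_ids into windows of at most max_length tokens.

    Consecutive windows overlap by `stride` tokens, so each new window
    starts max_length - stride tokens after the previous one.
    """
    if stride >= max_length:
        raise ValueError("stride must be smaller than max_length")
    step = max_length - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        # Stop once the current window reaches the end of the sequence.
        if start + max_length >= len(token_ids):
            break
        start += step
    return windows


# A 1,000-token resume yields three windows starting at tokens 0, 384, and 768.
windows = sliding_windows(list(range(1000)))
```

Because tokens in the overlap are predicted more than once, a full pipeline also needs a rule for merging duplicate predictions; that step is not shown here.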

## Resume-level performance (dataset-4 splits)

| split | precision | recall | F1 |
|-------|-----------|--------|----|
| validation | not reported | not reported | 0.6397 |
| test | not reported | not reported | 0.6563 |
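
This card does not define how the resume-level F1 is computed. As one possible reading only, the sketch below scores exact-match entity spans within each resume and then averages the per-resume F1. The functions `prf1` and `mean_resume_f1` are hypothetical and may differ from what `train_bert_run.py` actually does.

```python
def prf1(gold_spans, pred_spans):
    """Precision, recall, and F1 over exact-match (label, start, end) spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mean_resume_f1(resumes):
    """Average the per-resume F1 over (gold_spans, pred_spans) pairs."""
    scores = [prf1(gold, pred)[2] for gold, pred in resumes]
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging per resume weights every resume equally, whereas pooling all spans before scoring would weight long resumes more heavily; the two conventions can give noticeably different numbers.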

## Labels

`O, B-JOB_TITLE, I-JOB_TITLE, B-SKILL, I-SKILL, B-EDUCATION, I-EDUCATION`
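
The BIO scheme can be decoded into entity spans as sketched below. The index order in `ID2LABEL` is an assumption based on the order the labels are listed here; the authoritative mapping is the `id2label` field in the checkpoint's config. The `bio_to_spans` helper is illustrative and leniently treats a stray `I-` tag (one with no matching `B-` before it) as the start of a new span.

```python
LABELS = [
    "O",
    "B-JOB_TITLE", "I-JOB_TITLE",
    "B-SKILL", "I-SKILL",
    "B-EDUCATION", "I-EDUCATION",
]
# Assumed index order; verify against the model's config.id2label.
ID2LABEL = dict(enumerate(LABELS))


def bio_to_spans(tags):
    """Convert a BIO tag sequence to (entity_type, start, end) spans.

    `end` is exclusive, so tags[start:end] covers the whole entity.
    """
    spans = []
    current_type, start = None, None
    for i, tag in enumerate(tags):
        prefix, _, entity = tag.partition("-")
        # An I- tag of the same type extends the open span.
        if prefix == "I" and entity == current_type:
            continue
        # Anything else closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type, start = None, None
        # B- (or a stray I-) opens a new span; O opens nothing.
        if prefix in ("B", "I"):
            current_type, start = entity, i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


tags = ["B-JOB_TITLE", "I-JOB_TITLE", "O", "B-SKILL", "B-SKILL", "I-SKILL"]
print(bio_to_spans(tags))
# [('JOB_TITLE', 0, 2), ('SKILL', 3, 4), ('SKILL', 4, 6)]
```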