---
license: mit
language: en
pipeline_tag: token-classification
tags:
- ner
- resume-parsing
- cv-parser
base_model: roberta-base
---

# CV Parser NER — roberta-base (v2)

Token-classification model that extracts **Job Titles**, **Skills**, and **Education** from resumes/CVs using a BIO tag scheme.

## Provenance

- **Trained from scratch on dataset 4** (`resume_bio_annotated_full.csv`, 2,483 resumes: 1,739 train / 372 val / 372 test), the team's finalized AI-Studio/Vertex-relabelled dataset.
- Reproduced end-to-end with the project notebooks/scripts (`retokenize.py` + `train_bert_run.py`).
- Base model: `roberta-base` · epochs: 5 · learning rate: 3e-5 · max_length: 512 · stride: 128 · seed: 42.

## Resume-level performance (dataset-4 splits)

| split | precision | recall | F1 |
|-------|-----------|--------|----|
| validation | — | — | 0.6397 |
| test | — | — | 0.6563 |

## Labels

`O, B-JOB_TITLE, I-JOB_TITLE, B-SKILL, I-SKILL, B-EDUCATION, I-EDUCATION`
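
To illustrate how the BIO labels above map back to entities, here is a minimal sketch of a decoder that groups word-level tags into typed spans. It is not taken from the project scripts: the function name `bio_to_spans` is hypothetical, the input is assumed to be one tag per whitespace-separated word (not per RoBERTa subword piece), and treating a stray `I-` tag as the start of a new span is an assumption, since the card does not describe any post-processing.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level BIO tags into (entity_type, text) spans.

    Hypothetical helper for illustration; assumes len(tokens) == len(tags).
    """
    spans: List[Tuple[str, str]] = []
    current_type = None          # entity type of the span being built, if any
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        # "B-SKILL" -> ("B", "-", "SKILL"); "O" -> ("O", "", "")
        prefix, _, entity_type = tag.partition("-")

        # An I- tag of the same type extends the open span.
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)
            continue

        # Anything else closes the open span first.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

        if prefix in ("B", "I"):
            # Assumption: a stray I- tag (no matching open span) starts a new span.
            current_type, current_tokens = entity_type, [token]
        else:
            current_type, current_tokens = None, []

    # Flush a span that runs to the end of the sequence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["Senior", "Data", "Engineer", "skilled", "in", "Python", "and", "SQL"]
tags = ["B-JOB_TITLE", "I-JOB_TITLE", "I-JOB_TITLE", "O", "O", "B-SKILL", "O", "B-SKILL"]
print(bio_to_spans(tokens, tags))
# [('JOB_TITLE', 'Senior Data Engineer'), ('SKILL', 'Python'), ('SKILL', 'SQL')]
```

When predictions come from the model itself, the tags are per subword and would first need to be aligned to words (for example, by keeping the tag of each word's first subword) before a decoder like this is applied.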