# Resume NER BERT v2 (ONNX)

Pre-exported ONNX version of yashpwr/resume-ner-bert-v2 for direct use without Python/PyTorch dependencies.

Converted using Optimum for use in lucidRESUME, a local-first resume analysis desktop app built with .NET and ONNX Runtime.

## Model Details

| | |
|---|---|
| Original model | yashpwr/resume-ner-bert-v2 |
| Architecture | `BertForTokenClassification` (bert-base-cased) |
| Parameters | 107.7M |
| Task | Token classification (NER, BIO scheme) |
| License | Apache 2.0 |
| Export tool | `optimum.exporters.onnx` (Optimum + Transformers 4.57) |

## Performance

| Metric | Score |
|--------|-------|
| F1 | 90.87% |
| Precision | 91.44% |
| Recall | 90.81% |

## Entity Types

The model recognises 12 entity types using BIO tagging (25 labels total):

| Entity | Description |
|--------|-------------|
| Name | Person's full name |
| Email Address | Email contact |
| Phone | Phone number |
| Location | Geographic location |
| Companies worked at | Previous employers |
| Designation | Job titles / roles |
| Skills | Technical and soft skills |
| Years of Experience | Work duration |
| Degree | Educational qualifications |
| College Name | Educational institutions |
| Graduation Year | Year of degree completion |
| UNKNOWN | Unclassified entities |
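Under the BIO scheme, each token is labelled `B-<Entity>` (beginning of a span), `I-<Entity>` (inside a span), or `O` (outside any entity), which is how 12 entity types yield 25 labels (12 × 2 + 1). A minimal sketch of merging token-level BIO labels into entity spans, in plain Python and independent of the model (the sample tokens and labels are illustrative):

```python
def merge_bio(tokens, labels):
    """Merge token-level BIO labels into (entity_type, text) spans."""
    entities, current_type, current_tokens = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag starts a new span, closing any open one
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type continues the open span
            current_tokens.append(token)
        else:
            # "O" (or a stray I- tag) closes any open span
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities

tokens = ["John", "Smith", ",", "Engineer", "at", "Google"]
labels = ["B-Name", "I-Name", "O", "B-Designation", "O", "B-Companies worked at"]
print(merge_bio(tokens, labels))
# [('Name', 'John Smith'), ('Designation', 'Engineer'), ('Companies worked at', 'Google')]
```

The `aggregation_strategy="simple"` option in the pipeline example below performs this kind of merging automatically.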

## Usage

### With ONNX Runtime (.NET)

This model powers resume entity extraction in lucidRESUME via ONNX Runtime in C#. The app downloads the model automatically on first launch; no Python required.

### With ONNX Runtime (Python)

```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("scottgal/resume-ner-bert-v2-onnx")
model = ORTModelForTokenClassification.from_pretrained("scottgal/resume-ner-bert-v2-onnx")

# "simple" aggregation merges B-/I- sub-tokens into whole entities
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
results = ner("John Smith, Software Engineer at Google with 5 years of experience. "
              "BSc Computer Science from MIT.")
for entity in results:
    print(f"{entity['entity_group']}: {entity['word']} ({entity['score']:.2f})")
```
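With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with `entity_group`, `word`, and `score` keys (plus character offsets). A small sketch of post-processing that output into a per-type summary — plain Python; the sample input mimics the pipeline's output shape rather than an actual model run, and the scores are illustrative:

```python
from collections import defaultdict

def group_entities(results):
    """Group pipeline output into {entity_type: [words]}."""
    grouped = defaultdict(list)
    for entity in results:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)

# Shaped like the pipeline output above (not a real model run)
results = [
    {"entity_group": "Name", "word": "John Smith", "score": 0.99},
    {"entity_group": "Designation", "word": "Software Engineer", "score": 0.97},
    {"entity_group": "Skills", "word": "Python", "score": 0.95},
    {"entity_group": "Skills", "word": "C#", "score": 0.93},
]
print(group_entities(results))
# {'Name': ['John Smith'], 'Designation': ['Software Engineer'], 'Skills': ['Python', 'C#']}
```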

## Training Data

The original model was trained on 22,542 samples:

- Resume-Corpus Dataset -- 349 samples
- DataTurks Resume NER -- 420 samples
- Custom Training Data -- 21,773 samples (rule-based extraction)
- Mehyaar Skills Dataset -- skills-focused data

See the original model card for full training details.

## Limitations

- English only
- Best with text-based resumes (not scanned images)
- Primarily trained on technology and business resumes
- Optimal for resumes under 512 tokens (trained with max sequence length of 128)
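Because training used a max sequence length of 128 tokens, very long resumes are best split into overlapping chunks and run through the pipeline piece by piece. A minimal sketch, in plain Python; the word-based windowing and overlap size are illustrative assumptions, not part of the original model:

```python
def chunk_text(text, max_words=100, overlap=20):
    """Split text into overlapping word windows for chunked NER."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks, step = [], max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # last window already reaches the end of the text
    return chunks

# A 250-word text with 100-word windows and 20-word overlap -> 3 chunks
print(len(chunk_text("word " * 250)))  # 3
print(chunk_text("short text"))        # ['short text']
```

The overlap ensures entities that straddle a chunk boundary appear whole in at least one window; duplicate detections from overlapping regions can then be de-duplicated by span.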
