# Resume NER BERT v2 (ONNX)

Pre-exported ONNX version of yashpwr/resume-ner-bert-v2 for direct use without Python/PyTorch dependencies.

Converted using Optimum for use in lucidRESUME, a local-first resume analysis desktop app built with .NET and ONNX Runtime.

## Model Details

| | |
|---|---|
| Original model | yashpwr/resume-ner-bert-v2 |
| Architecture | `BertForTokenClassification` (bert-base-cased) |
| Parameters | 107.7M |
| Task | Token classification (NER, BIO scheme) |
| License | Apache 2.0 |
| Export tool | `optimum.exporters.onnx` (Optimum + Transformers 4.57) |

## Performance

| Metric | Score |
|--------|-------|
| F1 | 90.87% |
| Precision | 91.44% |
| Recall | 90.81% |

## Entity Types

The model recognises 12 entity types using BIO tagging (25 labels total):

| Entity | Description |
|--------|-------------|
| Name | Person's full name |
| Email Address | Email contact |
| Phone | Phone number |
| Location | Geographic location |
| Companies worked at | Previous employers |
| Designation | Job titles / roles |
| Skills | Technical and soft skills |
| Years of Experience | Work duration |
| Degree | Educational qualifications |
| College Name | Educational institutions |
| Graduation Year | Year of degree completion |
| UNKNOWN | Unclassified entities |
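Under the BIO scheme, each token is labelled `B-<Entity>` (beginning of a span), `I-<Entity>` (inside a span), or `O` (outside any entity), which is how 12 entity types yield 25 labels (12 × 2 + 1). A minimal sketch of merging token-level BIO labels into entity spans, in plain Python and independent of the model (the sample tokens and labels are illustrative):

```python
def merge_bio(tokens, labels):
    """Merge token-level BIO labels into (entity_type, text) spans."""
    entities, current_type, current_tokens = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag starts a new span, closing any open one
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type continues the open span
            current_tokens.append(token)
        else:
            # "O" (or a stray I- tag) closes any open span
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities

tokens = ["John", "Smith", ",", "Engineer", "at", "Google"]
labels = ["B-Name", "I-Name", "O", "B-Designation", "O", "B-Companies worked at"]
print(merge_bio(tokens, labels))
# [('Name', 'John Smith'), ('Designation', 'Engineer'), ('Companies worked at', 'Google')]
```

The `aggregation_strategy="simple"` option in the pipeline example below performs this kind of merging automatically.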

## Usage

### With ONNX Runtime (.NET)

This model powers resume entity extraction in lucidRESUME via ONNX Runtime in C#. The app downloads the model automatically on first launch; no Python required.

### With ONNX Runtime (Python)

```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("scottgal/resume-ner-bert-v2-onnx")
model = ORTModelForTokenClassification.from_pretrained("scottgal/resume-ner-bert-v2-onnx")

# "simple" aggregation merges B-/I- sub-tokens into whole entities
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
results = ner("John Smith, Software Engineer at Google with 5 years of experience. "
              "BSc Computer Science from MIT.")
for entity in results:
    print(f"{entity['entity_group']}: {entity['word']} ({entity['score']:.2f})")
```
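With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with `entity_group`, `word`, and `score` keys (plus character offsets). A small sketch of post-processing that output into a per-type summary — plain Python; the sample input mimics the pipeline's output shape rather than an actual model run, and the scores are illustrative:

```python
from collections import defaultdict

def group_entities(results):
    """Group pipeline output into {entity_type: [words]}."""
    grouped = defaultdict(list)
    for entity in results:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)

# Shaped like the pipeline output above (not a real model run)
results = [
    {"entity_group": "Name", "word": "John Smith", "score": 0.99},
    {"entity_group": "Designation", "word": "Software Engineer", "score": 0.97},
    {"entity_group": "Skills", "word": "Python", "score": 0.95},
    {"entity_group": "Skills", "word": "C#", "score": 0.93},
]
print(group_entities(results))
# {'Name': ['John Smith'], 'Designation': ['Software Engineer'], 'Skills': ['Python', 'C#']}
```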

## Training Data

The original model was trained on 22,542 samples:

- Resume-Corpus Dataset -- 349 samples
- DataTurks Resume NER -- 420 samples
- Custom Training Data -- 21,773 samples (rule-based extraction)
- Mehyaar Skills Dataset -- skills-focused data

See the original model card for full training details.

## Limitations

- English only
- Best with text-based resumes (not scanned images)
- Primarily trained on technology and business resumes
- Optimal for resumes under 512 tokens (trained with max sequence length of 128)
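Because training used a max sequence length of 128 tokens, very long resumes are best split into overlapping chunks and run through the pipeline piece by piece. A minimal sketch, in plain Python; the word-based windowing and overlap size are illustrative assumptions, not part of the original model:

```python
def chunk_text(text, max_words=100, overlap=20):
    """Split text into overlapping word windows for chunked NER."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks, step = [], max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # last window already reaches the end of the text
    return chunks

# A 250-word text with 100-word windows and 20-word overlap -> 3 chunks
print(len(chunk_text("word " * 250)))  # 3
print(chunk_text("short text"))        # ['short text']
```

The overlap ensures entities that straddle a chunk boundary appear whole in at least one window; duplicate detections from overlapping regions can then be de-duplicated by span.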
