---
datasets:
- dataturks/resume-entities-for-ner
language:
- en
license: apache-2.0
metrics:
- f1
- precision
- recall
model-index:
- name: resume-ner-distilbert
  results:
  - task:
      name: Named Entity Recognition
      type: token-classification
    dataset:
      name: Resume NER
      type: dataturks/resume-entities-for-ner
    metrics:
    - name: F1
      type: f1
      value: 0.211
    - name: Precision
      type: precision
      value: 0.2189
    - name: Recall
      type: recall
      value: 0.2037
pipeline_tag: token-classification
tags:
- ner
- token-classification
- resume
- nlp
- transformers
- lora
---
# Resume NER DistilBERT

## Model Description

This is a fine-tuned Named Entity Recognition (NER) model for extracting structured information from resumes. It was trained by fine-tuning a `distilbert-base-uncased` backbone with LoRA (Low-Rank Adaptation).

## Supported Entity Types
| Entity Type | Description | Example |
|---|---|---|
| NAME | Person's name | "John Doe" |
| EMAIL | Email address | "john@example.com" |
| PHONE | Phone number | "+1 555-123-4567" |
| LOCATION | Geographic location | "San Francisco, CA" |
| ORG | Organization/Company | "Google Inc." |
| TITLE | Job title | "Senior Software Engineer" |
| DEGREE | Academic degree | "Bachelor of Science in Computer Science" |
| SKILL | Technical or soft skill | "Python", "Machine Learning" |
| CERT | Certification | "AWS Solutions Architect" |
| DATE | Date or time period | "2020-2023", "January 2022" |
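Token-classification models commonly encode entity types like these with a BIO tagging scheme. Assuming that convention (the card does not state it; check `model.config.id2label` for the authoritative mapping), the full label set can be derived from the table above:

```python
# Build a BIO label set from the entity types above.
# Assumption: the model uses the standard B-/I- prefix scheme plus "O"
# for non-entity tokens; verify against model.config.id2label.
ENTITY_TYPES = [
    "NAME", "EMAIL", "PHONE", "LOCATION", "ORG",
    "TITLE", "DEGREE", "SKILL", "CERT", "DATE",
]

labels = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(labels))

print(len(labels))               # 21 labels: "O" plus B-/I- for each of 10 types
print(id2label[1], id2label[2])  # B-NAME I-NAME
```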
## Training Details

### Training Data
- Dataset: Dataturks Resume Entities for NER
- Source: Kaggle
- Training Examples: N/A
- Validation Examples: N/A
### Training Configuration
- Base Model: distilbert-base-uncased
- Training Method: Fine-tuning with LoRA
- Epochs: 10
- Learning Rate: 3e-05
- Batch Size: 8
- Max Sequence Length: 512
- Random Seed: 42
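LoRA trains small low-rank factors instead of the full weight matrices, which is why it is listed as the training method above. The arithmetic below illustrates the parameter saving for one DistilBERT attention projection; the rank `r = 8` is an illustrative assumption, since the card does not state the LoRA rank used:

```python
# Why LoRA is parameter-efficient: instead of updating a full d x k weight
# matrix W, LoRA trains two low-rank factors B (d x r) and A (r x k), so the
# trainable parameter count per adapted matrix drops from d*k to r*(d+k).
d = k = 768  # DistilBERT hidden size; shape of one attention projection
r = 8        # hypothetical LoRA rank (not stated in this card)

full_params = d * k
lora_params = r * (d + k)

print(full_params)                       # 589824
print(lora_params)                       # 12288
print(round(full_params / lora_params))  # 48x fewer trainable parameters
```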
## Performance Metrics
| Metric | Validation Set |
|---|---|
| Precision | 0.2189 |
| Recall | 0.2037 |
| F1-Score | 0.2110 |
## Usage

### Using the Transformers Pipeline
```python
from transformers import pipeline

# Load the model
ner = pipeline("ner", model="Joshuant/resume-ner-distilbert", aggregation_strategy="simple")

# Extract entities from resume text
text = """
John Doe
Email: john.doe@email.com
Phone: +1 555-123-4567
EDUCATION
Bachelor of Science in Computer Science, MIT, 2020
EXPERIENCE
Senior Software Engineer at Google, 2020-2023
- Developed ML pipelines using Python and TensorFlow
"""

entities = ner(text)
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.3f})")
```
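The pipeline returns a flat list of entity dicts. A small helper (hypothetical, not part of the model) can group them into a structured resume record; the sample input below is hand-written for illustration rather than real model output:

```python
from collections import defaultdict

def group_entities(entities):
    """Group pipeline output ({'entity_group', 'word', 'score'} dicts) by type."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

# Simulated pipeline output (real scores and spans come from the model)
sample = [
    {"entity_group": "NAME", "word": "John Doe", "score": 0.98},
    {"entity_group": "EMAIL", "word": "john.doe@email.com", "score": 0.95},
    {"entity_group": "SKILL", "word": "Python", "score": 0.91},
    {"entity_group": "SKILL", "word": "TensorFlow", "score": 0.88},
]

print(group_entities(sample))
# {'NAME': ['John Doe'], 'EMAIL': ['john.doe@email.com'], 'SKILL': ['Python', 'TensorFlow']}
```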
### Using AutoModel
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Joshuant/resume-ner-distilbert")
model = AutoModelForTokenClassification.from_pretrained("Joshuant/resume-ner-distilbert")

# Tokenize input
text = "John Doe, Senior Software Engineer at Google"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

# Decode predictions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]
for token, label in zip(tokens, labels):
    if label != "O":
        print(f"{token}: {label}")
```
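The loop above prints one label per subtoken, and DistilBERT's WordPiece tokenizer splits rarer words into `##`-prefixed continuation pieces. A small helper (illustrative; the token and label lists below are hand-written, not model output) can merge subtokens back into whole words, keeping each word's first-subtoken label:

```python
def merge_wordpieces(tokens, labels):
    """Merge WordPiece subtokens ('##' continuations) back into whole words,
    keeping the label of each word's first subtoken."""
    words, word_labels = [], []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            words[-1] += token[2:]  # append continuation to previous word
        else:
            words.append(token)
            word_labels.append(label)
    return list(zip(words, word_labels))

# Hand-written example of token-level output for illustration
tokens = ["john", "doe", "ten", "##sor", "##flow"]
labels = ["B-NAME", "I-NAME", "B-SKILL", "I-SKILL", "I-SKILL"]
print(merge_wordpieces(tokens, labels))
# [('john', 'B-NAME'), ('doe', 'I-NAME'), ('tensorflow', 'B-SKILL')]
```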
## Limitations
- The model is primarily trained on English resumes and may not perform well on other languages
- Performance may vary based on resume formatting and structure
- The model may struggle with unusual entity formats or domain-specific terminology
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{resume_ner_slm_2026,
  title={Context-Aware Resume NER with Small Language Models},
  author={Research Team},
  year={2026},
  howpublished={\url{https://huggingface.co/Joshuant/resume-ner-distilbert}}
}
```
## License

This model is released under the Apache 2.0 license.
## Acknowledgments
- Dataturks for the original Resume NER dataset
- Hugging Face for the transformers library and model hosting
- The open-source NLP community
*Model trained on 2026-01-08.*