# Resume NER DistilBERT

## Model Description
This is a fine-tuned Named Entity Recognition (NER) model for extracting structured information from resumes. It was fine-tuned with LoRA (low-rank adaptation) on a `distilbert-base-uncased` backbone.
## Supported Entity Types
| Entity Type | Description | Example |
|---|---|---|
| NAME | Person's name | "John Doe" |
| EMAIL | Email address | "john@example.com" |
| PHONE | Phone number | "+1 555-123-4567" |
| LOCATION | Geographic location | "San Francisco, CA" |
| ORG | Organization/Company | "Google Inc." |
| TITLE | Job title | "Senior Software Engineer" |
| DEGREE | Academic degree | "Bachelor of Science in Computer Science" |
| SKILL | Technical or soft skill | "Python", "Machine Learning" |
| CERT | Certification | "AWS Solutions Architect" |
| DATE | Date or time period | "2020-2023", "January 2022" |
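Token-classification models conventionally encode these entity types as BIO tags (`B-` for the first token of a mention, `I-` for continuations, `O` for non-entities). This is an assumption about the label scheme; the authoritative mapping is in `model.config.id2label`. With the 10 entity types above, a BIO scheme yields 21 labels:

```python
# Build the conventional BIO label set for the 10 entity types above.
# NOTE: illustrative assumption -- the model's actual mapping lives in
# model.config.id2label.
ENTITY_TYPES = [
    "NAME", "EMAIL", "PHONE", "LOCATION", "ORG",
    "TITLE", "DEGREE", "SKILL", "CERT", "DATE",
]

labels = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
print(len(labels))  # 21: "O" plus B-/I- for each of the 10 types
```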
## Training Details

### Training Data
- Dataset: Dataturks Resume Entities for NER
- Source: Kaggle
- Training Examples: N/A
- Validation Examples: N/A
### Training Configuration
- Base Model: distilbert-base-uncased
- Training Method: Fine-tuning with LoRA
- Epochs: 10
- Learning Rate: 3e-05
- Batch Size: 8
- Max Sequence Length: 512
- Random Seed: 42
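The configuration above can be sketched as a `peft` + `transformers` training setup. The LoRA rank, alpha, dropout, and target modules are *not* stated in this card; the values below are illustrative placeholders, while the epochs, learning rate, batch size, and seed come from the list above.

```python
# Sketch of the LoRA fine-tuning setup, assuming the Hugging Face peft library.
# r / lora_alpha / lora_dropout / target_modules are HYPOTHETICAL -- this card
# does not publish them.
from transformers import AutoModelForTokenClassification, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType

model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=21,  # assuming 10 entity types in a BIO scheme plus "O"
)

lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=8,                                # hypothetical rank
    lora_alpha=16,                      # hypothetical scaling
    lora_dropout=0.1,                   # hypothetical dropout
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections
)
model = get_peft_model(model, lora_config)

# Hyperparameters taken from the Training Configuration section
args = TrainingArguments(
    output_dir="resume-ner-distilbert",
    num_train_epochs=10,
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    seed=42,
)
```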
## Performance Metrics
| Metric | Validation Set |
|---|---|
| Precision | 0.2189 |
| Recall | 0.2037 |
| F1-Score | 0.2110 |
## Usage

### Using Transformers Pipeline
```python
from transformers import pipeline

# Load the model
ner = pipeline("ner", model="Joshuant/resume-ner-distilbert", aggregation_strategy="simple")

# Extract entities from resume text
text = """
John Doe
Email: john.doe@email.com
Phone: +1 555-123-4567

EDUCATION
Bachelor of Science in Computer Science, MIT, 2020

EXPERIENCE
Senior Software Engineer at Google, 2020-2023
- Developed ML pipelines using Python and TensorFlow
"""

entities = ner(text)
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.3f})")
```
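The pipeline returns a flat list of entity dicts; for resume parsing you usually want them grouped by type, with low-confidence hits filtered out. A minimal post-processing sketch (the sample `entities` list is hypothetical output in the pipeline's format):

```python
from collections import defaultdict

def group_entities(entities, min_score=0.5):
    """Group pipeline output by entity type, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

# Hypothetical pipeline output, for illustration only
entities = [
    {"entity_group": "NAME", "word": "John Doe", "score": 0.91},
    {"entity_group": "SKILL", "word": "Python", "score": 0.88},
    {"entity_group": "SKILL", "word": "TensorFlow", "score": 0.42},  # below threshold
]
print(group_entities(entities))  # {'NAME': ['John Doe'], 'SKILL': ['Python']}
```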
### Using AutoModel
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Joshuant/resume-ner-distilbert")
model = AutoModelForTokenClassification.from_pretrained("Joshuant/resume-ner-distilbert")

# Tokenize input
text = "John Doe, Senior Software Engineer at Google"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

# Decode predictions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]

for token, label in zip(tokens, labels):
    if label != "O":
        print(f"{token}: {label}")
```
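The loop above prints individual WordPiece tokens; to recover whole entity mentions you can merge consecutive tokens that belong to the same entity. A sketch over hardcoded tokens and labels (assuming BIO-style labels and `##` subword markers, which is what DistilBERT's WordPiece tokenizer produces):

```python
def merge_spans(tokens, labels):
    """Merge consecutive B-/I- tagged tokens into (entity_type, text) spans."""
    spans = []
    for token, label in zip(tokens, labels):
        if label == "O":
            continue
        prefix, ent_type = label.split("-", 1)
        if token.startswith("##"):  # WordPiece continuation: glue to previous span
            if spans:
                spans[-1] = (spans[-1][0], spans[-1][1] + token[2:])
            continue
        if prefix == "B" or not spans or spans[-1][0] != ent_type:
            spans.append((ent_type, token))          # start a new entity
        else:
            spans[-1] = (ent_type, spans[-1][1] + " " + token)  # extend it
    return spans

# Hardcoded example mimicking the decode loop's output
tokens = ["john", "doe", ",", "senior", "software", "engineer"]
labels = ["B-NAME", "I-NAME", "O", "B-TITLE", "I-TITLE", "I-TITLE"]
print(merge_spans(tokens, labels))
# [('NAME', 'john doe'), ('TITLE', 'senior software engineer')]
```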
## Limitations
- The model is primarily trained on English resumes and may not perform well on other languages
- Performance may vary based on resume formatting and structure
- The model may struggle with unusual entity formats or domain-specific terminology
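A related practical limitation: with `truncation=True` and `max_length=512`, anything past the first 512 tokens of a long resume is silently dropped. One mitigation is to run inference over overlapping chunks and merge the results. A character-level sketch (a tokenizer-aware splitter, e.g. via `return_overflowing_tokens`, would be more precise):

```python
def chunk_text(text, max_chars=1500, overlap=200):
    """Split text into overlapping chunks so entities at a boundary
    appear whole in at least one chunk."""
    if len(text) <= max_chars:
        return [text]
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap  # step forward, keeping an overlap
    return chunks

long_text = "x" * 4000
chunks = chunk_text(long_text)
print(len(chunks), [len(c) for c in chunks])  # prints: 3 [1500, 1500, 1400]
```

Each chunk can then be passed to the pipeline separately, deduplicating entities found in the overlapping regions.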
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{resume_ner_slm_2026,
  title={Context-Aware Resume NER with Small Language Models},
  author={Research Team},
  year={2026},
  howpublished={\url{https://huggingface.co/Joshuant/resume-ner-distilbert}}
}
```
## License

This model is released under the Apache 2.0 license.
## Acknowledgments
- Dataturks for the original Resume NER dataset
- Hugging Face for the transformers library and model hosting
- The open-source NLP community
Model trained on 2026-01-08