# Resume NER DistilBERT

## Model Description
This is a fine-tuned Named Entity Recognition (NER) model for extracting structured information from resumes. It was fine-tuned with LoRA (low-rank adaptation) on a `distilbert-base-uncased` backbone.
## Supported Entity Types
| Entity Type | Description | Example |
|---|---|---|
| NAME | Person's name | "John Doe" |
| EMAIL | Email address | "john@example.com" |
| PHONE | Phone number | "+1 555-123-4567" |
| LOCATION | Geographic location | "San Francisco, CA" |
| ORG | Organization/Company | "Google Inc." |
| TITLE | Job title | "Senior Software Engineer" |
| DEGREE | Academic degree | "Bachelor of Science in Computer Science" |
| SKILL | Technical or soft skill | "Python", "Machine Learning" |
| CERT | Certification | "AWS Solutions Architect" |
| DATE | Date or time period | "2020-2023", "January 2022" |
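Token-classification models conventionally encode these entity types as BIO tags (`B-` for the first token of a mention, `I-` for continuations, `O` for non-entities). This is an assumption about the label scheme; the authoritative mapping is in `model.config.id2label`. With the 10 entity types above, a BIO scheme yields 21 labels:

```python
# Build the conventional BIO label set for the 10 entity types above.
# NOTE: illustrative assumption -- the model's actual mapping lives in
# model.config.id2label.
ENTITY_TYPES = [
    "NAME", "EMAIL", "PHONE", "LOCATION", "ORG",
    "TITLE", "DEGREE", "SKILL", "CERT", "DATE",
]

labels = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
print(len(labels))  # 21: "O" plus B-/I- for each of the 10 types
```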
## Training Details

### Training Data
- Dataset: Dataturks Resume Entities for NER
- Source: Kaggle
- Training Examples: N/A
- Validation Examples: N/A
### Training Configuration
- Base Model: distilbert-base-uncased
- Training Method: Fine-tuning with LoRA
- Epochs: 10
- Learning Rate: 3e-05
- Batch Size: 8
- Max Sequence Length: 512
- Random Seed: 42
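The configuration above can be sketched as a `peft` + `transformers` training setup. The LoRA rank, alpha, dropout, and target modules are *not* stated in this card; the values below are illustrative placeholders, while the epochs, learning rate, batch size, and seed come from the list above.

```python
# Sketch of the LoRA fine-tuning setup, assuming the Hugging Face peft library.
# r / lora_alpha / lora_dropout / target_modules are HYPOTHETICAL -- this card
# does not publish them.
from transformers import AutoModelForTokenClassification, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType

model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=21,  # assuming 10 entity types in a BIO scheme plus "O"
)

lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=8,                                # hypothetical rank
    lora_alpha=16,                      # hypothetical scaling
    lora_dropout=0.1,                   # hypothetical dropout
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections
)
model = get_peft_model(model, lora_config)

# Hyperparameters taken from the Training Configuration section
args = TrainingArguments(
    output_dir="resume-ner-distilbert",
    num_train_epochs=10,
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    seed=42,
)
```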
## Performance Metrics
| Metric | Validation Set |
|---|---|
| Precision | 0.2189 |
| Recall | 0.2037 |
| F1-Score | 0.2110 |
## Usage

### Using Transformers Pipeline
```python
from transformers import pipeline

# Load the model
ner = pipeline("ner", model="Joshuant/resume-ner-distilbert", aggregation_strategy="simple")

# Extract entities from resume text
text = """
John Doe
Email: john.doe@email.com
Phone: +1 555-123-4567

EDUCATION
Bachelor of Science in Computer Science, MIT, 2020

EXPERIENCE
Senior Software Engineer at Google, 2020-2023
- Developed ML pipelines using Python and TensorFlow
"""

entities = ner(text)
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.3f})")
```
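The pipeline returns a flat list of entity dicts; for resume parsing you usually want them grouped by type, with low-confidence hits filtered out. A minimal post-processing sketch (the sample `entities` list is hypothetical output in the pipeline's format):

```python
from collections import defaultdict

def group_entities(entities, min_score=0.5):
    """Group pipeline output by entity type, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

# Hypothetical pipeline output, for illustration only
entities = [
    {"entity_group": "NAME", "word": "John Doe", "score": 0.91},
    {"entity_group": "SKILL", "word": "Python", "score": 0.88},
    {"entity_group": "SKILL", "word": "TensorFlow", "score": 0.42},  # below threshold
]
print(group_entities(entities))  # {'NAME': ['John Doe'], 'SKILL': ['Python']}
```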
### Using AutoModel
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Joshuant/resume-ner-distilbert")
model = AutoModelForTokenClassification.from_pretrained("Joshuant/resume-ner-distilbert")

# Tokenize input
text = "John Doe, Senior Software Engineer at Google"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

# Decode predictions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]

for token, label in zip(tokens, labels):
    if label != "O":
        print(f"{token}: {label}")
```
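The loop above prints individual WordPiece tokens; to recover whole entity mentions you can merge consecutive tokens that belong to the same entity. A sketch over hardcoded tokens and labels (assuming BIO-style labels and `##` subword markers, which is what DistilBERT's WordPiece tokenizer produces):

```python
def merge_spans(tokens, labels):
    """Merge consecutive B-/I- tagged tokens into (entity_type, text) spans."""
    spans = []
    for token, label in zip(tokens, labels):
        if label == "O":
            continue
        prefix, ent_type = label.split("-", 1)
        if token.startswith("##"):  # WordPiece continuation: glue to previous span
            if spans:
                spans[-1] = (spans[-1][0], spans[-1][1] + token[2:])
            continue
        if prefix == "B" or not spans or spans[-1][0] != ent_type:
            spans.append((ent_type, token))          # start a new entity
        else:
            spans[-1] = (ent_type, spans[-1][1] + " " + token)  # extend it
    return spans

# Hardcoded example mimicking the decode loop's output
tokens = ["john", "doe", ",", "senior", "software", "engineer"]
labels = ["B-NAME", "I-NAME", "O", "B-TITLE", "I-TITLE", "I-TITLE"]
print(merge_spans(tokens, labels))
# [('NAME', 'john doe'), ('TITLE', 'senior software engineer')]
```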
## Limitations
- The model is primarily trained on English resumes and may not perform well on other languages
- Performance may vary based on resume formatting and structure
- The model may struggle with unusual entity formats or domain-specific terminology
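A related practical limitation: with `truncation=True` and `max_length=512`, anything past the first 512 tokens of a long resume is silently dropped. One mitigation is to run inference over overlapping chunks and merge the results. A character-level sketch (a tokenizer-aware splitter, e.g. via `return_overflowing_tokens`, would be more precise):

```python
def chunk_text(text, max_chars=1500, overlap=200):
    """Split text into overlapping chunks so entities at a boundary
    appear whole in at least one chunk."""
    if len(text) <= max_chars:
        return [text]
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap  # step forward, keeping an overlap
    return chunks

long_text = "x" * 4000
chunks = chunk_text(long_text)
print(len(chunks), [len(c) for c in chunks])  # prints: 3 [1500, 1500, 1400]
```

Each chunk can then be passed to the pipeline separately, deduplicating entities found in the overlapping regions.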
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{resume_ner_slm_2026,
  title={Context-Aware Resume NER with Small Language Models},
  author={Research Team},
  year={2026},
  howpublished={\url{https://huggingface.co/Joshuant/resume-ner-distilbert}}
}
```
## License

This model is released under the Apache 2.0 license.
## Acknowledgments
- Dataturks for the original Resume NER dataset
- Hugging Face for the transformers library and model hosting
- The open-source NLP community
Model trained on 2026-01-08