Resume NER DistilBERT

Model Description

This is a fine-tuned Named Entity Recognition (NER) model for extracting structured information from resumes. It was trained with LoRA (Low-Rank Adaptation) fine-tuning on a distilbert-base-uncased backbone.

Supported Entity Types

| Entity Type | Description | Example |
| --- | --- | --- |
| NAME | Person's name | "John Doe" |
| EMAIL | Email address | "john@example.com" |
| PHONE | Phone number | "+1 555-123-4567" |
| LOCATION | Geographic location | "San Francisco, CA" |
| ORG | Organization/Company | "Google Inc." |
| TITLE | Job title | "Senior Software Engineer" |
| DEGREE | Academic degree | "Bachelor of Science in Computer Science" |
| SKILL | Technical or soft skill | "Python", "Machine Learning" |
| CERT | Certification | "AWS Solutions Architect" |
| DATE | Date or time period | "2020-2023", "January 2022" |
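Token-classification models like this one typically use BIO tagging: `B-` marks the first token of an entity, `I-` a continuation token, and `O` a non-entity token. A minimal sketch of the label set this scheme implies for the ten entity types above (the actual mapping is an assumption; check `model.config.id2label` for the authoritative one):

```python
# Build the BIO label set implied by the ten entity types above.
ENTITY_TYPES = ["NAME", "EMAIL", "PHONE", "LOCATION", "ORG",
                "TITLE", "DEGREE", "SKILL", "CERT", "DATE"]

labels = ["O"] + [f"{prefix}-{etype}"
                  for etype in ENTITY_TYPES
                  for prefix in ("B", "I")]

id2label = dict(enumerate(labels))  # 21 labels: O plus B-/I- per type
print(len(id2label))  # 21
```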

Training Details

Training Data

  • Dataset: Dataturks Resume Entities for NER
  • Source: Kaggle
  • Training Examples: N/A
  • Validation Examples: N/A

Training Configuration

  • Base Model: distilbert-base-uncased
  • Training Method: Fine-tuning with LoRA
  • Epochs: 10
  • Learning Rate: 3e-05
  • Batch Size: 8
  • Max Sequence Length: 512
  • Random Seed: 42
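LoRA freezes the pretrained weight matrix W and learns only a low-rank update, so the effective weight is W + (alpha / r) · B · A, where B is d×r, A is r×k, and r is much smaller than d and k. A toy pure-Python sketch of that composition (shapes and values are illustrative, not the model's actual dimensions):

```python
def matmul(X, Y):
    """Plain-Python matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, A, B, alpha, r):
    """Effective weight W + (alpha / r) * B @ A, as in LoRA."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(wrow, drow)]
            for wrow, drow in zip(W, BA)]

# Toy shapes: W is 3x3 (frozen), B is 3x2, A is 2x3 (rank r = 2).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
A = [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]

W_eff = lora_weight(W, A, B, alpha=4, r=2)
# W_eff == [[1.0, 4.0, 0.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]
```

Only A and B are updated during training, which is why LoRA fine-tuning touches a small fraction of the 66M backbone parameters.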

Performance Metrics

| Metric | Validation Set |
| --- | --- |
| Precision | 0.2189 |
| Recall | 0.2037 |
| F1-Score | 0.2110 |
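F1 is the harmonic mean of precision and recall, and the reported numbers are internally consistent:

```python
precision, recall = 0.2189, 0.2037
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.4f}")  # 0.2110 — matches the reported F1-Score
```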

Usage

Using Transformers Pipeline

from transformers import pipeline

# Load the model
ner = pipeline("ner", model="Joshuant/resume-ner-distilbert", aggregation_strategy="simple")

# Extract entities from resume text
text = """
John Doe
Email: john.doe@email.com
Phone: +1 555-123-4567

EDUCATION
Bachelor of Science in Computer Science, MIT, 2020

EXPERIENCE
Senior Software Engineer at Google, 2020-2023
- Developed ML pipelines using Python and TensorFlow
"""

entities = ner(text)
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.3f})")
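With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with `entity_group`, `word`, and `score` keys. A small post-processing sketch that folds that output into a resume-shaped dict, filtering low-confidence predictions (the sample data and the 0.5 threshold are illustrative, not real model output):

```python
from collections import defaultdict

def group_entities(entities, min_score=0.5):
    """Fold pipeline output into {entity_type: [words]},
    dropping predictions below the confidence threshold."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

# Illustrative pipeline-style output (not real model predictions)
sample = [
    {"entity_group": "NAME", "word": "John Doe", "score": 0.91},
    {"entity_group": "SKILL", "word": "Python", "score": 0.77},
    {"entity_group": "SKILL", "word": "TensorFlow", "score": 0.31},
]
print(group_entities(sample))
# {'NAME': ['John Doe'], 'SKILL': ['Python']}
```

Given the modest validation F1 (~0.21), a confidence threshold like this is worth tuning before trusting extracted fields downstream.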

Using AutoModel

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Joshuant/resume-ner-distilbert")
model = AutoModelForTokenClassification.from_pretrained("Joshuant/resume-ner-distilbert")

# Tokenize input
text = "John Doe, Senior Software Engineer at Google"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

# Decode predictions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]

for token, label in zip(tokens, labels):
    if label != "O":
        print(f"{token}: {label}")
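The loop above prints labels per WordPiece token, so a word like "TensorFlow" may appear split into `tensor` and `##flow`. A sketch of merging continuation pieces back into whole words, keeping each word's first-piece label (the sample tokens and labels are illustrative, not actual model output):

```python
def merge_wordpieces(tokens, labels):
    """Merge '##' continuation pieces back into whole words,
    keeping the label of each word's first piece."""
    words, word_labels = [], []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
            word_labels.append(label)
    return list(zip(words, word_labels))

# Illustrative tokens/labels (not actual model output)
tokens = ["john", "doe", "tensor", "##flow"]
labels = ["B-NAME", "I-NAME", "B-SKILL", "I-SKILL"]
print(merge_wordpieces(tokens, labels))
# [('john', 'B-NAME'), ('doe', 'I-NAME'), ('tensorflow', 'B-SKILL')]
```

The special tokens `[CLS]` and `[SEP]` should also be skipped before merging in real output.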

Limitations

  • The model is primarily trained on English resumes and may not perform well on other languages
  • Performance may vary based on resume formatting and structure
  • The model may struggle with unusual entity formats or domain-specific terminology

Citation

If you use this model in your research, please cite:

@misc{resume_ner_slm_2026,
    title={Context-Aware Resume NER with Small Language Models},
    author={Research Team},
    year={2026},
    howpublished={\url{https://huggingface.co/Joshuant/resume-ner-distilbert}}
}

License

This model is released under the apache-2.0 license.

Acknowledgments

  • Dataturks for the original Resume NER dataset
  • Hugging Face for the transformers library and model hosting
  • The open-source NLP community

Model trained on 2026-01-08
