---
datasets:
- dataturks/resume-entities-for-ner
language:
- en
license: apache-2.0
metrics:
- f1
- precision
- recall
model-index:
- name: resume-ner-distilbert
  results:
  - task:
      name: Named Entity Recognition
      type: token-classification
    dataset:
      name: Resume NER
      type: dataturks/resume-entities-for-ner
    metrics:
    - name: F1
      type: f1
      value: 0.211
    - name: Precision
      type: precision
      value: 0.2189
    - name: Recall
      type: recall
      value: 0.2037
pipeline_tag: token-classification
tags:
- ner
- token-classification
- resume
- nlp
- transformers
- lora
---
# Resume NER DistilBERT

## Model Description

This is a fine-tuned Named Entity Recognition (NER) model for extracting structured information from resumes. It was trained by fine-tuning a `distilbert-base-uncased` backbone with LoRA (Low-Rank Adaptation).

## Supported Entity Types
| Entity Type | Description | Example |
|---|---|---|
| NAME | Person's name | "John Doe" |
| EMAIL | Email address | "john@example.com" |
| PHONE | Phone number | "+1 555-123-4567" |
| LOCATION | Geographic location | "San Francisco, CA" |
| ORG | Organization/Company | "Google Inc." |
| TITLE | Job title | "Senior Software Engineer" |
| DEGREE | Academic degree | "Bachelor of Science in Computer Science" |
| SKILL | Technical or soft skill | "Python", "Machine Learning" |
| CERT | Certification | "AWS Solutions Architect" |
| DATE | Date or time period | "2020-2023", "January 2022" |
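Token-classification models commonly encode entity types like these with a BIO tagging scheme. Assuming that convention (the card does not state it; check `model.config.id2label` for the authoritative mapping), the full label set can be derived from the table above:

```python
# Build a BIO label set from the entity types above.
# Assumption: the model uses the standard B-/I- prefix scheme plus "O"
# for non-entity tokens; verify against model.config.id2label.
ENTITY_TYPES = [
    "NAME", "EMAIL", "PHONE", "LOCATION", "ORG",
    "TITLE", "DEGREE", "SKILL", "CERT", "DATE",
]

labels = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(labels))

print(len(labels))               # 21 labels: "O" plus B-/I- for each of 10 types
print(id2label[1], id2label[2])  # B-NAME I-NAME
```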
## Training Details

### Training Data
- Dataset: Dataturks Resume Entities for NER
- Source: Kaggle
- Training Examples: N/A
- Validation Examples: N/A
### Training Configuration
- Base Model: distilbert-base-uncased
- Training Method: Fine-tuning with LoRA
- Epochs: 10
- Learning Rate: 3e-05
- Batch Size: 8
- Max Sequence Length: 512
- Random Seed: 42
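LoRA trains small low-rank factors instead of the full weight matrices, which is why it is listed as the training method above. The arithmetic below illustrates the parameter saving for one DistilBERT attention projection; the rank `r = 8` is an illustrative assumption, since the card does not state the LoRA rank used:

```python
# Why LoRA is parameter-efficient: instead of updating a full d x k weight
# matrix W, LoRA trains two low-rank factors B (d x r) and A (r x k), so the
# trainable parameter count per adapted matrix drops from d*k to r*(d+k).
d = k = 768  # DistilBERT hidden size; shape of one attention projection
r = 8        # hypothetical LoRA rank (not stated in this card)

full_params = d * k
lora_params = r * (d + k)

print(full_params)                       # 589824
print(lora_params)                       # 12288
print(round(full_params / lora_params))  # 48x fewer trainable parameters
```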
## Performance Metrics
| Metric | Validation Set |
|---|---|
| Precision | 0.2189 |
| Recall | 0.2037 |
| F1-Score | 0.2110 |
## Usage

### Using the Transformers Pipeline
```python
from transformers import pipeline

# Load the model
ner = pipeline("ner", model="Joshuant/resume-ner-distilbert", aggregation_strategy="simple")

# Extract entities from resume text
text = """
John Doe
Email: john.doe@email.com
Phone: +1 555-123-4567
EDUCATION
Bachelor of Science in Computer Science, MIT, 2020
EXPERIENCE
Senior Software Engineer at Google, 2020-2023
- Developed ML pipelines using Python and TensorFlow
"""

entities = ner(text)
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.3f})")
```
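The pipeline returns a flat list of entity dicts. A small helper (hypothetical, not part of the model) can group them into a structured resume record; the sample input below is hand-written for illustration rather than real model output:

```python
from collections import defaultdict

def group_entities(entities):
    """Group pipeline output ({'entity_group', 'word', 'score'} dicts) by type."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

# Simulated pipeline output (real scores and spans come from the model)
sample = [
    {"entity_group": "NAME", "word": "John Doe", "score": 0.98},
    {"entity_group": "EMAIL", "word": "john.doe@email.com", "score": 0.95},
    {"entity_group": "SKILL", "word": "Python", "score": 0.91},
    {"entity_group": "SKILL", "word": "TensorFlow", "score": 0.88},
]

print(group_entities(sample))
# {'NAME': ['John Doe'], 'EMAIL': ['john.doe@email.com'], 'SKILL': ['Python', 'TensorFlow']}
```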
### Using AutoModel
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Joshuant/resume-ner-distilbert")
model = AutoModelForTokenClassification.from_pretrained("Joshuant/resume-ner-distilbert")

# Tokenize input
text = "John Doe, Senior Software Engineer at Google"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

# Decode predictions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]
for token, label in zip(tokens, labels):
    if label != "O":
        print(f"{token}: {label}")
```
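The loop above prints one label per subtoken, and DistilBERT's WordPiece tokenizer splits rarer words into `##`-prefixed continuation pieces. A small helper (illustrative; the token and label lists below are hand-written, not model output) can merge subtokens back into whole words, keeping each word's first-subtoken label:

```python
def merge_wordpieces(tokens, labels):
    """Merge WordPiece subtokens ('##' continuations) back into whole words,
    keeping the label of each word's first subtoken."""
    words, word_labels = [], []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            words[-1] += token[2:]  # append continuation to previous word
        else:
            words.append(token)
            word_labels.append(label)
    return list(zip(words, word_labels))

# Hand-written example of token-level output for illustration
tokens = ["john", "doe", "ten", "##sor", "##flow"]
labels = ["B-NAME", "I-NAME", "B-SKILL", "I-SKILL", "I-SKILL"]
print(merge_wordpieces(tokens, labels))
# [('john', 'B-NAME'), ('doe', 'I-NAME'), ('tensorflow', 'B-SKILL')]
```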
## Limitations
- The model is primarily trained on English resumes and may not perform well on other languages
- Performance may vary based on resume formatting and structure
- The model may struggle with unusual entity formats or domain-specific terminology
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{resume_ner_slm_2026,
  title={Context-Aware Resume NER with Small Language Models},
  author={Research Team},
  year={2026},
  howpublished={\url{https://huggingface.co/Joshuant/resume-ner-distilbert}}
}
```
## License

This model is released under the Apache 2.0 license.
## Acknowledgments
- Dataturks for the original Resume NER dataset
- Hugging Face for the transformers library and model hosting
- The open-source NLP community
*Model trained on 2026-01-08.*