# HR Conversations Multi-Label Classifier
A fine-tuned `distilbert-base-uncased` model (66M parameters) for multi-label classification of HR support conversations.
## Model Details
| Attribute | Value |
|---|---|
| Base Model | distilbert/distilbert-base-uncased |
| Task | Multi-label text classification |
| Labels | 20 HR topics |
| Training Data | 100 synthetic HR conversations |
| Framework | Hugging Face Transformers |
### 20 HR Topic Labels
- Benefits
- Career Development
- Compliance & Legal
- Contracts
- Diversity, Equity & Inclusion
- Expense Management
- Harassment
- Health
- IT & Equipment
- Leave & Absence
- Mobility
- Offboarding
- Onboarding
- Payroll
- Performance Management
- Recruitment
- Safety
- Timetracking
- Training
- Work Arrangements
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "AurelPx/hr-conversations-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for deterministic inference

LABELS = [
    "Benefits", "Career Development", "Compliance & Legal", "Contracts",
    "Diversity, Equity & Inclusion", "Expense Management", "Harassment", "Health",
    "IT & Equipment", "Leave & Absence", "Mobility", "Offboarding",
    "Onboarding", "Payroll", "Performance Management", "Recruitment",
    "Safety", "Timetracking", "Training", "Work Arrangements"
]

def classify(text, threshold=0.3):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Independent sigmoid per label: one conversation can match several topics
    probs = torch.sigmoid(logits).numpy()[0]
    return [LABELS[i] for i, p in enumerate(probs) if p >= threshold]

# Example
conversation = "USER: I haven't received my payslip for March yet..."
print(classify(conversation))  # ['Payroll']
## Training Notes
- Dataset size: 100 conversations (small dataset)
- Split: 80 train / 20 validation
- Epochs: 4-8 with early stopping
- Limitations: With only 100 samples across 20 classes, the model is in a very low-data regime. For production use, collect >500 samples per label or apply data augmentation.
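The multi-label setup implies each conversation was encoded as a multi-hot target vector, one slot per label, trained with a sigmoid-per-label loss (`BCEWithLogitsLoss`). The training code is not published here, but a minimal sketch of that encoding, assuming the label order from the Usage snippet, looks like:

```python
import torch

# Label order assumed to match the LABELS list in the Usage section
LABELS = [
    "Benefits", "Career Development", "Compliance & Legal", "Contracts",
    "Diversity, Equity & Inclusion", "Expense Management", "Harassment", "Health",
    "IT & Equipment", "Leave & Absence", "Mobility", "Offboarding",
    "Onboarding", "Payroll", "Performance Management", "Recruitment",
    "Safety", "Timetracking", "Training", "Work Arrangements",
]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}

def multi_hot(topics):
    # BCEWithLogitsLoss expects float multi-hot targets: 1.0 per active label
    target = torch.zeros(len(LABELS))
    for topic in topics:
        target[LABEL2ID[topic]] = 1.0
    return target

target = multi_hot(["Payroll", "Leave & Absence"])
print(int(target.sum()))  # 2
```

Passing `problem_type="multi_label_classification"` to `AutoModelForSequenceClassification.from_pretrained` makes the `Trainer` apply `BCEWithLogitsLoss` over exactly this kind of target automatically.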
## Links
- Dataset: AurelPx/ml-intern-a2d69eee-datasets