
HR Conversations Multi-Label Classifier

A DistilBERT-base-uncased model (66M parameters) fine-tuned for multi-label classification of HR support conversations.

Model Details

Attribute        Value
Base Model       distilbert/distilbert-base-uncased
Task             Multi-label text classification
Labels           20 HR topics
Training Data    100 synthetic HR conversations
Framework        Hugging Face Transformers

20 HR Topic Labels

  1. Benefits
  2. Career Development
  3. Compliance & Legal
  4. Contracts
  5. Diversity, Equity & Inclusion
  6. Expense Management
  7. Harassment
  8. Health
  9. IT & Equipment
  10. Leave & Absence
  11. Mobility
  12. Offboarding
  13. Onboarding
  14. Payroll
  15. Performance Management
  16. Recruitment
  17. Safety
  18. Timetracking
  19. Training
  20. Work Arrangements
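
For training and evaluation, targets over these topics are typically encoded as 20-dimensional multi-hot vectors, one slot per label in the order above (the encoder below is an illustrative sketch, not the card's actual preprocessing):

```python
# Multi-hot encoding sketch for the 20 HR topic labels listed above.
LABELS = [
    "Benefits", "Career Development", "Compliance & Legal", "Contracts",
    "Diversity, Equity & Inclusion", "Expense Management", "Harassment", "Health",
    "IT & Equipment", "Leave & Absence", "Mobility", "Offboarding",
    "Onboarding", "Payroll", "Performance Management", "Recruitment",
    "Safety", "Timetracking", "Training", "Work Arrangements",
]
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}

def to_multi_hot(topics):
    """Map a set of topic names to a 20-dim 0/1 target vector."""
    vec = [0.0] * len(LABELS)
    for t in topics:
        vec[LABEL_TO_ID[t]] = 1.0
    return vec

# A conversation about a missing payslip while on parental leave
# touches two topics at once, which is why the task is multi-label:
vec = to_multi_hot(["Payroll", "Leave & Absence"])
```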

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "AurelPx/hr-conversations-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

LABELS = [
    "Benefits", "Career Development", "Compliance & Legal", "Contracts",
    "Diversity, Equity & Inclusion", "Expense Management", "Harassment", "Health",
    "IT & Equipment", "Leave & Absence", "Mobility", "Offboarding",
    "Onboarding", "Payroll", "Performance Management", "Recruitment",
    "Safety", "Timetracking", "Training", "Work Arrangements"
]

def classify(text, threshold=0.3):
    """Return every label whose sigmoid probability meets the threshold."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Multi-label: an independent sigmoid per label, not a softmax over labels.
    probs = torch.sigmoid(logits)[0].numpy()
    return [LABELS[i] for i, p in enumerate(probs) if p >= threshold]

# Example
conversation = "USER: I haven't received my payslip for March yet..."
print(classify(conversation))  # ['Payroll']
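
The 0.3 threshold in classify is a tunable hyperparameter, not a fixed property of the model: a looser cut-off admits mid-confidence labels, a stricter one keeps only confident ones. A dependency-free sketch with made-up logits (illustrative values, not real model outputs):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical per-label logits for three of the 20 topics.
probs = {
    "Payroll": sigmoid(2.0),           # ~0.88, confident
    "Leave & Absence": sigmoid(-0.5),  # ~0.38, borderline
    "Benefits": sigmoid(-3.0),         # ~0.05, unlikely
}

def predict(probs, threshold):
    """Return the sorted labels whose probability clears the threshold."""
    return sorted(label for label, p in probs.items() if p >= threshold)

low = predict(probs, 0.3)   # borderline label included
high = predict(probs, 0.5)  # only the confident label survives
```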

Training Notes

  • Dataset size: 100 conversations (small dataset)
  • Split: 80 train / 20 validation
  • Epochs: 4-8 with early stopping
  • Limitations: With only 100 samples across 20 classes, the model is in a very low-data regime. For production use, collect >500 samples per label or apply data augmentation.
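
The 80/20 split above can be reproduced with a simple shuffled split; a minimal sketch (the seed and record format are assumptions, not the card's actual training script):

```python
import random

# 100 synthetic conversations, each paired with its topic labels (placeholder data).
dataset = [{"text": f"conversation {i}", "labels": []} for i in range(100)]

rng = random.Random(42)  # fixed seed so the split is reproducible
indices = list(range(len(dataset)))
rng.shuffle(indices)

split = int(0.8 * len(dataset))  # 80 train / 20 validation, as in the notes above
train = [dataset[i] for i in indices[:split]]
val = [dataset[i] for i in indices[split:]]
```

With only 5 conversations per label on average, even the validation metrics from such a split are noisy, which is why the limitations note recommends collecting more data before production use.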
