# HR Conversations Multi-Label Classifier

A fine-tuned **DistilBERT-base-uncased** model (66M parameters) for multi-label classification of HR support conversations.

## Model Details

| Attribute | Value |
|-----------|-------|
| Base Model | `distilbert/distilbert-base-uncased` |
| Task | Multi-label text classification |
| Labels | 20 HR topics |
| Training Data | 100 synthetic HR conversations |
| Framework | Hugging Face Transformers |

## 20 HR Topic Labels

1. Benefits
2. Career Development
3. Compliance & Legal
4. Contracts
5. Diversity, Equity & Inclusion
6. Expense Management
7. Harassment
8. Health
9. IT & Equipment
10. Leave & Absence
11. Mobility
12. Offboarding
13. Onboarding
14. Payroll
15. Performance Management
16. Recruitment
17. Safety
18. Timetracking
19. Training
20. Work Arrangements
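
When preparing training or evaluation targets for a multi-label model, each conversation's label set is commonly encoded as a multi-hot vector with one slot per label. The sketch below assumes the index order follows the numbered list above (the same order as `LABELS` in the Usage section). The helper names `encode` and `decode` are illustrative and not part of the released model.

```python
# Label order assumed to follow the numbered list above.
LABELS = [
    "Benefits", "Career Development", "Compliance & Legal", "Contracts",
    "Diversity, Equity & Inclusion", "Expense Management", "Harassment", "Health",
    "IT & Equipment", "Leave & Absence", "Mobility", "Offboarding",
    "Onboarding", "Payroll", "Performance Management", "Recruitment",
    "Safety", "Timetracking", "Training", "Work Arrangements",
]
LABEL_TO_INDEX = {name: i for i, name in enumerate(LABELS)}

def encode(names):
    """Turn a collection of label names into a 20-dimensional multi-hot vector."""
    vector = [0.0] * len(LABELS)
    for name in names:
        vector[LABEL_TO_INDEX[name]] = 1.0  # raises KeyError on an unknown label
    return vector

def decode(vector):
    """Recover the label names whose slot is set, in label order."""
    return [LABELS[i] for i, v in enumerate(vector) if v == 1.0]

target = encode(["Payroll", "Leave & Absence"])
print(decode(target))  # ['Leave & Absence', 'Payroll']
```

The values are floats because binary cross-entropy losses such as PyTorch's `BCEWithLogitsLoss` expect float targets.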

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "AurelPx/hr-conversations-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Label order must match the model's output indices.
LABELS = [
    "Benefits", "Career Development", "Compliance & Legal", "Contracts",
    "Diversity, Equity & Inclusion", "Expense Management", "Harassment", "Health",
    "IT & Equipment", "Leave & Absence", "Mobility", "Offboarding",
    "Onboarding", "Payroll", "Performance Management", "Recruitment",
    "Safety", "Timetracking", "Training", "Work Arrangements"
]

def classify(text, threshold=0.3):
    """Return every label whose sigmoid probability is at least `threshold`."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Multi-label: one independent sigmoid per label, not a softmax over labels.
    probs = torch.sigmoid(logits).numpy()[0]
    return [LABELS[i] for i, p in enumerate(probs) if p >= threshold]

# Example
conversation = "USER: I haven't received my payslip for March yet..."
print(classify(conversation))  # e.g. ['Payroll']
```
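
To make the thresholding step concrete, here is a minimal pure-Python sketch of the same decision rule applied to made-up logits. The function name `labels_above_threshold`, the three-label subset, and the logit values are illustrative assumptions, not outputs of the model.

```python
import math

def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def labels_above_threshold(logits, labels, threshold=0.3):
    """Return (label, probability) pairs at or above threshold, highest first."""
    scored = [(label, sigmoid(z)) for label, z in zip(labels, logits)]
    kept = [(label, p) for label, p in scored if p >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical logits for three of the 20 labels (made-up values).
demo_labels = ["Payroll", "Benefits", "Harassment"]
demo_logits = [2.0, -0.5, -3.0]

for label, p in labels_above_threshold(demo_logits, demo_labels):
    print(f"{label}: {p:.2f}")
# Payroll: 0.88
# Benefits: 0.38
```

Because each label gets its own sigmoid, several labels (or none) can pass the threshold for one conversation. A threshold of 0.3 keeps more labels than 0.5 would, which generally favors recall over precision.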

## Training Notes

- **Dataset size**: 100 conversations (small dataset)
- **Split**: 80 train / 20 validation
- **Epochs**: 4 to 8 with early stopping
- **Limitations**: With only 100 samples across 20 labels, the model is in a very low-data regime. For production use, collect more than 500 samples per label or apply data augmentation.
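
The 80/20 split and the low-data concern can be illustrated with a short sketch. It assumes the dataset is a list of records with a `labels` field. That field name, the seed, and the toy records are hypothetical, and the split procedure actually used for this model is not documented here.

```python
import random
from collections import Counter

def split_train_val(records, val_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split off a validation fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def label_support(records):
    """Count how many records carry each label."""
    return Counter(label for record in records for label in record["labels"])

# Toy stand-in for the 100 conversations: one label per record,
# cycling through four hypothetical label names.
names = ["Payroll", "Benefits", "Training", "Safety"]
records = [{"text": f"conversation {i}", "labels": [names[i % 4]]} for i in range(100)]

train, val = split_train_val(records)
print(len(train), len(val))  # 80 20
print(label_support(train).most_common())
```

Checking per-label counts in the training split makes the limitation tangible: if every conversation carried exactly one label, 80 training examples spread over 20 labels would average only 4 examples per label.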

## Links

- Dataset: [AurelPx/ml-intern-a2d69eee-datasets](https://huggingface.co/datasets/AurelPx/ml-intern-a2d69eee-datasets)