|
|
---
language: en
license: apache-2.0
tags:
- text-classification
- intent-classification
- conversational-ai
- bert
- distilbert
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: intent-classifier-v2
  results:
  - task:
      type: text-classification
      name: Intent Classification
    metrics:
    - type: accuracy
      value: 1.0000
      name: Test Accuracy
    - type: f1
      value: 1.0000
      name: Weighted F1
---
|
|
|
|
|
# DAPA Intent Classifier v2 |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model classifies user intents for the DAPA AI conversational assistant system. It supports 13 intents: 12 agentic workflows plus 1 general Q&A fallback.
|
|
|
|
|
**Model Type:** DistilBERT for Sequence Classification |
|
|
**Training Date:** 2025-10-25 |
|
|
**Accuracy:** 100.00% |
|
|
**F1 Score:** 1.0000 |
|
|
|
|
|
## Supported Intents |
|
|
|
|
|
The model classifies queries into 13 intents: |
|
|
|
|
|
### Agentic Intents (12) |
|
|
1. **generate-offer** - Generate job offers, NDAs, contracts |
|
|
2. **schedule-interview** - Schedule candidate interviews |
|
|
3. **update-employee-profile** - Update employee information |
|
|
4. **access-employee-record** - Access employee records |
|
|
5. **approve-expense** - Approve expense reports |
|
|
6. **check-leave-balance** - Check leave balances |
|
|
7. **confirm-training-completion** - Confirm training completion |
|
|
8. **provide-candidate-feedback** - Provide candidate feedback |
|
|
9. **request-leave** - Request time off |
|
|
10. **request-training** - Request training enrollment |
|
|
11. **review-contract** - Review contracts |
|
|
12. **submit-expense** - Submit expense reports |
|
|
|
|
|
### Q&A Intent (1) |
|
|
13. **general-query** - Generic queries (email lookups, status checks, policy questions) |
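
The 13 labels can be expressed as an `id2label`/`label2id` mapping like the one stored in the model config. This is an illustrative sketch assuming alphabetical index order; the published model's `config.json` is the source of truth for the actual mapping.

```python
# Hypothetical label mapping; check config.json for the real index order.
LABELS = [
    "access-employee-record",
    "approve-expense",
    "check-leave-balance",
    "confirm-training-completion",
    "general-query",
    "generate-offer",
    "provide-candidate-feedback",
    "request-leave",
    "request-training",
    "review-contract",
    "schedule-interview",
    "submit-expense",
    "update-employee-profile",
]

id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in enumerate(LABELS)}
```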
|
|
|
|
|
## Training Data |
|
|
|
|
|
- **Total Samples:** 1,240 |
|
|
- **Training Split:** 70% (868 samples) |
|
|
- **Validation Split:** 15% (186 samples) |
|
|
- **Test Split:** 15% (186 samples) |
|
|
- **Data Balance:** 80-200 examples per intent |
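
The 70/15/15 split above can be reproduced with a simple shuffled index split. This is a minimal sketch assuming a flat list of samples; the actual pipeline may stratify per intent to preserve the 80-200-per-intent balance in each split.

```python
import random

def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle and split samples into train/val/test lists (sketch)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# 1,240 samples -> 868 / 186 / 186, matching the splits listed above
train, val, test = split_dataset(list(range(1240)))
```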
|
|
|
|
|
## Performance |
|
|
|
|
|
### Overall Metrics |
|
|
- **Test Accuracy:** 100.00% |
|
|
- **Weighted Precision:** 1.0000 |
|
|
- **Weighted Recall:** 1.0000 |
|
|
- **Weighted F1:** 1.0000 |
|
|
|
|
|
### Per-Intent Performance |
|
|
|
|
|
| Intent | Precision | Recall | F1 | Support |
|--------|-----------|--------|-----|---------|
| access-employee-record | 1.000 | 1.000 | 1.000 | 12 |
| approve-expense | 1.000 | 1.000 | 1.000 | 12 |
| check-leave-balance | 1.000 | 1.000 | 1.000 | 12 |
| confirm-training-completion | 1.000 | 1.000 | 1.000 | 12 |
| general-query | 1.000 | 1.000 | 1.000 | 30 |
| generate-offer | 1.000 | 1.000 | 1.000 | 18 |
| provide-candidate-feedback | 1.000 | 1.000 | 1.000 | 12 |
| request-leave | 1.000 | 1.000 | 1.000 | 12 |
| request-training | 1.000 | 1.000 | 1.000 | 12 |
| review-contract | 1.000 | 1.000 | 1.000 | 12 |
| schedule-interview | 1.000 | 1.000 | 1.000 | 15 |
| submit-expense | 1.000 | 1.000 | 1.000 | 12 |
| update-employee-profile | 1.000 | 1.000 | 1.000 | 15 |
|
|
|
|
|
## Usage |
|
|
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("SantmanKT/intent-classifier-v2")
model = AutoModelForSequenceClassification.from_pretrained("SantmanKT/intent-classifier-v2")
model.eval()  # inference mode: disables dropout

# Predict intent
query = "send offer to John"
context = "{domain: hr}"
input_text = f"{query} [context: {context}]"

inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():  # no gradients needed for inference
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
confidence, predicted_idx = torch.max(probs, dim=-1)

intent = model.config.id2label[predicted_idx.item()]
print(f"Intent: {intent}, Confidence: {confidence.item():.2%}")
```
|
|
|
|
|
## Routing Logic |
|
|
|
|
|
- **High Confidence (≥70%) + Agentic Intent** → Route to Domain Service |
|
|
- **Low Confidence (<70%) OR general-query** → Route to Q&A Service |
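
The two routing rules above can be sketched as a small function. The `AGENTIC_INTENTS` set and the service names are illustrative, not part of the published model:

```python
# The 12 agentic intents; everything else falls back to Q&A.
AGENTIC_INTENTS = {
    "generate-offer", "schedule-interview", "update-employee-profile",
    "access-employee-record", "approve-expense", "check-leave-balance",
    "confirm-training-completion", "provide-candidate-feedback",
    "request-leave", "request-training", "review-contract", "submit-expense",
}

CONFIDENCE_THRESHOLD = 0.70

def route(intent: str, confidence: float) -> str:
    """Return the downstream service for a classified query (sketch)."""
    # High confidence AND agentic intent -> Domain Service;
    # low confidence OR general-query -> Q&A Service.
    if confidence >= CONFIDENCE_THRESHOLD and intent in AGENTIC_INTENTS:
        return "domain-service"
    return "qa-service"
```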
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Base Model:** distilbert-base-uncased |
|
|
- **Max Sequence Length:** 128 tokens |
|
|
- **Training Epochs:** 5 |
|
|
- **Batch Size:** 16 |
|
|
- **Learning Rate:** 2e-5 |
|
|
- **Framework:** HuggingFace Transformers |
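
The hyperparameters listed above map onto a HuggingFace `TrainingArguments` configuration roughly as follows. This is a config sketch, not the actual training script; `output_dir` is an illustrative assumption.

```python
from transformers import TrainingArguments

# Sketch matching the hyperparameters above; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="./intent-classifier-v2",
    num_train_epochs=5,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
```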
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Optimized for English language only |
|
|
- Requires context formatting: `[context: {...}]` |
|
|
- Performance may degrade on queries significantly different from training data |
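
The required context formatting can be enforced with a tiny helper so callers never forget the `[context: {...}]` suffix. A minimal sketch; the exact context schema is up to the caller:

```python
def format_input(query: str, context: str) -> str:
    """Wrap a raw query in the context format the model expects."""
    return f"{query} [context: {context}]"
```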
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```
@misc{dapa-intent-classifier-v2,
  author = {SantmanKT},
  title = {DAPA Intent Classifier v2},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/SantmanKT/intent-classifier-v2}
}
```
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 |
|
|
|