---
language: en
license: apache-2.0
tags:
- text-classification
- intent-classification
- conversational-ai
- bert
- distilbert
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: intent-classifier-v2
results:
- task:
type: text-classification
name: Intent Classification
metrics:
- type: accuracy
value: 1.0000
name: Test Accuracy
- type: f1
value: 1.0000
name: Weighted F1
---
# DAPA Intent Classifier v2
## Model Description
This model classifies user intents for the DAPA AI conversational assistant system. It distinguishes 13 intents: 12 agentic workflows and one general Q&A fallback.
**Model Type:** DistilBERT for Sequence Classification
**Training Date:** 2025-10-25
**Accuracy:** 100.00%
**F1 Score:** 1.0000
## Supported Intents
The model classifies queries into 13 intents:
### Agentic Intents (12)
1. **generate-offer** - Generate job offers, NDAs, contracts
2. **schedule-interview** - Schedule candidate interviews
3. **update-employee-profile** - Update employee information
4. **access-employee-record** - Access employee records
5. **approve-expense** - Approve expense reports
6. **check-leave-balance** - Check leave balances
7. **confirm-training-completion** - Confirm training completion
8. **provide-candidate-feedback** - Provide candidate feedback
9. **request-leave** - Request time off
10. **request-training** - Request training enrollment
11. **review-contract** - Review contracts
12. **submit-expense** - Submit expense reports
### Q&A Intent (1)
13. **general-query** - Generic queries (email lookups, status checks, policy questions)
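The full label set above can be written out as a plain mapping. Note that the authoritative index order is `model.config.id2label` in the released checkpoint; the alphabetical ordering below is only an assumption for illustration:

```python
# Assumed alphabetical ordering; the authoritative mapping is
# model.config.id2label in the checkpoint itself.
INTENT_LABELS = [
    "access-employee-record",
    "approve-expense",
    "check-leave-balance",
    "confirm-training-completion",
    "general-query",
    "generate-offer",
    "provide-candidate-feedback",
    "request-leave",
    "request-training",
    "review-contract",
    "schedule-interview",
    "submit-expense",
    "update-employee-profile",
]

# Every intent except the Q&A fallback is agentic.
AGENTIC_INTENTS = {label for label in INTENT_LABELS if label != "general-query"}
```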
## Training Data
- **Total Samples:** 1,240
- **Training Split:** 70% (868 samples)
- **Validation Split:** 15% (186 samples)
- **Test Split:** 15% (186 samples)
- **Data Balance:** 80-200 examples per intent
## Performance
### Overall Metrics
- **Test Accuracy:** 100.00%
- **Weighted Precision:** 1.0000
- **Weighted Recall:** 1.0000
- **Weighted F1:** 1.0000
### Per-Intent Performance
| Intent | Precision | Recall | F1 | Support |
|--------|-----------|--------|-----|---------|
| access-employee-record | 1.000 | 1.000 | 1.000 | 12 |
| approve-expense | 1.000 | 1.000 | 1.000 | 12 |
| check-leave-balance | 1.000 | 1.000 | 1.000 | 12 |
| confirm-training-completion | 1.000 | 1.000 | 1.000 | 12 |
| general-query | 1.000 | 1.000 | 1.000 | 30 |
| generate-offer | 1.000 | 1.000 | 1.000 | 18 |
| provide-candidate-feedback | 1.000 | 1.000 | 1.000 | 12 |
| request-leave | 1.000 | 1.000 | 1.000 | 12 |
| request-training | 1.000 | 1.000 | 1.000 | 12 |
| review-contract | 1.000 | 1.000 | 1.000 | 12 |
| schedule-interview | 1.000 | 1.000 | 1.000 | 15 |
| submit-expense | 1.000 | 1.000 | 1.000 | 12 |
| update-employee-profile | 1.000 | 1.000 | 1.000 | 15 |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned model and tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("SantmanKT/intent-classifier-v2")
model = AutoModelForSequenceClassification.from_pretrained("SantmanKT/intent-classifier-v2")
model.eval()

# Format the input as "<query> [context: <context>]" (see Limitations)
query = "send offer to John"
context = "{domain: hr}"
input_text = f"{query} [context: {context}]"

# Predict intent
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
confidence, predicted_idx = torch.max(probs, dim=-1)
intent = model.config.id2label[predicted_idx.item()]
print(f"Intent: {intent}, Confidence: {confidence.item():.2%}")
```
## Routing Logic
- **High Confidence (≥70%) + Agentic Intent** → Route to Domain Service
- **Low Confidence (<70%) OR general-query** → Route to Q&A Service
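The two routing rules above can be sketched as a small function (the function and service names are illustrative, not part of the released model; only the 70% threshold comes from this card):

```python
def route(intent: str, confidence: float, threshold: float = 0.70) -> str:
    """Route a classified query: agentic intents at or above the threshold
    go to the Domain Service, everything else falls back to Q&A."""
    if intent == "general-query" or confidence < threshold:
        return "qa-service"
    return "domain-service"
```

For example, `route("generate-offer", 0.95)` routes to the Domain Service, while the same intent at 0.42 confidence falls back to Q&A.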
## Model Details
- **Base Model:** distilbert-base-uncased
- **Max Sequence Length:** 128 tokens
- **Training Epochs:** 5
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Framework:** HuggingFace Transformers
## Limitations
- Optimized for English language only
- Requires context formatting: `[context: {...}]`
- Performance may degrade on queries significantly different from training data
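The required context formatting can be wrapped in a small helper that mirrors the usage example above (the helper itself is hypothetical, not shipped with the model):

```python
def format_input(query: str, context: str = "{}") -> str:
    """Append the serialized context to the raw query, matching the
    "[context: {...}]" input format this model expects."""
    return f"{query} [context: {context}]"
```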
## Citation
If you use this model, please cite:
```
@misc{dapa-intent-classifier-v2,
  author    = {SantmanKT},
  title     = {DAPA Intent Classifier v2},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/SantmanKT/intent-classifier-v2}
}
```
## License
Apache 2.0