# Email Classification Model - Pipeline Validated
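A production-ready email classification model for healthcare communications. It assigns each email one of 19 categories and one of 86 subcategories using two classification heads on a `distilbert-base-uncased` encoder.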
## Performance Metrics
**Model Performance:**
- Subcategory F1 (macro): 25.6%
- Category Accuracy: 61.6%
- Top-2 Accuracy: 0.0%
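
For readers who want to reproduce figures like the ones above, here is a minimal pure-Python sketch of the three metric definitions. It is not the evaluation code used for this model; the function names and the toy labels and scores are illustrative assumptions. By this definition, top-2 accuracy can never be lower than top-1 accuracy when both are computed from the same scores for the same head.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in set(y_true) | set(y_pred):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def top_k_accuracy(y_true, scores, k=2):
    """Fraction of examples whose true label is among the k highest-scoring labels.

    `scores` holds one {label: score} dict per example.
    """
    hits = 0
    for t, s in zip(y_true, scores):
        top_k = sorted(s, key=s.get, reverse=True)[:k]
        hits += int(t in top_k)
    return hits / len(y_true)


# Toy data (illustrative only, not taken from the real test set).
y_true = ["appointments", "denials", "eligibility", "denials"]
scores = [
    {"appointments": 0.7, "denials": 0.2, "eligibility": 0.1},
    {"appointments": 0.1, "denials": 0.6, "eligibility": 0.3},
    {"appointments": 0.1, "denials": 0.5, "eligibility": 0.4},
    {"appointments": 0.2, "denials": 0.3, "eligibility": 0.5},
]
y_pred = [max(s, key=s.get) for s in scores]

print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred), top_k_accuracy(y_true, scores, k=2))
```

Macro averaging gives every class the same weight, so rare classes that are never predicted correctly pull the score down even when overall accuracy is higher.
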
**Pipeline Validated:**
- Real data loading (44,602 emails)
- Stratified train/test splitting (37,911 / 6,691)
- Fast training convergence
- Dual-head classification (category and subcategory)
- HuggingFace deployment
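
The stratified splitting step can be sketched as follows. This is a minimal illustration and not the pipeline's own code. The helper name `stratified_split` is hypothetical, and the 15% test fraction is an assumption inferred from the reported sample counts (6,691 of 44,602).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.15, seed=0):
    """Return (train_idx, test_idx) so that each label keeps roughly the same
    share in both parts. `test_fraction` is an assumed value."""
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        idx = by_label[label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (illustrative only): 60 "appointments" and 40 "denials".
labels = ["appointments"] * 60 + ["denials"] * 40
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With many small classes, per-class rounding can leave a class with no test examples, so a real pipeline may need a minimum count per class or a merged "other" bucket.
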
## Categories (19)
air.calendar, air.patient_info, air.practice, air.product, appointments, athelas_ehr, athelas_scribe, business_update, denials, eligibility...
## Subcategories (86)
Top subcategories: appointment.incorrect_info, appointment.missing, appointment.reminder, business_update.additional_meeting_request, business_update.banking_update, business_update.billing_process_update, business_update.custom_report, business_update.edi_era_enrollments, business_update.facility_update, business_update.fee_schedule_update
## Usage
```python
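import requests

# Replace with your own Inference Endpoint URL and access token.
API_URL = "https://your-endpoint.aws.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "Patient appointment confirmation for tomorrow"},
    timeout=30,
)
response.raise_for_status()  # fail on HTTP errors instead of parsing an error body
result = response.json()

# Example response shape (values are illustrative):
# {
#   "category": {"label": "appointments", "confidence": 0.95},
#   "subcategory": {"label": "appointments.confirmation", "confidence": 0.92}
# }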
```
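
Because the response carries a confidence for each head, downstream code may want to fall back to the coarser label when the subcategory prediction is weak. The sketch below assumes the response shape shown above. The helper `pick_label` and the 0.5 threshold are hypothetical and are not part of the endpoint.

```python
def pick_label(result, min_confidence=0.5):
    """Return the most specific label whose confidence clears the threshold.

    Tries the subcategory first, then the category. Returns None when neither
    head is confident enough, for example to route the email to manual review.
    `min_confidence` is an assumed value and should be tuned on held-out data.
    """
    for head in ("subcategory", "category"):
        pred = result.get(head) or {}
        if pred.get("confidence", 0.0) >= min_confidence:
            return pred.get("label")
    return None


# Example using the illustrative response from the usage snippet.
example = {
    "category": {"label": "appointments", "confidence": 0.95},
    "subcategory": {"label": "appointments.confirmation", "confidence": 0.92},
}
print(pick_label(example))
```
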
## Model Details
- Base Model: distilbert-base-uncased
- Training Samples: 37,911
- Test Samples: 6,691
- Parameters: 66,443,625
- Max Length: 256
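
The reported parameter count can be cross-checked with simple arithmetic. The sketch below counts the parameters of a DistilBERT-base encoder from its published dimensions and adds two linear heads on the 768-dimensional encoder output, one with 19 outputs and one with 86. The single-linear-layer head design is an assumption. The total matches the figure above, which is consistent with that design but does not confirm it.

```python
# DistilBERT-base dimensions (published architecture of distilbert-base-uncased).
VOCAB, MAX_POS, HIDDEN, FFN, LAYERS = 30522, 512, 768, 3072, 6


def linear(n_in, n_out):
    """Parameter count of a dense layer with bias."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift vectors
embeddings = VOCAB * HIDDEN + MAX_POS * HIDDEN + layer_norm
block = (
    4 * linear(HIDDEN, HIDDEN)  # query, key, value and output projections
    + layer_norm
    + linear(HIDDEN, FFN)
    + linear(FFN, HIDDEN)
    + layer_norm
)
encoder = embeddings + LAYERS * block

# Assumed heads: one linear layer per task on top of the encoder output.
category_head = linear(HIDDEN, 19)
subcategory_head = linear(HIDDEN, 86)
total = encoder + category_head + subcategory_head

# The train and test counts also add up to the full dataset size.
train_samples, test_samples = 37_911, 6_691
dataset_size = train_samples + test_samples

print(encoder, category_head, subcategory_head, total, dataset_size)
```
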
## Files
- `model_checkpoint.pt`: Complete model checkpoint
- `tokenizer/`: HuggingFace tokenizer
- `handler.py`: Inference endpoint handler