# Email Classification Model - Pipeline Validated

A production-ready email classification model for healthcare communications.

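## Performance Metrics

**Model Performance:**
- Subcategory F1 (macro): 25.6%
- Category Accuracy: 61.6%
- Top-2 Accuracy: 0.0% (head not specified; inconsistent with the scores above, since top-2 accuracy cannot be lower than top-1 accuracy, so this value was probably never computed)
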
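For reference, macro F1 averages the per-class F1 scores with equal weight, so rare subcategories count as much as common ones, and top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. The following is a minimal sketch of both definitions in plain Python; the function names and toy labels are illustrative and are not taken from this repository's evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def top_k_accuracy(y_true, scores, k=2):
    """Fraction of samples whose true label is among the k highest-scoring labels.

    `scores` is a list of {label: score} dicts, one per sample.
    """
    hits = 0
    for t, s in zip(y_true, scores):
        top = sorted(s, key=s.get, reverse=True)[:k]
        hits += t in top
    return hits / len(y_true)


# Toy data (hypothetical scores, not from the real dataset).
y_true = ["denials", "appointments", "eligibility", "appointments"]
scores = [
    {"denials": 0.7, "appointments": 0.2, "eligibility": 0.1},
    {"denials": 0.5, "appointments": 0.4, "eligibility": 0.1},
    {"denials": 0.1, "appointments": 0.3, "eligibility": 0.6},
    {"denials": 0.1, "appointments": 0.8, "eligibility": 0.1},
]
y_pred = [max(s, key=s.get) for s in scores]

# Every top-1 hit is also a top-2 hit, so top2 >= top1 for any model.
top1 = top_k_accuracy(y_true, scores, k=1)
top2 = top_k_accuracy(y_true, scores, k=2)
f1 = macro_f1(y_true, y_pred)
```
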
**Pipeline Validated:**
- Real data loading (44,602 emails)
- Stratified splitting (37,911 train / 6,691 test, roughly 85/15)
- Fast training convergence
- Dual-head classification
- HuggingFace deployment

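Stratified splitting keeps each label's share roughly equal in the train and test sets, which matters when some of the 86 subcategories are rare. A minimal sketch in plain Python follows; the `stratified_split` helper, the 15% test fraction (inferred from the 37,911 / 6,691 split reported under Model Details), and the seed are assumptions, not the project's actual code.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.15, seed=42):
    """Return (train_idx, test_idx) with each label split at about test_fraction."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Round per label; singleton labels stay in train so training sees every label.
        n_test = round(len(indices) * test_fraction) if len(indices) > 1 else 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical distribution): 80 and 20 emails in two categories.
labels = ["appointments"] * 80 + ["denials"] * 20
train_idx, test_idx = stratified_split(labels)
```

With this toy input, both labels keep their 80/20 proportion in each split, which a plain random split would not guarantee.
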
## Categories (19)
air.calendar, air.patient_info, air.practice, air.product, appointments, athelas_ehr, athelas_scribe, business_update, denials, eligibility...

## Subcategories (86)
First 10 of 86, in alphabetical order: appointment.incorrect_info, appointment.missing, appointment.reminder, business_update.additional_meeting_request, business_update.banking_update, business_update.billing_process_update, business_update.custom_report, business_update.edi_era_enrollments, business_update.facility_update, business_update.fee_schedule_update

## Usage

```python
import requests

# Replace with your own endpoint URL and access token.
API_URL = "https://your-endpoint.aws.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "Patient appointment confirmation for tomorrow"},
    timeout=30,  # avoid hanging indefinitely on an unreachable endpoint
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body

result = response.json()
# Example response:
# {
#   "category": {"label": "appointments", "confidence": 0.95},
#   "subcategory": {"label": "appointments.confirmation", "confidence": 0.92}
# }
```

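Downstream code usually needs to decide what to do with low-confidence predictions. The sketch below assumes the response shape shown in the example above; the `route_prediction` helper and the 0.5 threshold are hypothetical and would need tuning against real traffic.

```python
def route_prediction(result, threshold=0.5):
    """Pick the most specific label whose confidence clears the threshold.

    Returns (label, level), where level is "subcategory", "category",
    or "manual_review" when neither head is confident enough.
    """
    category = result["category"]
    subcategory = result["subcategory"]
    if subcategory["confidence"] >= threshold:
        return subcategory["label"], "subcategory"
    if category["confidence"] >= threshold:
        return category["label"], "category"
    return None, "manual_review"


# Response copied from the usage example.
example = {
    "category": {"label": "appointments", "confidence": 0.95},
    "subcategory": {"label": "appointments.confirmation", "confidence": 0.92},
}
label, level = route_prediction(example)
```

Falling back to the coarser category label may be a sensible default here, since the reported category score is well above the subcategory score.
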
## Model Details
- Base Model: `distilbert-base-uncased`
- Training Samples: 37,911
- Test Samples: 6,691
- Parameters: 66,443,625
- Max Length: 256 tokens

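These counts are internally consistent. The train and test sizes sum to the 44,602 emails loaded, and the parameter total matches a DistilBERT encoder plus two linear heads (19 and 86 outputs) on its 768-dimensional output. The arithmetic below assumes the standard `distilbert-base-uncased` dimensions and plain linear heads with bias; the head layout is an inference from the numbers, not something the source states.

```python
# Standard distilbert-base-uncased dimensions (assumed, not read from the checkpoint).
VOCAB, MAX_POS, DIM, FFN_DIM, LAYERS = 30522, 512, 768, 3072, 6


def linear(n_in, n_out):
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out


layer_norm = 2 * DIM  # scale and shift vectors
embeddings = VOCAB * DIM + MAX_POS * DIM + layer_norm
block = (
    4 * linear(DIM, DIM)        # query, key, value, and output projections
    + layer_norm                # post-attention LayerNorm
    + linear(DIM, FFN_DIM)      # feed-forward expansion
    + linear(FFN_DIM, DIM)      # feed-forward projection
    + layer_norm                # output LayerNorm
)
encoder = embeddings + LAYERS * block

# Assumed heads: one linear layer per task on the 768-dim representation.
category_head = linear(DIM, 19)
subcategory_head = linear(DIM, 86)
total = encoder + category_head + subcategory_head

print(encoder)            # 66362880
print(total)              # 66443625
print(37_911 + 6_691)     # 44602
```
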
## Files
- `model_checkpoint.pt`: Complete model checkpoint
- `tokenizer/`: HuggingFace tokenizer
- `handler.py`: Inference endpoint handler

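As an illustration of what the dual-head output step might look like, the sketch below turns two logit vectors into the response shape shown under Usage. It is written in plain Python for clarity; the `build_response` function, the label lists, and the logit values are hypothetical, and the real `handler.py` may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_response(cat_logits, sub_logits, cat_labels, sub_labels):
    """Take the argmax of each head and report its softmax probability."""
    response = {}
    for name, logits, labels in (
        ("category", cat_logits, cat_labels),
        ("subcategory", sub_logits, sub_labels),
    ):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        response[name] = {"label": labels[best], "confidence": round(probs[best], 4)}
    return response


# Hypothetical two-label heads; the real model has 19 and 86 outputs.
response = build_response(
    cat_logits=[2.0, 0.0],
    sub_logits=[0.0, 1.0],
    cat_labels=["appointments", "denials"],
    sub_labels=["appointment.missing", "appointment.reminder"],
)
```

Each head is scored independently in this sketch, so nothing forces the subcategory to belong to the predicted category; whether the deployed handler enforces that is not stated in the source.
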