# Email Classification Model - Pipeline Validated

A production-ready email classification model for healthcare communications.

## Performance Metrics

**Model Performance:**
- Subcategory F1 (macro): 25.6%
- Category Accuracy: 61.6%
- Top-2 Accuracy: 0.0%

**Pipeline Validated:**
- Real data loading (44,602 emails)
- Stratified splitting
- Fast training convergence
- Dual-head classification
- HuggingFace deployment

## Categories (19)

air.calendar, air.patient_info, air.practice, air.product, appointments, athelas_ehr, athelas_scribe, business_update, denials, eligibility...

## Subcategories (86)

Top categories: appointment.incorrect_info, appointment.missing, appointment.reminder, business_update.additional_meeting_request, business_update.banking_update, business_update.billing_process_update, business_update.custom_report, business_update.edi_era_enrollments, business_update.facility_update, business_update.fee_schedule_update

## Usage

```python
import requests

API_URL = "https://your-endpoint.aws.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

response = requests.post(API_URL, headers=headers, json={
    "inputs": "Patient appointment confirmation for tomorrow"
})
result = response.json()
# {
#   "category": {"label": "appointments", "confidence": 0.95},
#   "subcategory": {"label": "appointments.confirmation", "confidence": 0.92}
# }
```

## Model Details

- Base Model: distilbert-base-uncased
- Training Samples: 37,911
- Test Samples: 6,691
- Parameters: 66,443,625
- Max Length: 256

## Files

- `model_checkpoint.pt`: Complete model checkpoint
- `tokenizer/`: HuggingFace tokenizer
- `handler.py`: Inference endpoint handler
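
The endpoint response in the Usage section carries a confidence for each head, so a caller may want to fall back from the subcategory to the category when the finer-grained prediction is weak. The following is a minimal sketch, assuming the response has exactly the shape shown in the Usage example. The helper `pick_label` and the `0.5` threshold are hypothetical and are not part of this repository.

```python
from typing import Any, Dict, Optional


def pick_label(result: Dict[str, Any], min_confidence: float = 0.5) -> Optional[str]:
    """Return the most specific label whose confidence meets the threshold.

    Assumes the response shape from the Usage example:
    {"category": {"label", "confidence"}, "subcategory": {"label", "confidence"}}.
    The helper name and the default threshold are illustrative assumptions.
    """
    # Prefer the finer-grained subcategory, then fall back to the category.
    for head in ("subcategory", "category"):
        prediction = result.get(head) or {}
        label = prediction.get("label")
        confidence = prediction.get("confidence", 0.0)
        if label is not None and confidence >= min_confidence:
            return label
    # Neither head is confident enough; the caller can route to manual review.
    return None


example = {
    "category": {"label": "appointments", "confidence": 0.95},
    "subcategory": {"label": "appointments.confirmation", "confidence": 0.42},
}
print(pick_label(example))  # prints "appointments"
```

A suitable threshold would need to be chosen on held-out data; the value here is only a placeholder.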
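
For readers interpreting the figures under Performance Metrics, the sketch below shows one common way to compute macro F1 and top-k accuracy in plain Python. It is not the evaluation script used for this model, which is not included here. The function names, and the choice to average F1 over every class seen in either the labels or the predictions, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all classes in y_true or y_pred."""
    tp: Dict[str, int] = defaultdict(int)
    fp: Dict[str, int] = defaultdict(int)
    fn: Dict[str, int] = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def top_k_accuracy(y_true: Sequence[str],
                   ranked_preds: Sequence[Sequence[str]],
                   k: int = 2) -> float:
    """Fraction of samples whose true label is among the k highest-ranked labels."""
    if not y_true:
        return 0.0
    hits = sum(1 for truth, ranked in zip(y_true, ranked_preds)
               if truth in ranked[:k])
    return hits / len(y_true)


# Toy data: ranked_preds lists labels from most to least likely per sample.
y_true = ["a", "a", "b", "c"]
ranked_preds = [["a", "b"], ["b", "a"], ["b", "c"], ["a", "b"]]
y_pred = [ranked[0] for ranked in ranked_preds]

print(round(macro_f1(y_true, y_pred), 4))          # 0.3889
print(top_k_accuracy(y_true, ranked_preds, k=1))   # 0.5
print(top_k_accuracy(y_true, ranked_preds, k=2))   # 0.75
```

For a fixed set of ranked predictions, top-k accuracy can only stay the same or rise as k grows, which makes it a quick sanity check against top-1 accuracy.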