Healthcare Reason Classification System
This module implements a specialized classifier for healthcare visit reasons. It is trained on real clinic data and classifies patient queries into specific healthcare reason categories.
Overview
The reason classifier addresses the challenge of routing medical queries to the appropriate specialized departments. It assigns each query to one of a fixed set of reason categories derived from actual healthcare visit data.
Architecture
Classification Categories
| Category | Description | Examples |
|---|---|---|
| ROUTINE_CARE | Routine healthcare, maintenance visits, general care | "I need routine foot care", "Regular nail care appointment" |
| PAIN_CONDITIONS | Various pain-related conditions and discomfort | "I have heel pain when I walk", "My ankle is sore" |
| INJURIES | Sprains, wounds, trauma-related conditions | "I sprained my ankle playing sports", "I have a wound that won't heal" |
| SKIN_CONDITIONS | Skin-related issues and conditions | "My toenail is ingrown and infected", "I have calluses on my feet" |
| STRUCTURAL_ISSUES | Structural problems and related conditions | "I have flat feet", "I need evaluation for plantar fasciitis" |
| PROCEDURES | Injections, surgical consultations, post-operative care | "I need a cortisone injection", "Post-surgical follow-up" |
Technical Implementation
- Base Model: sentence-transformers/embeddinggemma-300m-medical
- Architecture: SetFit with frozen embeddings + trainable classification head
- Training: Real healthcare data from clinic appointment records
- Integration: Works as part of the complete healthcare routing system
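To illustrate the "frozen embeddings + trainable classification head" idea, here is a minimal, dependency-free sketch. It does not use SetFit or the embedding model named above: `embed` is a deliberately crude stand-in (token code-point sums bucketed into a small vector), and `DIM`, `train_head`, and `predict` are hypothetical names. The only point it makes is that the embedder is never updated while a small softmax head is fitted on top of it.

```python
import math

DIM = 16  # size of the stand-in embedding vector (assumption)


def embed(text: str) -> list[float]:
    """Frozen stand-in embedder: bucketed bag of words, L2-normalized.

    It has no trainable parameters, which is the property being illustrated.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[sum(map(ord, token)) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def softmax(scores: list[float]) -> list[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _scores(x, weights, bias):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def train_head(texts, labels, num_classes, epochs=100, lr=0.5):
    """Fit only the linear head; embeddings are computed once and reused."""
    features = [embed(t) for t in texts]
    weights = [[0.0] * DIM for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(_scores(x, weights, bias))
            for c in range(num_classes):
                # Gradient of cross-entropy with respect to the class score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                bias[c] -= lr * grad
                for i in range(DIM):
                    weights[c][i] -= lr * grad * x[i]
    return weights, bias


def predict(text, weights, bias):
    """Return (predicted class index, class probabilities)."""
    probs = softmax(_scores(embed(text), weights, bias))
    return max(range(len(probs)), key=probs.__getitem__), probs
```

In the real module the embedding step would come from the sentence-transformers model and the head from SetFit; only the division of labor carries over from this sketch.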
Quick Start
1. Train the Classifier
# Train with real healthcare data
python classifier/reason/train_reason.py
# The training script will:
# - Load real healthcare data from data/reason_for_visit_data.xlsx
# - Map reasons to categories using keyword matching
# - Train the classifier with frozen embeddings
# - Save the trained model to classifier/reason_checkpoints/
2. Use the CLI
# Classify a single reason query
python cli/reason_classifier_cli_new.py "I have heel pain when I walk"
# Interactive mode
python cli/reason_classifier_cli_new.py --interactive
# Batch processing
python cli/reason_classifier_cli_new.py --batch queries.txt --output results.json
# Use complete healthcare routing system
python cli/healthcare_classifier_cli.py "I need routine foot care"
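The batch mode's file formats are not documented here. As a rough sketch, assuming the input file holds one query per line and the output is a JSON list, the processing could look like the following. `run_batch` and the `predict_one` callable are hypothetical names, not part of the CLI.

```python
import json
from pathlib import Path
from typing import Callable


def run_batch(input_path: str, output_path: str,
              predict_one: Callable[[str], dict]) -> int:
    """Classify each non-empty line of input_path and write a JSON list.

    predict_one is any function that maps a query string to a result dict.
    Returns the number of queries processed.
    """
    lines = Path(input_path).read_text(encoding="utf-8").splitlines()
    queries = [line.strip() for line in lines if line.strip()]
    results = [{"query": query, **predict_one(query)} for query in queries]
    Path(output_path).write_text(json.dumps(results, indent=2), encoding="utf-8")
    return len(results)
```

Passing the predictor in as an argument keeps the sketch independent of how the trained model is actually loaded.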
3. Programmatic Usage
from classifier.reason import ReasonClassifier, predict_single_reason
# Using the main classifier class
classifier = ReasonClassifier()
predictions = classifier.predict(["I have heel pain when I walk"])
print(predictions[0]['category']) # Output: PAIN_CONDITIONS
# Using convenience function
result = predict_single_reason("I need routine foot care")
print(result['category']) # Output: ROUTINE_CARE
print(result['confidence']) # Confidence score
print(result['probabilities']) # All category probabilities
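The per-category probabilities can be used to spot queries where the top two categories are close. The sketch below assumes `result['probabilities']` is a dict mapping category names to floats, which this README does not confirm; `rank_categories`, `is_ambiguous`, and the `min_margin` default are hypothetical.

```python
def rank_categories(probabilities: dict[str, float],
                    top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (category, probability) pairs, highest first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def is_ambiguous(probabilities: dict[str, float], min_margin: float = 0.15) -> bool:
    """True when the best two categories are closer than min_margin."""
    ranked = rank_categories(probabilities, top_k=2)
    if len(ranked) < 2:
        return False  # a single category cannot be ambiguous
    return ranked[0][1] - ranked[1][1] < min_margin
```

A query flagged this way could, for example, be shown to staff with the two leading categories instead of being routed automatically.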
System Integration
Complete Healthcare Routing Workflow
User Query
    ↓
Medical vs Insurance Classification
    ↓
┌─────────────────┬───────────────────────┐
│ Insurance       │ Medical               │
│ Queries         │ Queries               │
│     ↓           │     ↓                 │
│ Insurance       │ Reason                │
│ Department      │ Classification        │
│                 │     ↓                 │
│                 │ • ROUTINE_CARE        │
│                 │ • PAIN_CONDITIONS     │
│                 │ • INJURIES            │
│                 │ • SKIN_CONDITIONS     │
│                 │ • STRUCTURAL_ISSUES   │
│                 │ • PROCEDURES          │
└─────────────────┴───────────────────────┘
Integration with Healthcare System
The reason classifier integrates as part of the complete healthcare routing system:
- Primary Classification: Medical vs Insurance queries
- Reason Classification: Medical queries → specific reason categories
- Department Routing: Route to appropriate specialized departments
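The three stages listed above can be sketched as a small function that takes both classifiers as plain callables. The label strings `"insurance"` and `"medical"`, the returned dict shape, and the name `route_query` are assumptions made for illustration; the actual routing code may differ.

```python
from typing import Callable, Optional


def route_query(query: str,
                primary: Callable[[str], str],
                reason: Callable[[str], str]) -> dict:
    """Two-stage routing: primary classification first, reason second.

    primary returns "insurance" or "medical" (assumed labels);
    reason returns a reason category name and is only called for medical queries.
    """
    kind = primary(query)
    category: Optional[str] = None
    if kind != "insurance":
        category = reason(query)
    return {"query": query, "department": kind, "reason": category}
```

Because the reason classifier runs only on the medical branch, insurance queries never pay for the second model call.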
Training Data Strategy
Real Healthcare Data
The system uses actual healthcare clinic data:
# Data source: data/reason_for_visit_data.xlsx
# Contains real patient visit reasons and appointment types
# Examples from actual data:
# - "Heel pain"
# - "Routine foot care"
# - "Ingrown toenail"
# - "Ankle sprain"
# - "Plantar fasciitis"
Category Mapping Strategy
The system uses keyword-based mapping to categorize real healthcare reasons:
def map_reason_to_category(reason: str) -> int:
    reason_lower = reason.lower()
    # ROUTINE_CARE (routine care, maintenance visits)
    if any(word in reason_lower for word in ['routine', 'nail care', 'calluses']):
        return 0
    # PAIN_CONDITIONS (various pain-related conditions)
    elif any(word in reason_lower for word in ['pain', 'ache', 'sore']):
        return 1
    # ... other categories
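A table-driven variant makes the rule order explicit, since the first matching rule wins. In the sketch below only the keywords for categories 0 and 1 come from the excerpt above; the ids and keywords for the remaining categories, the `-1` fallback for unmatched reasons, and the names `KEYWORD_RULES` and `map_reason_sketch` are assumptions, not the project's actual mapping.

```python
# (category id, keywords); rules are checked top to bottom.
KEYWORD_RULES = [
    (0, ["routine", "nail care", "calluses"]),    # ROUTINE_CARE, from the excerpt
    (1, ["pain", "ache", "sore"]),                # PAIN_CONDITIONS, from the excerpt
    (2, ["sprain", "wound", "fracture"]),         # INJURIES (assumed keywords)
    (3, ["ingrown", "wart", "fungus"]),           # SKIN_CONDITIONS (assumed keywords)
    (4, ["flat feet", "bunion", "fasciitis"]),    # STRUCTURAL_ISSUES (assumed keywords)
    (5, ["injection", "surgery", "surgical"]),    # PROCEDURES (assumed keywords)
]


def map_reason_sketch(reason: str) -> int:
    """Return the id of the first rule with a keyword found in the reason.

    Matching is by substring on the lowercased text. Returns -1 when no
    rule matches (an assumed convention for unmapped reasons).
    """
    reason_lower = reason.lower()
    for category_id, keywords in KEYWORD_RULES:
        if any(keyword in reason_lower for keyword in keywords):
            return category_id
    return -1
```

Order matters with this approach: "Painful ingrown toenail" maps to 1 rather than 3 because the pain rule is checked first.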
Performance Metrics
Expected Performance
- Accuracy: Depends on how well the training data reflects real visit-reason patterns; no benchmark figure is reported here
- Categories: 6 specialized healthcare reason categories
- Confidence: Scores vary with the quality of the training data
Evaluation Framework
# Train and evaluate the model
python classifier/reason/train_reason.py
# Test the trained model
python classifier/reason/infer_reason.py
# Results include:
# - Training metrics
# - Category distribution
# - Example predictions with confidence scores
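The scripts' exact output format is not shown here. As a sketch of the kind of summary such an evaluation can produce, the helper below computes overall accuracy, the category distribution, and per-category recall from two label lists. `summarize_predictions` is a hypothetical helper, not a function exported by this package.

```python
from collections import Counter


def summarize_predictions(true_labels: list[str],
                          predicted_labels: list[str]) -> dict:
    """Compute accuracy, true-label distribution, and per-category recall."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    totals = Counter(true_labels)
    hits = Counter(true for true, predicted in pairs if true == predicted)
    accuracy = sum(hits.values()) / len(pairs) if pairs else 0.0
    recall = {category: hits[category] / count for category, count in totals.items()}
    return {"accuracy": accuracy, "distribution": dict(totals), "recall": recall}
```

Per-category recall is worth reading next to the distribution, because a rare category can score poorly without moving overall accuracy much.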
File Structure
classifier/reason/
├── __init__.py # Package initialization and exports
├── README.md # This documentation
├── reason_classifier.py # Main ReasonClassifier class
├── infer_reason.py # Inference functions and utilities
└── train_reason.py # Training script and functions
API Reference
ReasonClassifier
class ReasonClassifier:
    def __init__(self, data_file: str = "data/reason_for_visit_data.xlsx")
    def predict(self, queries: List[str]) -> List[Dict]
    def train(self, train_data: pd.DataFrame = None, eval_data: Optional[pd.DataFrame] = None)
    def save_model(self, path: str)
    def load_model(self, path: str)
    def create_real_dataset(self) -> pd.DataFrame
    def analyze_real_data(self)
Inference Functions
def predict_single_reason(query: str) -> dict
def predict_reason_query(text: list[str], embedding_model, classifier_head) -> dict
def get_reason_models() -> tuple
def test_reason_classifier()
Training Functions
def get_reason_model(num_classes: int)
def get_reason_dataset() -> pd.DataFrame
def map_reason_to_category(reason: str) -> int
def preprocess_reason_data(df: pd.DataFrame) -> pd.DataFrame
Data Requirements
Healthcare Data Format
The system expects healthcare data in Excel format with these columns:
Columns:
- "Reason For Visit" (required): The primary reason for the healthcare visit
- "Appointment Type" (optional): Type of appointment, used for context
Example data:
| Reason For Visit | Appointment Type |
|------------------|------------------|
| Heel pain | Follow-up |
| Routine foot care| Maintenance |
| Ingrown toenail | New Patient |
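Before training, rows can be checked against this format. The sketch below works on plain dicts, the shape that pandas' `DataFrame.to_dict("records")` produces, and assumes blank cells arrive as `None` or an empty string. `clean_rows` is a hypothetical helper and not the package's `preprocess_reason_data`.

```python
REQUIRED_COLUMN = "Reason For Visit"
OPTIONAL_COLUMN = "Appointment Type"


def clean_rows(rows: list[dict]) -> list[dict]:
    """Validate and normalize visit-reason rows.

    Raises KeyError if a row lacks the required column, drops rows whose
    reason is blank, and fills a missing appointment type with "".
    """
    cleaned = []
    for index, row in enumerate(rows):
        if REQUIRED_COLUMN not in row:
            raise KeyError(f"row {index} is missing the '{REQUIRED_COLUMN}' column")
        reason = str(row[REQUIRED_COLUMN] or "").strip()
        if not reason:
            continue  # nothing to classify in this row
        appointment = str(row.get(OPTIONAL_COLUMN) or "").strip()
        cleaned.append({REQUIRED_COLUMN: reason, OPTIONAL_COLUMN: appointment})
    return cleaned
```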
Deployment Considerations
Production Readiness
- Model Persistence: Trained models saved with timestamps in classifier/reason_checkpoints/
- Error Handling: Graceful fallbacks for prediction failures
- Real Data Integration: Uses actual healthcare clinic data
- Device Support: CPU/GPU/MPS compatibility
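Two of these points, timestamped checkpoints and graceful fallbacks, can be sketched in a few lines. The directory naming scheme (`reason_YYYYMMDD_HHMMSS`), the fallback dict shape, and the names `checkpoint_path` and `safe_predict` are assumptions; the module's actual conventions are not documented here.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


def checkpoint_path(base_dir: str = "classifier/reason_checkpoints",
                    now: Optional[datetime] = None) -> Path:
    """Build a timestamped checkpoint directory path (naming is assumed)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(base_dir) / f"reason_{stamp}"


def safe_predict(predict_one: Callable[[str], dict], query: str) -> dict:
    """Call the predictor, returning a neutral result instead of raising."""
    try:
        return predict_one(query)
    except Exception as exc:  # broad on purpose: any failure falls back
        return {"category": None, "confidence": 0.0, "error": str(exc)}
```

Returning the error text alongside a `None` category lets the caller log the failure and still route the query to a default queue.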
Scalability
- Batch Processing: Efficient handling of multiple queries
- Integration: Works with existing healthcare routing system
- Checkpoints: Automatic model saving with timestamps
Future Enhancements
Data Improvements
- Expanded Dataset: Include more healthcare specialties
- Active Learning: Improve model with real-world feedback
- Multi-language Support: Support for non-English healthcare queries
Advanced Features
- Confidence Calibration: Improve confidence score reliability
- Hierarchical Classification: Sub-categories within reason types
- Context Awareness: Consider patient history and appointment context
Troubleshooting
Common Issues
- Data Loading Errors: Ensure data/reason_for_visit_data.xlsx exists
- Low Confidence: May indicate need for more training data or model retraining
- Import Errors: Ensure all dependencies are installed and paths are correct
Debug Mode
# Test the classifier with sample queries
from classifier.reason.infer_reason import test_reason_classifier
test_reason_classifier()
# Check model predictions with probabilities
from classifier.reason import predict_single_reason
result = predict_single_reason("ambiguous query")
print(result['probabilities'])
Model Training Issues
# Check if healthcare data is available
ls -la data/reason_for_visit_data.xlsx
# Verify model training
python classifier/reason/train_reason.py
# Test inference after training
python classifier/reason/infer_reason.py
Contributing
Adding New Categories
- Update REASON_CATEGORIES in reason_classifier.py, infer_reason.py, and train_reason.py
- Update category mapping logic in map_reason_to_category()
- Retrain the model with new categories
- Update documentation and examples
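Because the category list lives in several files, a quick consistency check after adding a category can catch ids that the mapping logic emits but the category list does not name. The sketch assumes `REASON_CATEGORIES` is an ordered list of names whose position is the category id, which this README does not state; `LABEL_TO_ID`, `ID_TO_LABEL`, and `unknown_ids` are hypothetical helpers.

```python
# Assumed shape: list position doubles as the numeric category id.
REASON_CATEGORIES = [
    "ROUTINE_CARE",
    "PAIN_CONDITIONS",
    "INJURIES",
    "SKIN_CONDITIONS",
    "STRUCTURAL_ISSUES",
    "PROCEDURES",
]

LABEL_TO_ID = {name: index for index, name in enumerate(REASON_CATEGORIES)}
ID_TO_LABEL = {index: name for name, index in LABEL_TO_ID.items()}


def unknown_ids(mapped_ids: list[int]) -> list[int]:
    """Return the sorted ids produced by the mapping that have no category name."""
    return sorted(set(mapped_ids) - set(ID_TO_LABEL))
```

Running the mapping function over the training reasons and passing the resulting ids to `unknown_ids` should return an empty list once all three files agree.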
Improving Training Data
- Add more real healthcare examples to the dataset
- Improve keyword mapping for better categorization
- Implement more sophisticated NLP techniques for category assignment
License
This module is part of the health-query-classifier project and follows the same licensing terms.