

Healthcare Reason Classification System

This module implements a specialized classifier, trained on real clinic data, that assigns patient queries to specific healthcare visit-reason categories.

Overview

The reason classifier addresses the challenge of routing medical queries to the appropriate specialized department. It assigns each medical query to a specific reason category derived from actual healthcare visit data.

Architecture

Classification Categories

| Category | Description | Examples |
|----------|-------------|----------|
| ROUTINE_CARE | Routine healthcare, maintenance visits, general care | "I need routine foot care", "Regular nail care appointment" |
| PAIN_CONDITIONS | Various pain-related conditions and discomfort | "I have heel pain when I walk", "My ankle is sore" |
| INJURIES | Sprains, wounds, trauma-related conditions | "I sprained my ankle playing sports", "I have a wound that won't heal" |
| SKIN_CONDITIONS | Skin-related issues and conditions | "My toenail is ingrown and infected", "I have calluses on my feet" |
| STRUCTURAL_ISSUES | Structural problems and related conditions | "I have flat feet", "I need evaluation for plantar fasciitis" |
| PROCEDURES | Injections, surgical consultations, post-operative care | "I need a cortisone injection", "Post-surgical follow-up" |
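The Contributing notes mention a REASON_CATEGORIES constant shared by several modules. A minimal sketch of what such a constant could look like, assuming the integer label of each category is its position in the table above (this matches the indices the keyword mapper returns for ROUTINE_CARE = 0 and PAIN_CONDITIONS = 1; the order of the remaining four is an assumption). ID2LABEL and LABEL2ID are hypothetical helper names.

```python
# Hypothetical shared definition of the six reason categories.
# Assumption: a category's integer label is its index in this list,
# so 0 = ROUTINE_CARE and 1 = PAIN_CONDITIONS as in map_reason_to_category.
REASON_CATEGORIES = [
    "ROUTINE_CARE",
    "PAIN_CONDITIONS",
    "INJURIES",
    "SKIN_CONDITIONS",
    "STRUCTURAL_ISSUES",
    "PROCEDURES",
]

# Lookup tables in both directions, handy for decoding classifier outputs.
ID2LABEL = dict(enumerate(REASON_CATEGORIES))
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

if __name__ == "__main__":
    print(LABEL2ID["PAIN_CONDITIONS"])  # 1
    print(ID2LABEL[0])  # ROUTINE_CARE
```

Keeping a single ordered list means the head's output index, the training labels, and the printed category name cannot drift apart.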

Technical Implementation

  • Base Model: sentence-transformers/embeddinggemma-300m-medical
  • Architecture: SetFit with frozen embeddings + trainable classification head
  • Training: Real healthcare data from clinic appointment records
  • Integration: Works as part of the complete healthcare routing system
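"Frozen embeddings + trainable classification head" means the embedding model only produces fixed vectors and only a small head on top is fitted. The sketch below illustrates that split with a plain NumPy softmax-regression head trained by gradient descent; it is a stand-in, not the project's actual SetFit head, and random clustered vectors replace real embeddinggemma embeddings.

```python
import numpy as np


def train_softmax_head(embeddings, labels, num_classes, lr=0.5, epochs=200, seed=0):
    """Fit a linear softmax head on fixed embedding vectors.

    The embeddings are never modified; only the weights W and bias b are
    learned, which is the "frozen embeddings + trainable head" split.
    """
    rng = np.random.default_rng(seed)
    n, dim = embeddings.shape
    W = rng.normal(scale=0.01, size=(dim, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        probs = predict_proba(embeddings, W, b)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * embeddings.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict_proba(embeddings, W, b):
    """Softmax over the head's logits (max-subtracted for numerical stability)."""
    logits = embeddings @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Stand-in for real sentence embeddings: three noisy clusters,
    # L2-normalized as sentence-embedding models commonly output.
    rng = np.random.default_rng(1)
    labels = np.repeat(np.arange(3), 20)
    X = rng.normal(size=(3, 8))[labels] + rng.normal(scale=0.1, size=(60, 8))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    W, b = train_softmax_head(X, labels, num_classes=3)
    print((predict_proba(X, W, b).argmax(axis=1) == labels).mean())
```

Because only `W` and `b` are trained, the embeddings can be computed once and cached, which keeps training cheap even on CPU.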

Quick Start

1. Train the Classifier

# Train with real healthcare data
python classifier/reason/train_reason.py

# The training script will:
# - Load real healthcare data from data/reason_for_visit_data.xlsx
# - Map reasons to categories using keyword matching
# - Train the classifier with frozen embeddings
# - Save the trained model to classifier/reason_checkpoints/
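The deployment notes state that checkpoints are saved with timestamps under classifier/reason_checkpoints/. A minimal sketch of one way to name and later locate such checkpoints; the `reason_YYYYmmdd_HHMMSS` directory pattern and both function names are hypothetical, not the script's confirmed layout.

```python
from datetime import datetime
from pathlib import Path


def checkpoint_dir(root="classifier/reason_checkpoints", now=None):
    """Return a timestamped checkpoint path (hypothetical naming scheme)."""
    now = now or datetime.now()
    return Path(root) / f"reason_{now:%Y%m%d_%H%M%S}"


def latest_checkpoint(root="classifier/reason_checkpoints"):
    """Return the newest checkpoint directory, or None if there is none.

    Zero-padded timestamps sort lexicographically in chronological order,
    so a plain sort is enough.
    """
    candidates = sorted(p for p in Path(root).glob("reason_*") if p.is_dir())
    return candidates[-1] if candidates else None
```

Encoding the time in the name avoids overwriting earlier models and lets inference pick the most recent one without extra metadata.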

2. Use the CLI

# Classify a single reason query
python cli/reason_classifier_cli_new.py "I have heel pain when I walk"

# Interactive mode
python cli/reason_classifier_cli_new.py --interactive

# Batch processing
python cli/reason_classifier_cli_new.py --batch queries.txt --output results.json

# Use complete healthcare routing system
python cli/healthcare_classifier_cli.py "I need routine foot care"
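The `--batch queries.txt --output results.json` mode reads many queries and writes their predictions. A minimal sketch of that flow, assuming the input has one query per line and `predict` behaves like `ReasonClassifier.predict` (list of strings in, list of dicts out); `classify_batch_file` is a hypothetical name and the JSON layout is an assumption.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def classify_batch_file(input_path, output_path,
                        predict: Callable[[List[str]], List[Dict]],
                        batch_size: int = 32) -> int:
    """Classify one query per line in batches and write a JSON list of results."""
    lines = Path(input_path).read_text(encoding="utf-8").splitlines()
    queries = [line.strip() for line in lines if line.strip()]  # skip blank lines
    results = []
    for start in range(0, len(queries), batch_size):
        chunk = queries[start:start + batch_size]
        # Keep each query next to its prediction so the output is self-describing.
        for query, pred in zip(chunk, predict(chunk)):
            results.append({"query": query, **pred})
    Path(output_path).write_text(json.dumps(results, indent=2), encoding="utf-8")
    return len(results)
```

Passing whole chunks to `predict` lets the embedding model encode several queries per forward pass instead of one at a time.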

3. Programmatic Usage

from classifier.reason import ReasonClassifier, predict_single_reason

# Using the main classifier class
classifier = ReasonClassifier()
predictions = classifier.predict(["I have heel pain when I walk"])
print(predictions[0]['category'])  # Output: PAIN_CONDITIONS

# Using convenience function
result = predict_single_reason("I need routine foot care")
print(result['category'])  # Output: ROUTINE_CARE
print(result['confidence'])  # Confidence score
print(result['probabilities'])  # All category probabilities
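The result exposes `category`, `confidence`, and the full `probabilities`. A minimal sketch of how these fields could relate, assuming `probabilities` maps category names to probabilities and `category` is simply the most probable entry (the source does not spell this out); `top_k` and `summarize` are hypothetical helpers, and the probabilities in the usage line are illustrative values, not model output.

```python
from typing import Dict, List, Tuple


def top_k(probabilities: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely categories, highest probability first."""
    return sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)[:k]


def summarize(probabilities: Dict[str, float]) -> Dict:
    """Rebuild 'category' and 'confidence' from the probability mapping.

    Assumption: 'category' is the arg-max of 'probabilities' and
    'confidence' is that category's probability.
    """
    category, confidence = top_k(probabilities, k=1)[0]
    return {"category": category, "confidence": confidence}


if __name__ == "__main__":
    illustrative = {"ROUTINE_CARE": 0.7, "PAIN_CONDITIONS": 0.2, "INJURIES": 0.1}
    print(top_k(illustrative, k=2))
    print(summarize(illustrative))
```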

System Integration

Complete Healthcare Routing Workflow

User Query
    ↓
Medical vs Insurance Classification
    ↓
┌─────────────────┬──────────────────────┐
│   Insurance     │     Medical          │
│   Queries       │     Queries          │
│       ↓         │        ↓             │
│  Insurance      │   Reason             │
│  Department     │   Classification     │
│                 │        ↓             │
│                 │  • ROUTINE_CARE      │
│                 │  • PAIN_CONDITIONS   │
│                 │  • INJURIES          │
│                 │  • SKIN_CONDITIONS   │
│                 │  • STRUCTURAL_ISSUES │
│                 │  • PROCEDURES        │
└─────────────────┴──────────────────────┘

Integration with Healthcare System

The reason classifier integrates as part of the complete healthcare routing system:

  1. Primary Classification: Medical vs Insurance queries
  2. Reason Classification: Medical queries → Specific reason categories
  3. Department Routing: Route to appropriate specialized departments
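The three stages above amount to a two-level decision. A minimal sketch, assuming the primary classifier can be reduced to a yes/no "is this medical?" callable and the reason classifier to a callable returning a `predict_single_reason`-style dict; `route_query` and the department labels are hypothetical. Injecting both callables keeps the routing logic readable and testable without loading any model.

```python
from typing import Callable, Dict


def route_query(query: str,
                is_medical: Callable[[str], bool],
                predict_reason: Callable[[str], Dict]) -> Dict:
    """Two-stage routing: medical vs insurance first, then reason category."""
    if not is_medical(query):
        # Insurance queries go straight to the insurance department.
        return {"department": "INSURANCE", "reason": None, "confidence": None}
    result = predict_reason(query)
    return {"department": "MEDICAL",
            "reason": result["category"],
            "confidence": result.get("confidence")}
```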

Training Data Strategy

Real Healthcare Data

The system uses actual healthcare clinic data:

# Data source: data/reason_for_visit_data.xlsx
# Contains real patient visit reasons and appointment types
# Examples from actual data:
# - "Heel pain"
# - "Routine foot care"
# - "Ingrown toenail"
# - "Ankle sprain"
# - "Plantar fasciitis"

Category Mapping Strategy

The system uses keyword-based mapping to categorize real healthcare reasons:

def map_reason_to_category(reason: str) -> int:
    reason_lower = reason.lower()
    
    # ROUTINE_CARE (routine care, maintenance visits)
    if any(word in reason_lower for word in ['routine', 'nail care', 'calluses']):
        return 0
    
    # PAIN_CONDITIONS (various pain-related conditions)
    elif any(word in reason_lower for word in ['pain', 'ache', 'sore']):
        return 1
    
    # ... other categories
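The snippet above is an if/elif chain, so the first matching branch wins. A minimal table-driven sketch of the same idea: rules 0 and 1 copy the keywords shown above, while the keywords for rules 2 to 5 are placeholders taken from the example phrases in the category table, not the project's real lists. `map_reason_with_rules` and the `None` fallback for unmatched reasons are hypothetical.

```python
from typing import List, Optional, Tuple

# (label index, keywords), checked in order; the first matching rule wins.
KEYWORD_RULES: List[Tuple[int, List[str]]] = [
    (0, ["routine", "nail care", "calluses"]),  # ROUTINE_CARE (from the source)
    (1, ["pain", "ache", "sore"]),              # PAIN_CONDITIONS (from the source)
    (2, ["sprain", "wound"]),                   # INJURIES (placeholder)
    (3, ["ingrown"]),                           # SKIN_CONDITIONS (placeholder)
    (4, ["flat feet", "plantar fasciitis"]),    # STRUCTURAL_ISSUES (placeholder)
    (5, ["injection", "surgical"]),             # PROCEDURES (placeholder)
]


def map_reason_with_rules(reason: str) -> Optional[int]:
    """Return the label of the first rule whose keyword occurs in the reason."""
    reason_lower = reason.lower()
    for label, keywords in KEYWORD_RULES:
        if any(word in reason_lower for word in keywords):
            return label
    return None  # no rule matched; a caller could drop or review such rows
```

Because rule order decides ties, "Painful ankle sprain" maps to PAIN_CONDITIONS (1) rather than INJURIES (2); reordering the table changes such labels.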

Performance Metrics

Expected Performance

  • Accuracy: Depends on the patterns in the real clinic data and on how well the keyword mapping labels it
  • Categories: 6 specialized healthcare reason categories
  • Confidence: Varies with the quantity and quality of the training data

Evaluation Framework

# Train and evaluate the model
python classifier/reason/train_reason.py

# Test the trained model
python classifier/reason/infer_reason.py

# Results include:
# - Training metrics
# - Category distribution
# - Example predictions with confidence scores
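A minimal sketch of how accuracy, the predicted-category distribution, and per-category recall could be computed from a list of prediction dicts, assuming each dict has a `category` key as in the programmatic examples and that gold labels are category names. `evaluate` is a hypothetical helper, not one of the scripts above.

```python
from collections import Counter
from typing import Dict, List


def evaluate(predictions: List[Dict], gold: List[str]) -> Dict:
    """Accuracy, predicted-category distribution, and per-category recall."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    predicted = [p["category"] for p in predictions]
    correct = [p == g for p, g in zip(predicted, gold)]
    support = Counter(gold)  # how many gold examples each category has
    hits = Counter(g for g, ok in zip(gold, correct) if ok)
    return {
        "accuracy": sum(correct) / len(gold) if gold else 0.0,
        "distribution": dict(Counter(predicted)),
        "recall": {c: hits[c] / n for c, n in support.items()},
    }
```

Per-category recall matters here because real visit data is usually imbalanced: a model can score well overall while missing a rare category entirely.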

File Structure

classifier/reason/
├── __init__.py            # Package initialization and exports
├── README.md              # This documentation
├── reason_classifier.py   # Main ReasonClassifier class
├── infer_reason.py        # Inference functions and utilities
└── train_reason.py        # Training script and functions

API Reference

ReasonClassifier

from typing import Dict, List, Optional

import pandas as pd


class ReasonClassifier:
    def __init__(self, data_file: str = "data/reason_for_visit_data.xlsx"): ...
    def predict(self, queries: List[str]) -> List[Dict]: ...
    def train(self, train_data: Optional[pd.DataFrame] = None, eval_data: Optional[pd.DataFrame] = None): ...
    def save_model(self, path: str): ...
    def load_model(self, path: str): ...
    def create_real_dataset(self) -> pd.DataFrame: ...
    def analyze_real_data(self): ...

Inference Functions

def predict_single_reason(query: str) -> dict: ...
def predict_reason_query(text: list[str], embedding_model, classifier_head) -> dict: ...
def get_reason_models() -> tuple: ...
def test_reason_classifier(): ...

Training Functions

import pandas as pd

def get_reason_model(num_classes: int): ...
def get_reason_dataset() -> pd.DataFrame: ...
def map_reason_to_category(reason: str) -> int: ...
def preprocess_reason_data(df: pd.DataFrame) -> pd.DataFrame: ...

Data Requirements

Healthcare Data Format

The system expects healthcare data in Excel format with these columns:

Required column:
- "Reason For Visit": The primary reason for the healthcare visit

Optional column:
- "Appointment Type": Type of appointment, used for context

Example data:
| Reason For Visit  | Appointment Type |
|-------------------|------------------|
| Heel pain         | Follow-up        |
| Routine foot care | Maintenance      |
| Ingrown toenail   | New Patient      |
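A minimal sketch of validating and cleaning such a table once it is loaded into pandas (in practice via `pd.read_excel("data/reason_for_visit_data.xlsx")`, which needs the openpyxl engine for .xlsx files). `clean_reason_frame` is a hypothetical helper, not the module's `preprocess_reason_data`; filling a missing "Appointment Type" with empty strings is an assumption.

```python
import pandas as pd

REQUIRED_COLUMN = "Reason For Visit"
OPTIONAL_COLUMN = "Appointment Type"


def clean_reason_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Check the required column exists and drop rows without a usable reason."""
    if REQUIRED_COLUMN not in df.columns:
        raise KeyError(f"missing required column: {REQUIRED_COLUMN!r}")
    out = df.copy()
    if OPTIONAL_COLUMN not in out.columns:
        out[OPTIONAL_COLUMN] = ""  # optional context column, filled with blanks
    # Blank Excel cells arrive as NaN/None; normalize them to empty strings.
    reasons = out[REQUIRED_COLUMN].fillna("").astype(str).str.strip()
    out[REQUIRED_COLUMN] = reasons
    out = out[reasons != ""]
    return out.reset_index(drop=True)
```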

Deployment Considerations

Production Readiness

  1. Model Persistence: Trained models saved with timestamps in classifier/reason_checkpoints/
  2. Error Handling: Graceful fallbacks for prediction failures
  3. Real Data Integration: Uses actual healthcare clinic data
  4. Device Support: CPU/GPU/MPS compatibility
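"Graceful fallbacks for prediction failures" can be as simple as a wrapper that never lets a model error reach the caller. A minimal sketch, assuming `predict` returns dicts with a `confidence` key; the fallback record, the `needs_review` flag, the 0.5 threshold, and the name `safe_predict` are all illustrative assumptions.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical record returned when the model cannot produce a prediction.
FALLBACK = {"category": None, "confidence": 0.0, "needs_review": True}


def safe_predict(queries: List[str], predict: Callable[[List[str]], List[Dict]],
                 min_confidence: float = 0.5) -> List[Dict]:
    """Return predictions, degrading to fallbacks instead of raising."""
    try:
        predictions = predict(queries)
    except Exception:  # keep serving even if the model call fails
        logger.exception("reason prediction failed; returning fallbacks")
        return [dict(FALLBACK) for _ in queries]
    results = []
    for pred in predictions:
        pred = dict(pred)  # do not mutate the caller's objects
        pred["needs_review"] = pred.get("confidence", 0.0) < min_confidence
        results.append(pred)
    return results
```

Flagging low-confidence results instead of dropping them lets a human or a default department handle uncertain queries.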

Scalability

  • Batch Processing: Efficient handling of multiple queries
  • Integration: Works with existing healthcare routing system
  • Checkpoints: Automatic model saving with timestamps

Future Enhancements

Data Improvements

  1. Expanded Dataset: Include more healthcare specialties
  2. Active Learning: Improve model with real-world feedback
  3. Multi-language Support: Support for non-English healthcare queries

Advanced Features

  1. Confidence Calibration: Improve confidence score reliability
  2. Hierarchical Classification: Sub-categories within reason types
  3. Context Awareness: Consider patient history and appointment context

Troubleshooting

Common Issues

  1. Data Loading Errors: Ensure data/reason_for_visit_data.xlsx exists
  2. Low Confidence: May indicate that more training data or model retraining is needed
  3. Import Errors: Ensure all dependencies are installed and paths are correct
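The first and third issues can be caught before training starts. A minimal preflight sketch using only the standard library; the default module list (pandas, openpyxl for .xlsx reading, sentence_transformers, setfit) is an assumption about this project's dependencies, and `preflight` is a hypothetical name.

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List


def preflight(data_file: str = "data/reason_for_visit_data.xlsx",
              modules: Iterable[str] = ("pandas", "openpyxl",
                                        "sentence_transformers", "setfit")) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not Path(data_file).is_file():
        problems.append(f"data file not found: {data_file}")
    for name in modules:
        # find_spec locates a module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems
```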

Debug Mode

# Test the classifier with sample queries
from classifier.reason.infer_reason import test_reason_classifier
test_reason_classifier()

# Check model predictions with probabilities
from classifier.reason import predict_single_reason
result = predict_single_reason("ambiguous query")
print(result['probabilities'])
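When inspecting `probabilities` for an ambiguous query, the gap between the two most likely categories is often more telling than the top confidence alone. A minimal sketch, assuming `probabilities` is a dict from category name to probability; `is_ambiguous` and the 0.15 margin are illustrative choices.

```python
from typing import Dict


def is_ambiguous(probabilities: Dict[str, float], min_margin: float = 0.15) -> bool:
    """Flag a prediction whose top two categories are close together."""
    ranked = sorted(probabilities.values(), reverse=True)
    if len(ranked) < 2:
        return False
    return ranked[0] - ranked[1] < min_margin
```

A query split between, say, PAIN_CONDITIONS and INJURIES is a good candidate for adding clearer training examples of both.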

Model Training Issues

# Check if healthcare data is available
ls -la data/reason_for_visit_data.xlsx

# Verify model training
python classifier/reason/train_reason.py

# Test inference after training
python classifier/reason/infer_reason.py

Contributing

Adding New Categories

  1. Update REASON_CATEGORIES in reason_classifier.py, infer_reason.py, and train_reason.py
  2. Update category mapping logic in map_reason_to_category()
  3. Retrain the model with new categories
  4. Update documentation and examples
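Since step 1 edits the same list in three files, a small check can confirm the copies still agree. A minimal sketch, assuming each copy is passed in as a plain sequence (for example, REASON_CATEGORIES imported from each module, if they expose it at module level) and that order matters because a category's position is its integer label; `category_list_mismatches` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def category_list_mismatches(copies: Dict[str, Sequence[str]]) -> List[str]:
    """Return the names of copies that differ from the first one (order-sensitive)."""
    names = list(copies)
    if not names:
        return []
    reference = list(copies[names[0]])
    return [name for name in names[1:] if list(copies[name]) != reference]
```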

Improving Training Data

  1. Add more real healthcare examples to the dataset
  2. Improve keyword mapping for better categorization
  3. Implement more sophisticated NLP techniques for category assignment

License

This module is part of the health-query-classifier project and follows the same licensing terms.