# Healthcare Reason Classification System

This module implements a classifier, trained on real clinic data, that assigns patient queries to specific visit-reason categories.

## Overview

The reason classifier routes medical queries to the appropriate specialized department. It assigns each query to one of the reason categories below, which are derived from actual healthcare visit data.

## Architecture

### Classification Categories

| Category | Description | Examples |
|----------|-------------|----------|
| `ROUTINE_CARE` | Routine healthcare, maintenance visits, general care | "I need routine foot care", "Regular nail care appointment" |
| `PAIN_CONDITIONS` | Various pain-related conditions and discomfort | "I have heel pain when I walk", "My ankle is sore" |
| `INJURIES` | Sprains, wounds, trauma-related conditions | "I sprained my ankle playing sports", "I have a wound that won't heal" |
| `SKIN_CONDITIONS` | Skin-related issues and conditions | "My toenail is ingrown and infected", "I have calluses on my feet" |
| `STRUCTURAL_ISSUES` | Structural problems and related conditions | "I have flat feet", "I need evaluation for plantar fasciitis" |
| `PROCEDURES` | Injections, surgical consultations, post-operative care | "I need a cortisone injection", "Post-surgical follow-up" |

### Technical Implementation

- **Base Model**: `sentence-transformers/embeddinggemma-300m-medical`
- **Architecture**: SetFit with frozen embeddings + trainable classification head
- **Training**: Real healthcare data from clinic appointment records
- **Integration**: Works as part of the complete healthcare routing system
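
The split between frozen embeddings and a trainable head can be illustrated with a toy sketch. It is not the SetFit API or this project's code: `frozen_embed` is a hypothetical hashed bag-of-words stand-in for the sentence encoder, and `LinearHead` is a plain softmax regression used here only to show that training touches the head and never the embedding function.

```python
import math
import zlib
from typing import List

DIM = 64  # toy embedding size (assumption; the real encoder's dimension differs)

def frozen_embed(text: str) -> List[float]:
    """Hypothetical frozen encoder: hashed bag of words, L2-normalised. Never updated."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class LinearHead:
    """Trainable softmax head; the only parameters that change during training."""

    def __init__(self, num_classes: int, dim: int = DIM) -> None:
        self.weights = [[0.0] * dim for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def predict_proba(self, vec: List[float]) -> List[float]:
        scores = [sum(w * x for w, x in zip(row, vec)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(scores)

    def predict(self, vec: List[float]) -> int:
        probs = self.predict_proba(vec)
        return max(range(len(probs)), key=probs.__getitem__)

    def fit(self, vectors: List[List[float]], labels: List[int],
            epochs: int = 200, lr: float = 0.5) -> None:
        # Stochastic gradient descent on cross-entropy: d(loss)/d(score_k) = p_k - y_k.
        for _ in range(epochs):
            for vec, label in zip(vectors, labels):
                probs = self.predict_proba(vec)
                for k, p in enumerate(probs):
                    grad = p - (1.0 if k == label else 0.0)
                    self.bias[k] -= lr * grad
                    for j, x in enumerate(vec):
                        self.weights[k][j] -= lr * grad * x

texts = ["routine foot care", "heel pain", "ankle sprain"]
labels = [0, 1, 2]
vectors = [frozen_embed(t) for t in texts]  # computed once, then reused
head = LinearHead(num_classes=3)
head.fit(vectors, labels)
predicted = [head.predict(v) for v in vectors]
```

Because the embeddings are fixed, they can be computed once and cached, which keeps training of the head cheap.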

## Quick Start

### 1. Train the Classifier

```bash
# Train with real healthcare data
python classifier/reason/train_reason.py

# The training script will:
# - Load real healthcare data from data/reason_for_visit_data.xlsx
# - Map reasons to categories using keyword matching
# - Train the classifier with frozen embeddings
# - Save the trained model to classifier/reason_checkpoints/
```

### 2. Use the CLI

```bash
# Classify a single reason query
python cli/reason_classifier_cli_new.py "I have heel pain when I walk"

# Interactive mode
python cli/reason_classifier_cli_new.py --interactive

# Batch processing
python cli/reason_classifier_cli_new.py --batch queries.txt --output results.json

# Use complete healthcare routing system
python cli/healthcare_classifier_cli.py "I need routine foot care"
```

### 3. Programmatic Usage

```python
from classifier.reason import ReasonClassifier, predict_single_reason

# Using the main classifier class
classifier = ReasonClassifier()
predictions = classifier.predict(["I have heel pain when I walk"])
print(predictions[0]['category'])  # Output: PAIN_CONDITIONS

# Using convenience function
result = predict_single_reason("I need routine foot care")
print(result['category'])  # Output: ROUTINE_CARE
print(result['confidence'])  # Confidence score
print(result['probabilities'])  # All category probabilities
```

## System Integration

### Complete Healthcare Routing Workflow

```
User Query
    ↓
Medical vs Insurance Classification
    ↓
┌─────────────────┬──────────────────────┐
│   Insurance     │       Medical        │
│   Queries       │       Queries        │
│       ↓         │          ↓           │
│  Insurance      │       Reason         │
│  Department     │   Classification     │
│                 │          ↓           │
│                 │  • ROUTINE_CARE      │
│                 │  • PAIN_CONDITIONS   │
│                 │  • INJURIES          │
│                 │  • SKIN_CONDITIONS   │
│                 │  • STRUCTURAL_ISSUES │
│                 │  • PROCEDURES        │
└─────────────────┴──────────────────────┘
```

### Integration with Healthcare System

The reason classifier integrates as part of the complete healthcare routing system:

1. **Primary Classification**: Medical vs Insurance queries
2. **Reason Classification**: Medical queries → Specific reason categories
3. **Department Routing**: Route to appropriate specialized departments
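
The three steps above can be sketched as a small two-stage function. This is an illustrative sketch, not the project's routing code: `route_query`, the department labels, and the two toy stand-in classifiers are hypothetical names used so that the example runs without trained models.

```python
from typing import Callable, Dict, Optional

def route_query(
    query: str,
    is_medical: Callable[[str], bool],
    classify_reason: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Stage 1 separates medical from insurance; stage 2 runs only for medical queries."""
    if not is_medical(query):
        return {"department": "INSURANCE", "reason": None}
    return {"department": "MEDICAL", "reason": classify_reason(query)}

# Toy stand-ins (assumptions) for the two trained classifiers.
def toy_is_medical(query: str) -> bool:
    return "insurance" not in query.lower()

def toy_classify_reason(query: str) -> str:
    return "PAIN_CONDITIONS" if "pain" in query.lower() else "ROUTINE_CARE"

medical = route_query("I have heel pain when I walk", toy_is_medical, toy_classify_reason)
insurance = route_query("Does my insurance cover orthotics?", toy_is_medical, toy_classify_reason)
```

Passing the classifiers in as callables keeps the routing logic testable independently of model loading.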

## Training Data Strategy

### Real Healthcare Data

The system uses actual healthcare clinic data:

```python
# Data source: data/reason_for_visit_data.xlsx
# Contains real patient visit reasons and appointment types
# Examples from actual data:
# - "Heel pain"
# - "Routine foot care"
# - "Ingrown toenail"
# - "Ankle sprain"
# - "Plantar fasciitis"
```

### Category Mapping Strategy

The system uses keyword-based mapping to categorize real healthcare reasons:

```python
def map_reason_to_category(reason: str) -> int:
    reason_lower = reason.lower()

    # ROUTINE_CARE (routine care, maintenance visits)
    if any(word in reason_lower for word in ['routine', 'nail care', 'calluses']):
        return 0

    # PAIN_CONDITIONS (various pain-related conditions)
    elif any(word in reason_lower for word in ['pain', 'ache', 'sore']):
        return 1

    # ... other categories
```
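
The same first-match-wins logic can also be written as a rule table, which makes the order dependence explicit. In the sketch below only the keywords for categories 0 and 1 come from the snippet above; the keywords for categories 2 to 5, their numeric IDs (assumed to follow the table order), the name `map_reason_by_rules`, and the `None` result for unmatched reasons are all hypothetical.

```python
from typing import List, Optional, Tuple

# (category id, keywords). Rules are checked top to bottom; the first match wins.
KEYWORD_RULES: List[Tuple[int, List[str]]] = [
    (0, ["routine", "nail care", "calluses"]),   # ROUTINE_CARE (from the snippet above)
    (1, ["pain", "ache", "sore"]),               # PAIN_CONDITIONS (from the snippet above)
    (2, ["sprain", "wound"]),                    # INJURIES (hypothetical keywords)
    (3, ["ingrown", "skin"]),                    # SKIN_CONDITIONS (hypothetical keywords)
    (4, ["flat feet", "plantar fasciitis"]),     # STRUCTURAL_ISSUES (hypothetical keywords)
    (5, ["injection", "surgical", "post-op"]),   # PROCEDURES (hypothetical keywords)
]

def map_reason_by_rules(reason: str) -> Optional[int]:
    """Return the id of the first rule with a keyword contained in the reason, else None."""
    reason_lower = reason.lower()
    for category_id, keywords in KEYWORD_RULES:
        if any(keyword in reason_lower for keyword in keywords):
            return category_id
    return None  # assumption: unmatched reasons are left unlabeled

examples = ["Routine foot care", "Heel pain", "Ankle sprain",
            "Ingrown toenail", "Plantar fasciitis", "Cortisone injection"]
mapped = [map_reason_by_rules(text) for text in examples]
```

Matching is by substring, so a reason containing keywords from several categories is assigned to whichever rule appears first in the table.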

## Performance Metrics

### Expected Performance
- **Accuracy**: Depends on the patterns present in the real healthcare training data
- **Categories**: 6 specialized healthcare reason categories
- **Confidence**: Varies with the quality of the training data

### Evaluation Framework

```bash
# Train and evaluate the model
python classifier/reason/train_reason.py

# Test the trained model
python classifier/reason/infer_reason.py

# Results include:
# - Training metrics
# - Category distribution
# - Example predictions with confidence scores
```
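
For a quick check outside the scripts, accuracy and the category distribution can be computed from a list of true and predicted labels. This is a minimal sketch with made-up labels; `accuracy` and `category_distribution` are hypothetical helpers, not functions exported by this module.

```python
from collections import Counter
from typing import Dict, Sequence

def accuracy(true_labels: Sequence[str], predicted_labels: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def category_distribution(labels: Sequence[str]) -> Dict[str, float]:
    """Share of each category among the given labels."""
    if not labels:
        return {}
    counts = Counter(labels)
    return {category: count / len(labels) for category, count in counts.items()}

# Illustrative labels only, not results from the real model.
y_true = ["PAIN_CONDITIONS", "ROUTINE_CARE", "INJURIES", "PAIN_CONDITIONS"]
y_pred = ["PAIN_CONDITIONS", "ROUTINE_CARE", "PAIN_CONDITIONS", "PAIN_CONDITIONS"]
score = accuracy(y_true, y_pred)
distribution = category_distribution(y_true)
```

Looking at the distribution alongside accuracy helps spot categories that are too rare in the data for the score to say much about them.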

## File Structure

```
classifier/reason/
├── __init__.py              # Package initialization and exports
├── README.md               # This documentation
├── reason_classifier.py    # Main ReasonClassifier class
├── infer_reason.py        # Inference functions and utilities
└── train_reason.py        # Training script and functions
```

## API Reference

### ReasonClassifier

```python
class ReasonClassifier:
    def __init__(self, data_file: str = "data/reason_for_visit_data.xlsx")
    def predict(self, queries: List[str]) -> List[Dict]
    def train(self, train_data: pd.DataFrame = None, eval_data: Optional[pd.DataFrame] = None)
    def save_model(self, path: str)
    def load_model(self, path: str)
    def create_real_dataset(self) -> pd.DataFrame
    def analyze_real_data(self)
```

### Inference Functions

```python
def predict_single_reason(query: str) -> dict
def predict_reason_query(text: list[str], embedding_model, classifier_head) -> dict
def get_reason_models() -> tuple
def test_reason_classifier()
```

### Training Functions

```python
def get_reason_model(num_classes: int)
def get_reason_dataset() -> pd.DataFrame
def map_reason_to_category(reason: str) -> int
def preprocess_reason_data(df: pd.DataFrame) -> pd.DataFrame
```

## Data Requirements

### Healthcare Data Format

The system expects healthcare data in Excel format with these columns:

```
Columns:
- "Reason For Visit": The primary reason for the healthcare visit (required)
- "Appointment Type": Type of appointment (optional, used for context)

Example data:
| Reason For Visit | Appointment Type |
|------------------|------------------|
| Heel pain        | Follow-up        |
| Routine foot care| Maintenance      |
| Ingrown toenail  | New Patient      |
```
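
A basic check of this format can be sketched on plain dictionaries, one per spreadsheet row. The helper `clean_rows` and its behavior (skipping blank reasons, defaulting a missing appointment type to an empty string) are assumptions for illustration and do not describe `preprocess_reason_data`.

```python
from typing import Dict, Iterable, List

REASON_COLUMN = "Reason For Visit"   # required column
TYPE_COLUMN = "Appointment Type"     # optional column

def clean_rows(rows: Iterable[Dict[str, object]]) -> List[Dict[str, str]]:
    """Validate the required column, trim whitespace, and drop rows with a blank reason."""
    cleaned: List[Dict[str, str]] = []
    for index, row in enumerate(rows):
        if REASON_COLUMN not in row:
            raise KeyError(f"row {index} is missing the '{REASON_COLUMN}' column")
        reason = str(row[REASON_COLUMN] or "").strip()
        if not reason:
            continue  # a blank reason carries no training signal
        appointment = str(row.get(TYPE_COLUMN) or "").strip()
        cleaned.append({REASON_COLUMN: reason, TYPE_COLUMN: appointment})
    return cleaned

raw_rows = [
    {"Reason For Visit": " Heel pain ", "Appointment Type": "Follow-up"},
    {"Reason For Visit": "", "Appointment Type": "New Patient"},
    {"Reason For Visit": "Ingrown toenail"},
]
cleaned_rows = clean_rows(raw_rows)
```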

## Deployment Considerations

### Production Readiness

1. **Model Persistence**: Trained models saved with timestamps in `classifier/reason_checkpoints/`
2. **Error Handling**: Graceful fallbacks for prediction failures
3. **Real Data Integration**: Uses actual healthcare clinic data
4. **Device Support**: CPU/GPU/MPS compatibility
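
Two of these points, timestamped checkpoints and graceful fallbacks, can be sketched as follows. The timestamp format, the `reason_` directory prefix, and the shape of the fallback dictionary are assumptions; the module's actual naming and error handling may differ.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable, Dict, Optional

def checkpoint_dir(base: str = "classifier/reason_checkpoints",
                   now: Optional[datetime] = None) -> Path:
    """Build a timestamped checkpoint directory path (hypothetical naming scheme)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(base) / f"reason_{stamp}"

def safe_predict(predict: Callable[[str], Dict], query: str) -> Dict:
    """Call a single-query predictor and return a fallback result if it raises."""
    try:
        return predict(query)
    except Exception as error:  # broad on purpose: any failure becomes a fallback
        return {"category": None, "confidence": 0.0, "error": str(error)}

def broken_predictor(query: str) -> Dict:
    raise RuntimeError("model not loaded")

path = checkpoint_dir(now=datetime(2024, 1, 2, 3, 4, 5))
fallback = safe_predict(broken_predictor, "I have heel pain")
```

Returning a result with `category` set to `None` lets the caller decide how to route a query when the model fails.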

### Scalability

- **Batch Processing**: Efficient handling of multiple queries
- **Integration**: Works with existing healthcare routing system
- **Checkpoints**: Automatic model saving with timestamps
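
Batch processing can be sketched as chunking the queries and calling a list-based predictor once per chunk. The helpers `chunked` and `predict_in_batches` and the batch size of 32 are hypothetical; the `predict` argument is assumed to behave like `ReasonClassifier.predict`, taking a list of queries and returning one dictionary per query.

```python
from typing import Callable, Dict, Iterator, List, Sequence

def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])

def predict_in_batches(predict: Callable[[List[str]], List[Dict]],
                       queries: Sequence[str],
                       batch_size: int = 32) -> List[Dict]:
    """Run a list-based predictor batch by batch, preserving the input order."""
    results: List[Dict] = []
    for batch in chunked(queries, batch_size):
        results.extend(predict(batch))
    return results

# Toy predictor (assumption) that records how many queries it received per call.
call_sizes: List[int] = []

def toy_predict(batch: List[str]) -> List[Dict]:
    call_sizes.append(len(batch))
    return [{"query": query, "category": "ROUTINE_CARE"} for query in batch]

queries = [f"query {i}" for i in range(5)]
batched_results = predict_in_batches(toy_predict, queries, batch_size=2)
```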

## Future Enhancements

### Data Improvements

1. **Expanded Dataset**: Include more healthcare specialties
2. **Active Learning**: Improve model with real-world feedback
3. **Multi-language Support**: Support for non-English healthcare queries

### Advanced Features

1. **Confidence Calibration**: Improve confidence score reliability
2. **Hierarchical Classification**: Sub-categories within reason types
3. **Context Awareness**: Consider patient history and appointment context

## Troubleshooting

### Common Issues

1. **Data Loading Errors**: Ensure `data/reason_for_visit_data.xlsx` exists
2. **Low Confidence**: May indicate need for more training data or model retraining
3. **Import Errors**: Ensure all dependencies are installed and paths are correct

### Debug Mode

```python
# Test the classifier with sample queries
from classifier.reason.infer_reason import test_reason_classifier
test_reason_classifier()

# Check model predictions with probabilities
from classifier.reason import predict_single_reason
result = predict_single_reason("ambiguous query")
print(result['probabilities'])
```
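
When inspecting probabilities for an ambiguous query, the gap between the two most likely categories is often more telling than the top score alone. The sketch below assumes `result['probabilities']` is a dictionary mapping category names to probabilities, which the source does not confirm; `top_two`, `is_ambiguous`, and the 0.15 margin are hypothetical.

```python
from typing import Dict, List, Tuple

def top_two(probabilities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return the two most probable (category, probability) pairs, highest first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:2]

def is_ambiguous(probabilities: Dict[str, float], min_margin: float = 0.15) -> bool:
    """Flag a prediction whose top two probabilities are closer than min_margin."""
    ranked = top_two(probabilities)
    if len(ranked) < 2:
        return False
    return ranked[0][1] - ranked[1][1] < min_margin

# Illustrative probabilities only, not output from the real model.
close_call = {"PAIN_CONDITIONS": 0.41, "INJURIES": 0.38, "ROUTINE_CARE": 0.21}
clear_call = {"PAIN_CONDITIONS": 0.90, "INJURIES": 0.10}
```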

### Model Training Issues

```bash
# Check if healthcare data is available
ls -la data/reason_for_visit_data.xlsx

# Verify model training
python classifier/reason/train_reason.py

# Test inference after training
python classifier/reason/infer_reason.py
```

## Contributing

### Adding New Categories

1. Update `REASON_CATEGORIES` in `reason_classifier.py`, `infer_reason.py`, and `train_reason.py`
2. Update category mapping logic in `map_reason_to_category()`
3. Retrain the model with new categories
4. Update documentation and examples

### Improving Training Data

1. Add more real healthcare examples to the dataset
2. Improve keyword mapping for better categorization
3. Implement more sophisticated NLP techniques for category assignment

## License

This module is part of the health-query-classifier project and follows the same licensing terms.