# BioClinical Medical Coding Model
## Model Description
This is a BioClinicalModernBERT-based model for automated medical coding. It predicts ICD-10-CM diagnosis codes and HCPCS/CPT procedure codes from clinical notes.
## Model Architecture
- **Base Model**: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- **Training**: 3-phase fine-tuning approach
  - Phase 1: Dense retrieval training
  - Phase 2: Hard negative re-ranking
  - Phase 3: Multi-label classification
- **Code Vocabulary**: 31,794 medical codes (ICD-10-CM and HCPCS/CPT)
- **Performance**: F1 score of 0.80 to 0.88 on frequent codes
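The multi-label classification phase and the `threshold` argument used in the usage example suggest that each code is scored independently and kept when its probability clears a cutoff. The following is a minimal sketch of that decoding step, not the code in `inference.py`; the function name `decode_multilabel`, the toy logits, and the three-entry code map are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, idx_to_code, threshold=0.5):
    """Return (code, probability) pairs whose probability meets the threshold.

    Each code gets its own sigmoid, so any number of codes can be
    selected for a single note (multi-label, not softmax).
    """
    selected = []
    for idx, logit in enumerate(logits):
        prob = sigmoid(logit)
        if prob >= threshold:
            selected.append((idx_to_code[idx], prob))
    # Highest-probability codes first
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Toy example: hypothetical codes and made-up logits
toy_idx_to_code = {0: "I25.111", 1: "E11.9", 2: "99213"}
for code, prob in decode_multilabel([2.0, -1.0, 0.3], toy_idx_to_code):
    print(f"{code}: {prob:.3f}")
```

Raising `threshold` trades recall for precision, which may matter more for rare codes than for frequent ones.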
## Usage
```python
from inference import MedicalCodingPredictor

# Initialize the predictor
predictor = MedicalCodingPredictor()

# Predict codes from a clinical note
clinical_note = "Patient presents with chest pain and elevated cardiac enzymes..."
predictions = predictor.predict(clinical_note, threshold=0.5)

# Each prediction is a dict with the fields shown under "API Response Format"
for pred in predictions:
    print(f"Code: {pred['code']}")
    print(f"Type: {pred['type']}")
    print(f"Description: {pred['description']}")
    print(f"Confidence: {pred['confidence']:.3f}")
```
## API Response Format
```json
{
  "code": "I25.111",
  "type": "ICD-10-CM",
  "description": "CODE DESCRIPTION",
  "confidence": 0.85,
  "f1_score": 0.82
}
```
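Downstream code often needs to separate diagnosis codes from procedure codes. The sketch below assumes a list of records with the fields shown above; the helper name `group_by_type`, the second sample record, and its `"HCPCS/CPT"` type label are hypothetical and may not match what the model actually emits.

```python
import json
from collections import defaultdict


def group_by_type(predictions):
    """Group prediction records by "type", highest confidence first."""
    grouped = defaultdict(list)
    for pred in predictions:
        grouped[pred["type"]].append(pred)
    for records in grouped.values():
        records.sort(key=lambda r: r["confidence"], reverse=True)
    return dict(grouped)


# Sample records in the documented format. The second record and its
# type label are made up for illustration.
sample = json.loads("""
[
  {"code": "I25.111", "type": "ICD-10-CM", "description": "CODE DESCRIPTION",
   "confidence": 0.85, "f1_score": 0.82},
  {"code": "93000", "type": "HCPCS/CPT", "description": "CODE DESCRIPTION",
   "confidence": 0.91, "f1_score": 0.84}
]
""")

for code_type, records in group_by_type(sample).items():
    print(code_type, [r["code"] for r in records])
```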
## Files Included
- `pytorch_model.bin`: Model weights
- `config.json`: Model configuration
- `code_to_idx.json`: Code to index mapping
- `idx_to_code.json`: Index to code mapping
- `code_descriptions.json`: Code descriptions
- `code_f1_scores.json`: Per-code F1 scores
- `inference.py`: Inference script
- `requirements.txt`: Dependencies
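If you want to inspect the mapping files directly, a loader along these lines may help. It is a sketch under assumptions: the exact JSON layout is not documented here, so the code assumes `code_to_idx.json` is a flat object mapping code strings to integer indices and `idx_to_code.json` is the inverse with stringified indices as keys. The helper name `load_code_maps` is hypothetical, and the demo writes tiny stand-in files to a temporary directory rather than reading the real ones.

```python
import json
import tempfile
from pathlib import Path


def load_code_maps(model_dir):
    """Load both mapping files and check that they are inverses.

    Assumed layout (not confirmed by the model card):
      code_to_idx.json -> {"I25.111": 0, ...}
      idx_to_code.json -> {"0": "I25.111", ...}
    """
    model_dir = Path(model_dir)
    code_to_idx = json.loads((model_dir / "code_to_idx.json").read_text())
    raw = json.loads((model_dir / "idx_to_code.json").read_text())
    # JSON object keys are always strings, so restore integer indices
    idx_to_code = {int(idx): code for idx, code in raw.items()}
    for code, idx in code_to_idx.items():
        if idx_to_code.get(idx) != code:
            raise ValueError(f"Mapping mismatch for code {code!r} at index {idx}")
    return code_to_idx, idx_to_code


# Demo with stand-in files so the snippet runs without the model download
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "code_to_idx.json").write_text(json.dumps({"I25.111": 0, "E11.9": 1}))
    Path(tmp, "idx_to_code.json").write_text(json.dumps({"0": "I25.111", "1": "E11.9"}))
    demo_code_to_idx, demo_idx_to_code = load_code_maps(tmp)
    print(len(demo_code_to_idx), demo_idx_to_code[0])
```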
## Training Data
The model was trained on MIMIC-IV clinical notes, with temporal matching used for accurate code assignment.
## Limitations
- Code descriptions are generic placeholders; replace them with entries from a medical terminology database
- Performance varies by code frequency
- Predictions require clinical validation before production use
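Because performance varies by code frequency, one option is to flag predictions for codes with a weak per-code F1 so that a reviewer sees them first. This is a minimal sketch, assuming records that carry the `f1_score` field from the response format; the helper name `split_for_review` and the 0.80 cutoff are illustrative choices, not validated settings, and they do not replace clinical review of the remaining predictions.

```python
def split_for_review(predictions, min_f1=0.80):
    """Split predictions into (reliable, needs_review) lists.

    A prediction is flagged for review when its per-code F1 score is
    missing or below `min_f1` (an illustrative cutoff).
    """
    reliable, needs_review = [], []
    for pred in predictions:
        f1 = pred.get("f1_score")
        if f1 is not None and f1 >= min_f1:
            reliable.append(pred)
        else:
            needs_review.append(pred)
    return reliable, needs_review


# Made-up records for illustration
example_predictions = [
    {"code": "I25.111", "confidence": 0.85, "f1_score": 0.82},
    {"code": "E11.9", "confidence": 0.90, "f1_score": 0.61},
    {"code": "99213", "confidence": 0.70},  # no per-code F1 available
]
reliable, needs_review = split_for_review(example_predictions)
print("reliable:", [p["code"] for p in reliable])
print("needs review:", [p["code"] for p in needs_review])
```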
## Citation
If you use this model, please cite the MIMIC-IV dataset and acknowledge the multi-stage training approach.