sshan95/bioclinical-MediCoder-PROD

	# BioClinical Medical Coding Model

	## Model Description
	This is a BioClinicalModernBERT-based model for automated medical coding. The model predicts ICD-10-CM diagnosis codes and HCPCS/CPT procedure codes from clinical notes.

	## Model Architecture
	- Base Model: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
	- Training: 3-phase fine-tuning approach
	  - Phase 1: Dense retrieval training
	  - Phase 2: Hard negative re-ranking
	  - Phase 3: Multi-label classification
	- Code Vocabulary: 31794 modern medical codes
	- Performance: F1 score of 0.80-0.88 on frequent codes
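
The link between Phase 1 and Phase 2 can be illustrated with a small sketch. It assumes that dense retrieval ranks codes by cosine similarity between a note embedding and code embeddings, and that hard negatives are the highest-ranked codes that are not gold labels. Both are common choices, not details confirmed by this model card. The function names and the 2-dimensional vectors are hypothetical and exist only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_codes(note_vec, code_vecs):
    """Return (code, score) pairs sorted by descending similarity to the note."""
    scored = [(code, cosine(note_vec, vec)) for code, vec in code_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def mine_hard_negatives(ranked, gold_codes, k=2):
    """Pick the k top-ranked codes that are NOT gold labels (assumed definition)."""
    return [code for code, _ in ranked if code not in gold_codes][:k]


# Toy embeddings, made up for illustration; real embeddings come from the encoder.
code_vecs = {
    "I25.10": [1.0, 0.0],
    "I21.4": [0.9, 0.1],
    "E11.9": [0.0, 1.0],
}
note_vec = [1.0, 0.05]

ranked = rank_codes(note_vec, code_vecs)
hard_negatives = mine_hard_negatives(ranked, gold_codes={"I21.4"}, k=1)
print(ranked[0][0], hard_negatives)
```

In this toy setup the gold code `I21.4` is outranked by a near neighbour, which is exactly the kind of confusable candidate a re-ranking phase would be trained to separate.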

	## Usage

	```python
	from inference import MedicalCodingPredictor

	# Initialize predictor
	predictor = MedicalCodingPredictor()

	# Predict codes from clinical note
	clinical_note = "Patient presents with chest pain and elevated cardiac enzymes..."
	predictions = predictor.predict(clinical_note, threshold=0.5)

	# Print every predicted code with its metadata
	for pred in predictions:
	    print(f"Code: {pred['code']}")
	    print(f"Type: {pred['type']}")
	    print(f"Description: {pred['description']}")
	    print(f"Confidence: {pred['confidence']:.3f}")
	```
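
For readers wondering what `threshold=0.5` plausibly does, here is a minimal sketch. It assumes the multi-label head emits one logit per code, applies an independent sigmoid to each, and keeps codes whose probability meets the threshold. The actual logic in `inference.py` may differ. `select_codes` and the toy `idx_to_code` mapping are hypothetical.

```python
import math


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_codes(logits, idx_to_code, threshold=0.5):
    """Keep codes whose independent sigmoid probability is >= threshold."""
    results = []
    for idx, logit in enumerate(logits):
        prob = sigmoid(logit)
        if prob >= threshold:
            results.append({"code": idx_to_code[idx], "confidence": prob})
    # Most confident codes first
    return sorted(results, key=lambda r: r["confidence"], reverse=True)


# Toy mapping with integer keys; keys loaded from a JSON file would be strings.
idx_to_code = {0: "I25.111", 1: "E11.9", 2: "I10"}
logits = [2.0, -1.0, 0.5]

selected = select_codes(logits, idx_to_code, threshold=0.5)
for item in selected:
    print(item["code"], round(item["confidence"], 3))
```

Raising the threshold trades recall for precision: with `threshold=0.9`, none of the three toy codes would be returned.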

	## API Response Format
	```json
	{
	  "code": "I25.111",
	  "type": "ICD-10-CM",
	  "description": "CODE DESCRIPTION",
	  "confidence": 0.85,
	  "f1_score": 0.82
	}
	```
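
Downstream code may want to validate a response before using it. The sketch below parses one prediction object with the five fields shown above into a dataclass and checks that the scores fall in [0, 1]. The `CodePrediction` class and `parse_prediction` helper are hypothetical and not part of `inference.py`.

```python
import json
from dataclasses import dataclass


@dataclass
class CodePrediction:
    code: str
    type: str
    description: str
    confidence: float
    f1_score: float


def parse_prediction(raw):
    """Parse one JSON prediction; raises TypeError on missing or extra keys."""
    pred = CodePrediction(**json.loads(raw))
    for name in ("confidence", "f1_score"):
        value = getattr(pred, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} out of range: {value}")
    return pred


raw = (
    '{"code": "I25.111", "type": "ICD-10-CM", '
    '"description": "CODE DESCRIPTION", "confidence": 0.85, "f1_score": 0.82}'
)
pred = parse_prediction(raw)
print(pred.code, pred.type, pred.confidence)
```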

	## Files Included
	- `pytorch_model.bin`: Model weights
	- `config.json`: Model configuration
	- `code_to_idx.json`: Code to index mapping
	- `idx_to_code.json`: Index to code mapping
	- `code_descriptions.json`: Code descriptions
	- `code_f1_scores.json`: Per-code F1 scores
	- `inference.py`: Inference script
	- `requirements.txt`: Dependencies
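
Because `code_to_idx.json` and `idx_to_code.json` describe the same vocabulary in opposite directions, a quick consistency check can catch a mismatched pair of files. This sketch assumes each file holds a flat JSON object and that index keys in `idx_to_code.json` are strings (JSON object keys always are). The real file layout is not documented here, so treat the shapes below as assumptions. Inline strings stand in for the file contents to keep the example self-contained.

```python
import json


def check_mappings(code_to_idx, idx_to_code):
    """Return True when the two mappings are exact inverses of each other."""
    if len(code_to_idx) != len(idx_to_code):
        return False
    return all(
        idx_to_code.get(str(idx)) == code for code, idx in code_to_idx.items()
    )


# Stand-ins for json.load(open("code_to_idx.json")) and the idx_to_code equivalent.
code_to_idx = json.loads('{"I25.111": 0, "E11.9": 1}')
idx_to_code = json.loads('{"0": "I25.111", "1": "E11.9"}')

mappings_ok = check_mappings(code_to_idx, idx_to_code)
print(mappings_ok)
```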

	## Training Data
	Trained on MIMIC-IV clinical notes with temporal matching for accurate code assignment.

	## Limitations
	- Code descriptions are generic placeholders; replace them with entries from a medical terminology database
	- Performance varies by code frequency
	- Requires clinical validation for production use
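
Since performance varies by code frequency, one possible safeguard is to route predictions for weakly performing codes to human review. The sketch below uses the `f1_score` field from the response format. The `split_by_reliability` helper and the 0.8 cutoff are hypothetical choices, not part of the released code.

```python
def split_by_reliability(predictions, min_f1=0.8):
    """Separate predictions into trusted and needs-review lists by per-code F1."""
    trusted, needs_review = [], []
    for pred in predictions:
        # A missing f1_score is treated as unreliable
        if pred.get("f1_score", 0.0) >= min_f1:
            trusted.append(pred)
        else:
            needs_review.append(pred)
    return trusted, needs_review


# Toy predictions, made up for illustration.
predictions = [
    {"code": "I25.111", "confidence": 0.85, "f1_score": 0.82},
    {"code": "E11.9", "confidence": 0.91, "f1_score": 0.41},
    {"code": "I10", "confidence": 0.77},
]

trusted, needs_review = split_by_reliability(predictions, min_f1=0.8)
print([p["code"] for p in trusted], [p["code"] for p in needs_review])
```

Note that a high `confidence` does not imply a reliable code: the second toy prediction is confident but comes from a code with a low historical F1.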

	## Citation
	If you use this model, please cite the MIMIC-IV dataset and acknowledge the multi-stage training approach.