---
license: mit
base_model:
- microsoft/Phi-3-mini-4k-instruct
tags:
- Medical
- MedicalCoding
- Pharma
---

# Medical Coding LLM

Predict ICD-10 and CPT codes from clinical notes using a fine-tuned LLM.

This model is Phi-3-mini fine-tuned on clinical notes with LoRA and 4-bit quantization. It generates ICD/CPT codes together with short explanations, helping automate the medical coding process.

## Model Details

- Base Model: microsoft/Phi-3-mini-4k-instruct
- Fine-Tuning: LoRA (r=16, alpha=32, dropout=0.05)
- Quantization: 4-bit (BitsAndBytes NF4)
- Training Dataset: Custom dataset of clinical notes, ICD codes, and supporting evidence
- Task: Causal Language Modeling for code prediction
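
The hyperparameters listed above could be expressed roughly as follows with the `peft` and `transformers` libraries (with `bitsandbytes` installed). This is a minimal sketch, not the training script used for this model: the compute dtype and the choice to leave `target_modules` unset are assumptions, since the card does not state them.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, as listed under "Quantization".
# Assumption: bfloat16 compute dtype; the card does not specify one.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA hyperparameters, as listed under "Fine-Tuning".
# Assumption: target_modules is not set here because the card does not list
# which layers were adapted.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

In a typical setup, `bnb_config` would be passed as `quantization_config` when loading the base model, and `lora_config` would be applied to it with `peft.get_peft_model`.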

## Usage

    import re

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained("Kavyaah/medical-coding-llm")
    model = AutoModelForCausalLM.from_pretrained("Kavyaah/medical-coding-llm")
    model.eval()

    # Function to predict ICD/CPT codes
    def get_code(statement, max_new_tokens=50):
        prompt = f"Assign the correct ICD or CPT medical code for this case:\n{statement}\nCode:"
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        result = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Keep only the text after the last "Code:" marker (the decoded output includes the prompt)
        if "Code:" in result:
            result = result.split("Code:")[-1]
        # Extract the code using regex: ICD-10 style (e.g. E11.9) or five-character CPT style (e.g. 99213)
        match = re.search(r"\b(?:[A-Z]\d{1,3}\.?[A-Z0-9]*|\d{4}[0-9A-Z])\b", result)
        # Fall back to the raw generated text if no code-like token is found
        return match.group(0).strip() if match else result.strip()

    # Example
    statement = "Patient diagnosed with Type 2 diabetes mellitus without complications."
    print(get_code(statement))
    # Example output: E11.9
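
The extraction step can be tried without loading the model. The sketch below is illustrative only: `extract_code` and `CODE_PATTERN` are hypothetical names, the sample strings are made up, and the pattern is an assumed one that accepts ICD-10-style codes as well as five-character CPT-style codes.

```python
import re

# Assumed pattern: ICD-10-style codes (letter, digits, optional suffix) or
# five-character CPT-style codes (five digits, or four digits plus a letter).
CODE_PATTERN = re.compile(r"\b(?:[A-Z]\d{1,3}\.?[A-Z0-9]*|\d{4}[0-9A-Z])\b")


def extract_code(generated_text):
    """Return the first code-like token after the last 'Code:' marker, or None."""
    tail = generated_text.split("Code:")[-1]
    match = CODE_PATTERN.search(tail)
    return match.group(0) if match else None


# Made-up generations, for illustration only
samples = [
    "Assign the correct ICD or CPT medical code for this case:\nType 2 diabetes\nCode: E11.9",
    "Code: 99213 (office visit)",
    "Code: unable to determine",
]
for text in samples:
    print(extract_code(text))
```

Returning `None` when nothing matches makes it easy for a caller to route that note to manual review instead of passing free text downstream.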

## Evaluation

- Exact match accuracy: 25%
- Semantic accuracy (ICD block match): 50%
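
The card does not define how the two metrics are computed. One plausible reading, sketched below, treats an exact match as identical code strings and a block match as agreement on the first three characters of the code (the ICD-10 category). Both definitions, the function names, and the sample pairs are assumptions for illustration and are not the evaluation data behind the figures above.

```python
def exact_match(predicted, reference):
    """True when the two codes are identical, ignoring case and whitespace."""
    return predicted.strip().upper() == reference.strip().upper()


def block_match(predicted, reference):
    """True when the codes share their first three characters.

    Assumption: "ICD block match" means agreement on the three-character
    ICD-10 category (e.g. E11.65 and E11.9 both fall under E11).
    """
    return predicted.strip().upper()[:3] == reference.strip().upper()[:3]


def accuracy(pairs, match_fn):
    """Fraction of (predicted, reference) pairs accepted by match_fn."""
    if not pairs:
        return 0.0
    return sum(match_fn(p, r) for p, r in pairs) / len(pairs)


# Hypothetical (predicted, reference) pairs, for illustration only
pairs = [
    ("E11.9", "E11.9"),    # exact match
    ("I10", "I10"),        # exact match
    ("E11.65", "E11.9"),   # same category, different code
    ("J45.909", "I10"),    # no match
    ("K21.9", "K35.80"),   # no match
]
print(accuracy(pairs, exact_match))  # 0.4
print(accuracy(pairs, block_match))  # 0.6
```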

## Intended Use

- Assisting medical coders and healthcare professionals.
- Automating initial code suggestions from clinical notes.

## Limitations

- Trained on a small dataset; may not cover all ICD/CPT codes.
- Use as an assistive tool, not a replacement for professional judgment.
- Always review predicted codes before clinical or billing use.

## License

MIT License. Feel free to use and adapt this model under the terms of that license.