# Medical Coding LLM

Predict ICD-10 and CPT codes from clinical notes using a fine-tuned LLM.

This model is fine-tuned on clinical notes using Phi-3-mini with LoRA and 4-bit quantization. It can generate both ICD/CPT codes and short explanations, helping automate the medical coding process.

## Model Details

- **Base model:** microsoft/Phi-3-mini-4k-instruct
- **Fine-tuning:** LoRA (r=16, alpha=32, dropout=0.05)
- **Quantization:** 4-bit (BitsAndBytes NF4)
- **Training dataset:** custom dataset of clinical notes, ICD codes, and supporting evidence
- **Task:** causal language modeling for code prediction

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import re

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Kavyaah/medical-coding-llm")
model = AutoModelForCausalLM.from_pretrained("Kavyaah/medical-coding-llm")
model.eval()

# Predict an ICD/CPT code for a case statement
def get_code(statement, max_new_tokens=50):
    prompt = f"Assign the correct ICD or CPT medical code for this case:\n{statement}\nCode:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Keep only the text after the "Code:" marker, then extract the code with a regex
    if "Code:" in result:
        result = result.split("Code:")[-1]
    match = re.search(r"\b[A-Z]\d{1,3}\.?[A-Z0-9]*\b", result)
    return match.group(0).strip() if match else result.strip()

# Example
statement = "Patient diagnosed with Type 2 diabetes mellitus without complications."
print(get_code(statement))
# Output: E11.9
```

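The regex in `get_code` targets ICD-style codes (one uppercase letter, one to three digits, and an optional decimal part). Purely numeric CPT codes such as 99213 do not match it, in which case the function falls back to returning the stripped text. A quick standalone check of the pattern:

```python
import re

# Same pattern as in get_code(): one letter, 1-3 digits, optional dot and suffix
pattern = re.compile(r"\b[A-Z]\d{1,3}\.?[A-Z0-9]*\b")

for text in [
    "The correct code is E11.9 for this case.",
    "Code: J20.9 (acute bronchitis)",
    "Z00.129 routine child health exam",
    "99213 established patient visit",  # CPT code: no match
]:
    m = pattern.search(text)
    print(m.group(0) if m else "no match")
# Prints: E11.9, J20.9, Z00.129, no match
```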
## Evaluation

Tested on a small example set:

| Statement | True Code | Predicted Code |
| --- | --- | --- |
| Type 2 diabetes | E11.9 | E11.9 |
| Acute bronchitis | J20.0 | J20.9 |
| Routine child health exam | Z00.129 | 99395 |
| Essential hypertension | I10 | 99213 |

- Exact match accuracy: 25%
- Semantic accuracy (ICD block match): 50%

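For reference, both metrics above can be reproduced from the table. This is an illustrative sketch, not the actual evaluation script; "ICD block match" is taken here to mean agreement on the category before the dot.

```python
def icd_block(code: str) -> str:
    # The ICD-10 category is the part before the dot, e.g. "E11.9" -> "E11"
    return code.split(".")[0]

# (true code, predicted code) pairs from the evaluation table
pairs = [
    ("E11.9", "E11.9"),
    ("J20.0", "J20.9"),
    ("Z00.129", "99395"),
    ("I10", "99213"),
]

exact = sum(t == p for t, p in pairs) / len(pairs)
block = sum(icd_block(t) == icd_block(p) for t, p in pairs) / len(pairs)
print(f"exact match: {exact:.0%}, block match: {block:.0%}")
# Prints: exact match: 25%, block match: 50%
```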
Even with a small evaluation set, the model shows it has learned meaningful coding patterns: for acute bronchitis it predicts the correct ICD block (J20), though it sometimes returns a CPT procedure code where an ICD diagnosis code is expected. This provides a foundation for scaling with more training data.

## Intended Use

- Assisting medical coders and healthcare professionals.
- Automating initial code suggestions from clinical notes.

## Limitations

- Trained on a small dataset; may not cover all ICD/CPT codes.
- Use as an assistive tool, not a replacement for professional judgment.
- Always review predicted codes before clinical or billing use.

## License

MIT License. Feel free to use and adapt this model. (Note that the MIT License permits commercial as well as non-commercial use.)