Kavyaah committed on
Commit
b83c679
·
verified ·
1 Parent(s): 2392401

Create README.md

Files changed (1)
  1. README.md +81 -0
README.md ADDED
@@ -0,0 +1,81 @@
+ # Medical Coding LLM
+
+ Predict ICD-10 and CPT codes from clinical notes using a fine-tuned LLM.
+
+ This model is Phi-3-mini fine-tuned on clinical notes with LoRA and 4-bit quantization. It generates ICD-10/CPT codes together with short explanations, to help automate medical coding.
+
+ ## Model Details
+
+ Base Model: microsoft/Phi-3-mini-4k-instruct
+
+ Fine-Tuning: LoRA (r=16, alpha=32, dropout=0.05)
+
+ Quantization: 4-bit (BitsAndBytes NF4)
+
+ Training Dataset: Custom dataset of clinical notes, ICD codes, and supporting evidence
+
+ Task: Causal language modeling for code prediction
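
To make the LoRA settings above concrete, here is a small arithmetic sketch. It assumes the standard LoRA formulation, in which the low-rank update is scaled by alpha / r and each adapted weight matrix of shape d_out x d_in gains r * (d_in + d_out) trainable parameters. The 3072 x 3072 matrix size is an illustrative assumption, not something this README states, and the helper names are hypothetical.

```python
# Illustrative arithmetic for the LoRA settings listed above (r=16, alpha=32).
# Assumption: standard LoRA, where the update B @ A is scaled by alpha / r.


def lora_scaling(r: int, alpha: int) -> float:
    """Scaling factor applied to the low-rank update."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


if __name__ == "__main__":
    r, alpha = 16, 32
    # Hypothetical square projection matrix; the size is an assumption.
    d_in = d_out = 3072
    print("scaling:", lora_scaling(r, alpha))
    print("adapter params:", lora_trainable_params(d_in, d_out, r))
    print("full matrix params:", d_in * d_out)
```

Under these assumptions the adapter adds about 1% of the parameters of the matrix it wraps, which is why LoRA pairs well with a 4-bit quantized base model.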
+
+ ## Usage
+
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch, re
+
+ #### Load tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained("Kavyaah/medical-coding-llm")
+ model = AutoModelForCausalLM.from_pretrained("Kavyaah/medical-coding-llm")
+ model.eval()
+
+ #### Function to predict ICD/CPT codes
+ def get_code(statement, max_new_tokens=50):
+     prompt = f"Assign the correct ICD or CPT medical code for this case:\n{statement}\nCode:"
+     inputs = tokenizer(prompt, return_tensors="pt")
+     with torch.no_grad():
+         outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
+     result = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+     # Keep only the text after the last "Code:" marker, then extract the code with a regex
+     if "Code:" in result:
+         result = result.split("Code:")[-1]
+     match = re.search(r"\b[A-Z]\d{1,3}\.?[A-Z0-9]*\b", result)
+     return match.group(0).strip() if match else result.strip()
+
+ #### Example
+ statement = "Patient diagnosed with Type 2 diabetes mellitus without complications."
+ print(get_code(statement))
+ #### Output: E11.9
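
The regex in `get_code` only matches strings that begin with an uppercase letter, which is the shape of an ICD-10 code. A five-digit CPT code such as 99213 does not match, so `get_code` falls back to returning the raw generated text. The sketch below applies the same pattern to a few strings and adds a hypothetical `extract_code` helper that also tries a five-digit pattern. The CPT pattern and the helper name are assumptions for illustration, not part of the model's code.

```python
import re

# Pattern copied from get_code: a letter, 1-3 digits, optional dot, then alphanumerics.
ICD_PATTERN = re.compile(r"\b[A-Z]\d{1,3}\.?[A-Z0-9]*\b")
# Assumption: a numeric CPT code appears as exactly five digits.
CPT_PATTERN = re.compile(r"\b\d{5}\b")


def extract_code(text: str):
    """Return the first ICD-10-like code, else the first five-digit code, else None."""
    match = ICD_PATTERN.search(text) or CPT_PATTERN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = [" E11.9", " Z00.129 (routine child exam)", " I10", " 99213 office visit"]
    for sample in samples:
        icd_only = ICD_PATTERN.search(sample)
        print(repr(sample), "| ICD pattern:", icd_only.group(0) if icd_only else None,
              "| extract_code:", extract_code(sample))
```

Because the ICD-10 pattern is tried first, text containing both kinds of code yields the ICD-10 one; whether that ordering is appropriate depends on the prompt.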
+
+ ## Evaluation
+
+ Tested on a small set of four examples:
+
+ | Statement | True Code | Predicted Code |
+ |---|---|---|
+ | Type 2 diabetes | E11.9 | E11.9 |
+ | Acute bronchitis | J20.0 | J20.9 |
+ | Routine child health exam | Z00.129 | 99395 |
+ | Essential hypertension | I10 | 99213 |
+
+ Exact match accuracy: 25% (1 of 4)
+
+ Semantic accuracy (ICD block match): 50% (2 of 4)
+
+ Even with a small dataset, the model matched the correct ICD block in two of the four cases; in the other two it returned a CPT code where an ICD-10 code was expected. This provides a foundation for scaling.
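
The two accuracy figures can be reproduced from the evaluation table. This sketch assumes that "ICD block match" means the first three characters of the true and predicted codes agree; the README does not define the metric, so treat that reading as an assumption. The function names are hypothetical.

```python
# (true_code, predicted_code) pairs copied from the evaluation table.
RESULTS = [
    ("E11.9", "E11.9"),
    ("J20.0", "J20.9"),
    ("Z00.129", "99395"),
    ("I10", "99213"),
]


def exact_match_accuracy(pairs):
    """Fraction of pairs where the predicted code equals the true code."""
    return sum(true == pred for true, pred in pairs) / len(pairs)


def prefix_match_accuracy(pairs, n=3):
    """Fraction of pairs whose first n characters agree.

    Assumption: "block match" is read as agreement on the first three characters.
    """
    return sum(true[:n] == pred[:n] for true, pred in pairs) / len(pairs)


if __name__ == "__main__":
    print(f"Exact match accuracy: {exact_match_accuracy(RESULTS):.0%}")
    print(f"Prefix (block) match accuracy: {prefix_match_accuracy(RESULTS):.0%}")
```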
+
+ ## Intended Use
+
+ Assisting medical coders and healthcare professionals.
+
+ Automating initial code suggestions from clinical notes.
+
+ ## Limitations
+
+ Trained on a small dataset; may not cover all ICD/CPT codes.
+
+ Use as an assistive tool, not a replacement for professional judgment.
+
+ Always review predicted codes before clinical or billing use.
+
+ ## License
+
+ MIT License: feel free to use and adapt for non-commercial purposes.