---
license: mit
base_model:
- microsoft/Phi-3-mini-4k-instruct
tags:
- Medical
- MedicalCoding
- Pharma
---
# Medical Coding LLM

Predict ICD-10 and CPT codes from clinical notes using a fine-tuned LLM.

This model is Phi-3-mini fine-tuned on clinical notes with LoRA and 4-bit quantization. It generates ICD/CPT codes together with short explanations, which helps automate the medical coding process.

## Model Details

Base Model: microsoft/Phi-3-mini-4k-instruct

Fine-Tuning: LoRA (r=16, alpha=32, dropout=0.05)

Quantization: 4-bit (BitsAndBytes NF4)

Training Dataset: Custom dataset of clinical notes, ICD codes, and supporting evidence

Task: Causal Language Modeling for code prediction
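
The hyperparameters above map naturally onto the `peft` and `transformers` configuration objects. The following is a minimal sketch of how such a setup could be expressed, not the actual training script: the variable names are illustrative, and settings this card does not state (such as the LoRA target modules or the compute dtype) are deliberately left at library defaults.

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, matching the "Quantization" entry above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)

# LoRA adapter settings, matching the "Fine-Tuning" entry above.
# target_modules is not stated in this card, so it is omitted here
# (an assumption: the real run may have set it explicitly).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

print(lora_config.r, lora_config.lora_alpha, lora_config.lora_dropout)
```

With `lora_alpha=32` and `r=16`, the standard LoRA scaling factor `lora_alpha / r` is 2.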

## Usage
    import re

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained("Kavyaah/medical-coding-llm")
    model = AutoModelForCausalLM.from_pretrained("Kavyaah/medical-coding-llm")
    model.eval()

    # Predict an ICD/CPT code for a clinical statement
    def get_code(statement, max_new_tokens=50):
        prompt = f"Assign the correct ICD or CPT medical code for this case:\n{statement}\nCode:"
        inputs = tokenizer(prompt, return_tensors="pt")
        # Greedy decoding without gradient tracking
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        result = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Keep only the text after the last "Code:" marker, then extract the code with a regex
        if "Code:" in result:
            result = result.split("Code:")[-1]
        match = re.search(r"\b[A-Z]\d{1,3}\.?[A-Z0-9]*\b", result)
        # Fall back to the raw generated text if nothing code-like is found
        return match.group(0).strip() if match else result.strip()

    # Example
    statement = "Patient diagnosed with Type 2 diabetes mellitus without complications."
    print(get_code(statement))
    # Output: E11.9
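
To see what the extraction step does without loading the model, the regex logic can be run on its own. The sketch below reuses the pattern from the usage snippet; the helper name `extract_code` and the sample strings are hypothetical stand-ins for decoded model output.

```python
import re

# Same pattern as in the usage snippet: one uppercase letter, 1-3 digits,
# an optional dot, then any run of uppercase letters or digits.
CODE_PATTERN = re.compile(r"\b[A-Z]\d{1,3}\.?[A-Z0-9]*\b")


def extract_code(generated: str) -> str:
    """Return the first ICD-like code after the last 'Code:' marker."""
    if "Code:" in generated:
        generated = generated.split("Code:")[-1]
    match = CODE_PATTERN.search(generated)
    # No code-like token found: return the remaining text unchanged.
    return match.group(0).strip() if match else generated.strip()


# Hypothetical decoded outputs, for illustration only.
print(extract_code("Assign the correct code.\nCode: E11.9"))
print(extract_code("Code: J45.909 (asthma, unspecified)"))
print(extract_code("Code: 99213"))
```

Because the pattern requires a leading letter, a purely numeric CPT code such as `99213` does not match and is returned through the fallback branch, so callers may want to validate the result before using it.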

## Evaluation

Exact match accuracy: 25%

Semantic accuracy (ICD block match): 50%
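
The two metrics can be computed along the following lines. This is a sketch under an explicit assumption: the card does not define "block match", so it is taken here to mean that the first three characters of the predicted and reference codes (the ICD-10 category) agree. The function names and the sample codes are hypothetical and are not the evaluation set behind the numbers above.

```python
def exact_match(pred: str, gold: str) -> bool:
    """True when the full codes are identical, ignoring case and whitespace."""
    return pred.strip().upper() == gold.strip().upper()


def category_match(pred: str, gold: str) -> bool:
    """True when the first three characters agree (assumed 'block match')."""
    return pred.strip().upper()[:3] == gold.strip().upper()[:3]


def accuracy(preds, golds, match) -> float:
    """Fraction of prediction/reference pairs accepted by `match`."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal in length")
    return sum(match(p, g) for p, g in zip(preds, golds)) / len(golds)


# Made-up predictions and references, for illustration only.
preds = ["E11.9", "E11.65", "I10", "J45.909", "N39.0"]
golds = ["E11.9", "E11.9", "I11.0", "J44.9", "N39.0"]

print(accuracy(preds, golds, exact_match))     # 0.4
print(accuracy(preds, golds, category_match))  # 0.6
```

The category-level score is always at least as high as the exact-match score, since any exact match also shares its first three characters.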

## Intended Use

Assisting medical coders and healthcare professionals.

Automating initial code suggestions from clinical notes.

## Limitations

Trained on a small dataset; may not cover all ICD/CPT codes.

Use as an assistive tool, not a replacement for professional judgment.

Always review predicted codes before clinical or billing use.


## License

MIT License. Feel free to use and adapt this model under the terms of the license.