
	---
	license: mit
	base_model:
	- microsoft/Phi-3-mini-4k-instruct
	tags:
	- Medical
	- MedicalCoding
	- Pharma
	---
	# Medical Coding LLM

	Predict ICD-10 and CPT codes from clinical notes using a fine-tuned LLM.

	The model is Phi-3-mini fine-tuned on clinical notes with LoRA and 4-bit quantization. It generates ICD/CPT codes along with short explanations, which helps automate the medical coding process.

	## Model Details

	Base Model: microsoft/Phi-3-mini-4k-instruct

	Fine-Tuning: LoRA (r=16, alpha=32, dropout=0.05)

	Quantization: 4-bit (BitsAndBytes NF4)

	Training Dataset: Custom dataset of clinical notes, ICD codes, and supporting evidence

	Task: Causal Language Modeling for code prediction
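As a rough illustration, the hyperparameters listed above could be expressed with the `peft` and `transformers` configuration classes as follows. This is a sketch under assumptions, not the author's training script: the card does not state the LoRA target modules, the compute dtype, or whether double quantization was used, so those are left at library defaults.

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, as listed under Model Details.
# Compute dtype and double quantization are not stated in the card,
# so the library defaults are left in place here.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)

# LoRA hyperparameters from the card. target_modules is not stated there,
# so it is left unset (an assumption; the original run may have set it).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

`quantization_config` would typically be passed to `AutoModelForCausalLM.from_pretrained(..., quantization_config=quantization_config)`, which requires the `bitsandbytes` package and a supported GPU.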

	## Usage

	```python
	import re

	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Load tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained("Kavyaah/medical-coding-llm")
	model = AutoModelForCausalLM.from_pretrained("Kavyaah/medical-coding-llm")
	model.eval()


	# Function to predict ICD/CPT codes
	def get_code(statement, max_new_tokens=50):
	    prompt = f"Assign the correct ICD or CPT medical code for this case:\n{statement}\nCode:"
	    # Keep the inputs on the same device as the model
	    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	    with torch.no_grad():
	        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
	    result = tokenizer.decode(outputs[0], skip_special_tokens=True)

	    # Keep only the text generated after the final "Code:" marker
	    if "Code:" in result:
	        result = result.split("Code:")[-1]
	    # Extract an ICD-10-style code (e.g. E11.9) using regex;
	    # fall back to the raw generated text when no such pattern is found
	    match = re.search(r"\b[A-Z]\d{1,3}\.?[A-Z0-9]*\b", result)
	    return match.group(0).strip() if match else result.strip()


	# Example
	statement = "Patient diagnosed with Type 2 diabetes mellitus without complications."
	print(get_code(statement))
	# Expected output: E11.9
	```
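The regex in `get_code` only matches ICD-10-style codes, so a numeric CPT code such as 99213 falls through to the raw-text fallback. Below is a minimal sketch of a broader extractor. It assumes the ICD-10-CM shape (a letter, a digit, one alphanumeric character, then an optional decimal extension of one to four characters) and five-digit CPT Category I codes. `extract_code` and both patterns are illustrative names, not part of the released model.

```python
import re

# ICD-10-CM shape: a letter, a digit, one alphanumeric character, then an
# optional "." followed by one to four alphanumeric characters (e.g. E11.9).
ICD10_PATTERN = re.compile(r"\b[A-Z]\d[A-Z0-9](?:\.[A-Z0-9]{1,4})?\b")
# CPT Category I shape: exactly five digits (e.g. 99213).
CPT_PATTERN = re.compile(r"\b\d{5}\b")


def extract_code(text):
    """Return (code_system, code) for the first code found, else (None, None)."""
    candidates = []
    for system, pattern in (("ICD-10", ICD10_PATTERN), ("CPT", CPT_PATTERN)):
        match = pattern.search(text)
        if match:
            candidates.append((match.start(), system, match.group(0)))
    if not candidates:
        return None, None
    # Prefer whichever code appears earliest in the generated text.
    _, system, code = min(candidates)
    return system, code


print(extract_code(" E11.9 (Type 2 diabetes mellitus without complications)"))
# ('ICD-10', 'E11.9')
print(extract_code(" 99213 office visit, established patient"))
# ('CPT', '99213')
print(extract_code("no code here"))
# (None, None)
```

Returning the code system alongside the code lets a caller route ICD and CPT predictions to different review steps. A pattern match only confirms the shape of a code, not that the code exists or is correct.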

	## Evaluation

	Exact match accuracy: 25%

	Semantic accuracy (ICD block match): 50%
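The card does not say how these two metrics were computed. One plausible reading is sketched below: exact match is string equality after normalization, and "block match" is agreement on the three-character ICD-10 category (for example, E11.9 and E11.65 both fall under E11). That reading is an assumption. A true ICD block is a range such as E08-E13 and would need a lookup table. The function names and sample data are hypothetical and do not reproduce the reported scores.

```python
def normalize(code):
    """Uppercase a code and strip surrounding whitespace."""
    return code.strip().upper()


def category(code):
    """Return the three-character ICD-10 category, e.g. 'E11' for 'E11.9'."""
    return normalize(code).replace(".", "")[:3]


def score(predictions, references):
    """Return (exact_match, category_match) as fractions between 0 and 1."""
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and equal in length")
    pairs = list(zip(predictions, references))
    exact = sum(normalize(p) == normalize(r) for p, r in pairs)
    same_category = sum(category(p) == category(r) for p, r in pairs)
    return exact / len(pairs), same_category / len(pairs)


# Made-up predictions and gold labels, for illustration only.
predictions = ["E11.9", "E11.65", "I10", "J45.909"]
references = ["E11.9", "E11.9", "I10", "K21.9"]
exact_match, category_match = score(predictions, references)
print(f"Exact match: {exact_match:.0%}, category match: {category_match:.0%}")
# Exact match: 50%, category match: 75%
```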

	## Intended Use

	Assisting medical coders and healthcare professionals.

	Automating initial code suggestions from clinical notes.

	## Limitations

	Trained on a small dataset; may not cover all ICD/CPT codes.

	Use as an assistive tool, not a replacement for professional judgment.

	Always review predicted codes before clinical or billing use.


	## License

	MIT License. Feel free to use and adapt the model under its terms, which permit both commercial and non-commercial use.