fine-tuned-openbiollm-medical-coding

Fine-tuned version of aaditya/Llama3-OpenBioLLM-8B for automated ICD medical coding from clinical text. This model extends OpenBioLLM's strong biomedical language understanding with task-specific fine-tuning on ICD-10 code assignment.

Model Description

This model was developed as part of a research effort to evaluate multiple biomedical LLMs on the medical coding task. OpenBioLLM-8B provides a strong foundation in biomedical language understanding (pre-trained on PubMed, clinical notes, and biomedical corpora), and this fine-tune further specializes it for structured ICD-10 output from unstructured clinical text.

  • Base model: aaditya/Llama3-OpenBioLLM-8B
  • Fine-tuning method: SFT (Supervised Fine-Tuning) via TRL
  • Task: ICD-10 code generation from clinical text
  • Domain: Clinical NLP / Healthcare AI
  • Parameters: ~8B

Intended Uses

  • Automated medical coding assistance in clinical documentation workflows
  • Research benchmarking of biomedical LLMs on ICD coding tasks
  • Integration into clinical decision support pipelines (with human oversight)

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "abnuel/fine-tuned-openbiollm-medical-coding"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = """You are a clinical coding assistant. Given the following clinical note, 
provide the most appropriate ICD-10 code(s).

Clinical note: Patient diagnosed with essential hypertension and stage 2 chronic kidney disease.

ICD-10 Code(s):"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Near-greedy decoding: do_sample must be enabled for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
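Since the decoded output echoes the prompt before the completion, downstream code usually strips the prompt and extracts the code strings. A minimal sketch (the helper name and regex are illustrative, not part of the model's API; the pattern matches the general ICD-10 code shape rather than validating against an official code list):

```python
import re

# Matches ICD-10-style codes: a letter, two digits, then an optional dot
# and up to four alphanumeric subcategory characters (e.g. I10, N18.32).
ICD10_PATTERN = re.compile(r"\b[A-TV-Z]\d{2}(?:\.[0-9A-Z]{1,4})?\b")

def extract_codes(generated_text: str, prompt: str) -> list[str]:
    """Strip the echoed prompt, then return ICD-10-shaped codes, deduplicated in order."""
    completion = (
        generated_text[len(prompt):]
        if generated_text.startswith(prompt)
        else generated_text
    )
    seen, codes = set(), []
    for code in ICD10_PATTERN.findall(completion):
        if code not in seen:
            seen.add(code)
            codes.append(code)
    return codes
```

For the example prompt above, `extract_codes(decoded, prompt)` would reduce the raw generation to a clean list such as `["I10", "N18.2"]`, ready for review.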

Training Details

  • Fine-tuning framework: TRL (Transformer Reinforcement Learning)
  • Method: Supervised Fine-Tuning (SFT)
  • Base model: Llama3-OpenBioLLM-8B (biomedical-specialized Llama 3)
  • Hardware: GPU (CUDA)
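The exact prompt template used during fine-tuning is not published here; the sketch below shows how clinical-note / code pairs might be rendered into single training strings for TRL's SFTTrainer. The template and function name are assumptions, modeled on the inference prompt shown above:

```python
def format_example(note: str, codes: list[str]) -> str:
    """Render one (note, codes) pair into a single SFT training string."""
    prompt = (
        "You are a clinical coding assistant. Given the following clinical note, "
        "provide the most appropriate ICD-10 code(s).\n\n"
        f"Clinical note: {note}\n\n"
        "ICD-10 Code(s):"
    )
    # Prompt and target completion are concatenated into one sequence,
    # as SFT on a causal LM trains on the full text.
    return f"{prompt} {', '.join(codes)}"

sample = format_example(
    "Patient diagnosed with essential hypertension and stage 2 chronic kidney disease.",
    ["I10", "N18.2"],
)
```

Keeping the training template identical to the inference prompt is what makes the "ICD-10 Code(s):" suffix a reliable trigger for structured output.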

Limitations

  • As with all LLM-based coding tools, outputs should be reviewed by a certified medical coder before use in billing or clinical records.
  • May not generalize to all ICD-10-CM editions, regional coding conventions, or highly specialized subspecialties.
  • The model does not have access to real-time coding updates or payer-specific guidelines.
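Given these limitations, a lightweight safeguard before surfacing generated codes to a human coder is to check that each code is at least structurally valid ICD-10-CM. A sketch (shape check only; it does not verify that a code exists in the current ICD-10-CM release or is billable):

```python
import re

# ICD-10-CM structure: a letter, a digit, one more alphanumeric, then
# optionally a dot and 1-4 alphanumerics. This checks shape, not existence.
_ICD10CM_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?$")

def is_plausible_icd10cm(code: str) -> bool:
    """Return True if the string has the shape of an ICD-10-CM code."""
    return bool(_ICD10CM_SHAPE.match(code.strip().upper()))
```

Codes that fail this check can be dropped or flagged before they reach billing or clinical records; codes that pass still require validation against an official code set.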

Citation

@misc{adegunlehin2025openbiollm-coding,
  author = {Abayomi Adegunlehin},
  title  = {Fine-tuned OpenBioLLM-8B for ICD-10 Medical Coding},
  year   = {2025},
  url    = {https://huggingface.co/abnuel/fine-tuned-openbiollm-medical-coding}
}