---
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- medical
- clinical-reasoning
- diagnostic
- education
- fine-tuned
- lora
- sft
- trl
datasets:
- mimic-iv-ext-direct
language:
- en
pipeline_tag: text-generation
---
# Clinical Reasoning Model (Test 1)
A fine-tuned version of [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) trained to produce step-by-step diagnostic reasoning chains from clinical patient cases.
## Purpose
This model was created for **educational purposes only**. It is designed to demonstrate how a language model can walk through the clinical reasoning process, connecting patient findings (history, physical exam, labs, imaging) to a final diagnosis in a structured, step-by-step format.
**This model is NOT intended for clinical use, patient care, or medical decision-making.**
## What It Does
Given a patient case (chief complaint, history, exam findings, labs, and imaging), the model produces:
1. A final diagnosis
2. A numbered reasoning chain that explains how each piece of clinical evidence supports or leads to that diagnosis
### Example
**Input:**
> A patient presents with productive cough, fatigue, and chest congestion. History of prior TB treatment. Chest CT shows a thin-walled cavity in the right lower lobe with adjacent calcified granulomas and bronchiectasis.
**Output:**
> FINAL DIAGNOSIS: Tuberculosis
>
> Step 1: Cavities in the lungs are common in active tuberculosis, especially when the walls of the cavities are thin, indicating the possibility of active disease or reactivation of infection.
> Supporting evidence: Superior segment right lower lobe relatively thin-walled cavity
>
> Step 2: The patient had been treated for tuberculosis several years earlier, which is important background information because tuberculosis can recur.
> Supporting evidence: TB treated years ago
>
> Step 3: In patients with a history of tuberculosis, these symptoms may indicate activity or recurrence of tuberculosis.
> Supporting evidence: symptoms of URI including fatigue, productive cough, runny nose, and chest congestion
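The example above follows a simple prompt/completion shape: the patient case is the prompt, and the completion is a `FINAL DIAGNOSIS` line followed by numbered reasoning steps, each paired with its supporting evidence. A minimal sketch of assembling such a training pair is below; the exact template used during fine-tuning is an assumption, and `build_example` is a hypothetical helper, not part of the released code.

```python
def build_example(case: str, diagnosis: str, steps: list[tuple[str, str]]) -> dict:
    """Assemble a (prompt, completion) pair in the style shown above.

    `steps` is a list of (reasoning, supporting_evidence) tuples.
    NOTE: this template is an illustration, not the exact one used in training.
    """
    completion_lines = [f"FINAL DIAGNOSIS: {diagnosis}", ""]
    for i, (reasoning, evidence) in enumerate(steps, start=1):
        completion_lines.append(f"Step {i}: {reasoning}")
        completion_lines.append(f"Supporting evidence: {evidence}")
        completion_lines.append("")
    return {"prompt": case, "completion": "\n".join(completion_lines).rstrip()}

example = build_example(
    "Productive cough, fatigue, chest congestion; prior TB treatment; "
    "thin-walled cavity in the right lower lobe on chest CT.",
    "Tuberculosis",
    [("Thin-walled cavities can indicate active or reactivated tuberculosis.",
      "thin-walled cavity, superior segment of the right lower lobe")],
)
print(example["completion"].splitlines()[0])  # FINAL DIAGNOSIS: Tuberculosis
```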
## Training Details
### Dataset
Trained on the [DiReCT (Diagnostic Reasoning for Clinical Notes)](https://physionet.org/content/mimic-iv-ext-direct/1.0.0/) dataset, which contains 511 clinical notes sourced from MIMIC-IV. Each note was annotated by physicians with structured diagnostic reasoning trees mapping clinical observations to final diagnoses.
The dataset covers 25 disease categories and 73 unique diagnoses, including:
- Acute Coronary Syndrome (NSTEMI, Unstable Angina)
- Heart Failure (HFrEF, HFpEF)
- Stroke (Hemorrhagic, Ischemic)
- Pulmonary Embolism
- Pneumonia
- COPD
- Multiple Sclerosis
- Tuberculosis
- Hypertension
- And many more
### Training Configuration
| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Method | SFT with LoRA (PEFT) |
| Quantization | 4-bit (NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 3e-5 |
| Epochs | 3 |
| Batch size | 1 (effective 8 with gradient accumulation) |
| Precision | FP16 |
| Hardware | NVIDIA T4 (Google Colab) |
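The table above can be expressed as `transformers`/`peft` configuration objects. This is a hedged reconstruction, not the exact code used for the run; any argument not listed in the table (e.g. compute dtype for quantization) is an assumption.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the base model (QLoRA-style setup).
# FP16 compute dtype matches the "Precision" row; an assumption beyond that row.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter hyperparameters from the table.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# TRL SFTConfig equivalents of the remaining rows (illustrative):
#   learning_rate=3e-5, num_train_epochs=3,
#   per_device_train_batch_size=1, gradient_accumulation_steps=8
```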
### Training Results
The model trained for 3 epochs, with training loss falling from 22.38 at step 10 to 13.71 at step 180:
| Step | Training Loss |
|---|---|
| 10 | 22.38 |
| 30 | 19.23 |
| 50 | 17.03 |
| 70 | 15.23 |
| 90 | 15.08 |
| 110 | 15.07 |
| 130 | 14.57 |
| 150 | 13.90 |
| 170 | 14.35 |
| 180 | 13.71 |
## Limitations
- **Not for clinical use.** This model is an educational experiment and should never be used for actual patient care or medical decision-making.
- **Small training set.** 511 cases is a modest dataset for fine-tuning. The model may not generalize well to diseases or presentations not represented in the training data.
- **Small base model.** Llama 3.2 3B is a relatively small model. Larger models would likely produce better reasoning.
- **Biases.** The training data comes from a single institution (MIMIC-IV / Beth Israel Deaconess Medical Center), so the model may reflect that institution's patient population and clinical practices.
- **Hallucination risk.** Like all language models, this model can generate plausible-sounding but incorrect medical reasoning.
## Citation
If you use this model, please cite the DiReCT dataset:
```bibtex
@article{wang2024direct,
title={DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models},
author={Wang, Bowen and Chang, Jiuyang and Qian, Yiming and others},
journal={arXiv preprint arXiv:2408.01933},
year={2024}
}
```
```bibtex
@article{PhysioNet-mimic-iv-ext-direct-1.0.0,
author = {Wang, Bowen and Chang, Jiuyang and Qian, Yiming},
title = {{MIMIC-IV-Ext-DiReCT}},
journal = {{PhysioNet}},
year = {2025},
doi = {10.13026/yf96-kc87}
}
```
## Contact
This model was created as a learning exercise in fine-tuning language models for medical education applications.
Created by Arman Yalcin
www.linkedin.com/in/arman8514581