---
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- medical
- clinical-reasoning
- diagnostic
- education
- fine-tuned
- lora
- sft
- trl
datasets:
- mimic-iv-ext-direct
language:
- en
pipeline_tag: text-generation
---

# Clinical Reasoning Model (Test 1)

A fine-tuned version of [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) trained to produce step-by-step diagnostic reasoning chains from clinical patient cases.

## Purpose

This model was created for **educational purposes only**. It is designed to demonstrate how a language model can walk through the clinical reasoning process, connecting patient findings (history, physical exam, labs, imaging) to a final diagnosis in a structured, step-by-step format.

**This model is NOT intended for clinical use, patient care, or medical decision-making.**

## What It Does

Given a patient case (chief complaint, history, exam findings, labs, and imaging), the model produces:

1. A final diagnosis
2. A numbered reasoning chain that explains how each piece of clinical evidence supports or leads to that diagnosis

### Example

**Input:**

> A patient presents with productive cough, fatigue, and chest congestion. History of prior TB treatment. Chest CT shows a thin-walled cavity in the right lower lobe with adjacent calcified granulomas and bronchiectasis.

**Output:**

> FINAL DIAGNOSIS: Tuberculosis
>
> Step 1: Cavities in the lungs are common in active tuberculosis, especially when the walls of the cavities are thin, indicating the possibility of active disease or reactivation of infection.
> Supporting evidence: Superior segment right lower lobe relatively thin-walled cavity
>
> Step 2: The patient had been treated for tuberculosis several years earlier, which is important background information because tuberculosis can recur.
> Supporting evidence: TB treated years ago
>
> Step 3: In patients with a history of tuberculosis, these symptoms may indicate activity or recurrence of tuberculosis.
> Supporting evidence: symptoms of URI including fatigue, productive cough, runny nose, and chest congestion

## Training Details

### Dataset

Trained on the [DiReCT (Diagnostic Reasoning for Clinical Notes)](https://physionet.org/content/mimic-iv-ext-direct/1.0.0/) dataset, which contains 511 clinical notes sourced from MIMIC-IV. Each note was annotated by physicians with structured diagnostic reasoning trees mapping clinical observations to final diagnoses.

The dataset covers 25 disease categories and 73 unique diagnoses, including:

- Acute Coronary Syndrome (NSTEMI, Unstable Angina)
- Heart Failure (HFrEF, HFpEF)
- Stroke (Hemorrhagic, Ischemic)
- Pulmonary Embolism
- Pneumonia
- COPD
- Multiple Sclerosis
- Tuberculosis
- Hypertension
- And many more

### Training Configuration

| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Method | SFT with LoRA (PEFT) |
| Quantization | 4-bit (NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 3e-5 |
| Epochs | 3 |
| Batch size | 1 (effective 8 with gradient accumulation) |
| Precision | FP16 |
| Hardware | NVIDIA T4 (Google Colab) |

### Training Results

The model trained for 3 epochs with a generally downward trend in loss:

| Step | Training Loss |
|---|---|
| 10 | 22.38 |
| 30 | 19.23 |
| 50 | 17.03 |
| 70 | 15.23 |
| 90 | 15.08 |
| 110 | 15.07 |
| 130 | 14.57 |
| 150 | 13.90 |
| 170 | 14.35 |
| 180 | 13.71 |

## Limitations

- **Not for clinical use.** This model is an educational experiment and should never be used for actual patient care or medical decision-making.
- **Small training set.** 511 cases is a modest dataset for fine-tuning, and the model may not generalize to diseases or presentations not represented in the training data.
- **Small base model.** Llama 3.2 3B is a relatively small model; larger models would likely produce better reasoning.
- **Biases.** The training data comes from a single institution (MIMIC-IV / Beth Israel Deaconess Medical Center), so the model may reflect that institution's patient population and clinical practices.
- **Hallucination risk.** Like all language models, this model can generate plausible-sounding but incorrect medical reasoning.

## Citation

If you use this model, please cite the DiReCT dataset:

```bibtex
@article{wang2024direct,
  title={DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models},
  author={Wang, Bowen and Chang, Jiuyang and Qian, Yiming and others},
  journal={arXiv preprint arXiv:2408.01933},
  year={2024}
}
```

```bibtex
@article{PhysioNet-mimic-iv-ext-direct-1.0.0,
  author={Wang, Bowen and Chang, Jiuyang and Qian, Yiming},
  title={{MIMIC-IV-Ext-DiReCT}},
  journal={{PhysioNet}},
  year={2025},
  doi={10.13026/yf96-kc87}
}
```

## Contact

This model was created as a learning exercise in fine-tuning language models for medical education applications.

Created by Arman Yalcin
www.linkedin.com/in/arman8514581
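## Usage

Since this is a Llama 3.2 Instruct fine-tune, it can be run with the standard `transformers` chat-template workflow. The sketch below is a minimal example, not the authoritative interface: the repo id is a placeholder, and the system prompt is an assumption that should be replaced with the exact instruction format used during fine-tuning.

```python
def build_case_prompt(case: str) -> list:
    """Wrap a patient case in a chat-message list.

    The system prompt here is illustrative only (an assumption, not the
    training prompt); swap in the exact wording from your SFT run.
    """
    return [
        {
            "role": "system",
            "content": (
                "Given a clinical case, state a final diagnosis and a "
                "numbered reasoning chain citing supporting evidence."
            ),
        },
        {"role": "user", "content": case},
    ]


def generate_reasoning(case: str, model_id: str = "<your-username>/<this-model>") -> str:
    """Run one greedy generation. Heavy imports are kept local so the
    prompt helper above stays dependency-free."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        build_case_prompt(case), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

If the LoRA adapter is published separately rather than merged into the base weights, load the base model first and attach the adapter with `peft.PeftModel.from_pretrained` before generating.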