---
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- medical
- clinical-reasoning
- diagnostic
- education
- fine-tuned
- lora
- sft
- trl
datasets:
- mimic-iv-ext-direct
language:
- en
pipeline_tag: text-generation
---

# Clinical Reasoning Model (Test 1)

A fine-tuned version of [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) trained to produce step-by-step diagnostic reasoning chains from clinical patient cases.

## Purpose

This model was created for **educational purposes only**. It is designed to demonstrate how a language model can walk through the clinical reasoning process, connecting patient findings (history, physical exam, labs, imaging) to a final diagnosis in a structured, step-by-step format.

**This model is NOT intended for clinical use, patient care, or medical decision-making.**

## What It Does

Given a patient case (chief complaint, history, exam findings, labs, and imaging), the model produces:

1. A final diagnosis
2. A numbered reasoning chain that explains how each piece of clinical evidence supports or leads to that diagnosis

### Example

**Input:**
> A patient presents with productive cough, fatigue, and chest congestion. History of prior TB treatment. Chest CT shows a thin-walled cavity in the right lower lobe with adjacent calcified granulomas and bronchiectasis.

**Output:**
> FINAL DIAGNOSIS: Tuberculosis
>
> Step 1: Cavities in the lungs are common in active tuberculosis, especially when the walls of the cavities are thin, indicating the possibility of active disease or reactivation of infection.
> Supporting evidence: Superior segment right lower lobe relatively thin-walled cavity
>
> Step 2: The patient had been treated for tuberculosis several years earlier, which is important background information because tuberculosis can recur.
> Supporting evidence: TB treated years ago
>
> Step 3: In patients with a history of tuberculosis, these symptoms may indicate activity or recurrence of tuberculosis.
> Supporting evidence: symptoms of URI including fatigue, productive cough, runny nose, and chest congestion
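
To query the model, the patient case can be wrapped in Llama 3.2 Instruct's chat format and passed through `tokenizer.apply_chat_template(...)`. A minimal sketch of the prompt construction; the exact system prompt used during fine-tuning is an assumption here, so adjust it to match the training script:

```python
# Hedged sketch: the system prompt wording is illustrative, not taken
# from the original training script.
def build_messages(case_text: str) -> list[dict]:
    """Wrap a clinical case in the chat format Llama 3.2 Instruct expects."""
    system = (
        "You are a clinical reasoning assistant for education only. "
        "Given a patient case, state a FINAL DIAGNOSIS followed by a "
        "numbered reasoning chain citing the supporting evidence for each step."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": case_text},
    ]

messages = build_messages(
    "Productive cough, fatigue, chest congestion; prior TB treatment; "
    "CT shows a thin-walled right lower lobe cavity."
)
# Pass `messages` to tokenizer.apply_chat_template(..., add_generation_prompt=True)
# and then to model.generate(...).
```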

## Training Details

### Dataset

Trained on the [DiReCT (Diagnostic Reasoning for Clinical Notes)](https://physionet.org/content/mimic-iv-ext-direct/1.0.0/) dataset, which contains 511 clinical notes sourced from MIMIC-IV. Each note was annotated by physicians with structured diagnostic reasoning trees mapping clinical observations to final diagnoses.

The dataset covers 25 disease categories and 73 unique diagnoses, including:

- Acute Coronary Syndrome (NSTEMI, Unstable Angina)
- Heart Failure (HFrEF, HFpEF)
- Stroke (Hemorrhagic, Ischemic)
- Pulmonary Embolism
- Pneumonia
- COPD
- Multiple Sclerosis
- Tuberculosis
- Hypertension
- And many more

### Training Configuration

| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Method | SFT with LoRA (PEFT) |
| Quantization | 4-bit (NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 3e-5 |
| Epochs | 3 |
| Batch size | 1 (effective 8 with gradient accumulation) |
| Precision | FP16 |
| Hardware | NVIDIA T4 (Google Colab) |
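
The table above maps directly onto standard `transformers`/`peft`/`trl` configuration objects. A hedged sketch of that mapping, assuming current library APIs; the output path is illustrative and not taken from the original training script:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig

# 4-bit NF4 quantization of the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter hyperparameters from the table above
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# SFT trainer arguments: effective batch size 1 x 8 = 8
training_args = SFTConfig(
    output_dir="clinical-reasoning-test1",  # illustrative path
    learning_rate=3e-5,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
)
```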

### Training Results

The model trained for 3 epochs, with loss decreasing overall:

| Step | Training Loss |
|---|---|
| 10 | 22.38 |
| 30 | 19.23 |
| 50 | 17.03 |
| 70 | 15.23 |
| 90 | 15.08 |
| 110 | 15.07 |
| 130 | 14.57 |
| 150 | 13.90 |
| 170 | 14.35 |
| 180 | 13.71 |

## Limitations

- **Not for clinical use.** This model is an educational experiment and should never be used for actual patient care or medical decision-making.
- **Small training set.** 511 cases is a modest dataset for fine-tuning. The model may not generalize well to diseases or presentations not represented in the training data.
- **Small base model.** Llama 3.2 3B is a relatively small model. Larger models would likely produce better reasoning.
- **Biases.** The training data comes from a single institution (MIMIC-IV / Beth Israel Deaconess Medical Center), so the model may reflect that institution's patient population and clinical practices.
- **Hallucination risk.** Like all language models, this model can generate plausible-sounding but incorrect medical reasoning.

## Citation

If you use this model, please cite the DiReCT dataset:

```bibtex
@article{wang2024direct,
  title={DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models},
  author={Wang, Bowen and Chang, Jiuyang and Qian, Yiming and others},
  journal={arXiv preprint arXiv:2408.01933},
  year={2024}
}
```

```bibtex
@article{PhysioNet-mimic-iv-ext-direct-1.0.0,
  author = {Wang, Bowen and Chang, Jiuyang and Qian, Yiming},
  title = {{MIMIC-IV-Ext-DiReCT}},
  journal = {{PhysioNet}},
  year = {2025},
  doi = {10.13026/yf96-kc87}
}
```

## Contact

This model was created as a learning exercise in fine-tuning language models for medical education applications.

Created by Arman Yalcin ([LinkedIn](https://www.linkedin.com/in/arman8514581))