---
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- medical
- clinical-reasoning
- diagnostic
- education
- fine-tuned
- lora
- sft
- trl
datasets:
- mimic-iv-ext-direct
language:
- en
pipeline_tag: text-generation
---

# Clinical Reasoning Model (Test 1)

A fine-tuned version of [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) trained to produce step-by-step diagnostic reasoning chains from clinical patient cases.

## Purpose

This model was created for **educational purposes only**. It is designed to demonstrate how a language model can walk through the clinical reasoning process, connecting patient findings (history, physical exam, labs, imaging) to a final diagnosis in a structured, step-by-step format.

**This model is NOT intended for clinical use, patient care, or medical decision-making.**

## What It Does

Given a patient case (chief complaint, history, exam findings, labs, and imaging), the model produces:

1. A final diagnosis
2. A numbered reasoning chain that explains how each piece of clinical evidence supports or leads to that diagnosis

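The exact prompt template used during fine-tuning is not documented in this card, so the field names and instruction wording below are illustrative assumptions only. A minimal sketch of assembling a patient case into a single prompt string:

```python
# Hypothetical prompt builder for this model. Field names and the closing
# instruction are assumptions, not the template used in the actual run.

def build_case_prompt(case: dict) -> str:
    """Assemble the available case fields into one structured prompt."""
    sections = [
        ("Chief complaint", case.get("chief_complaint")),
        ("History", case.get("history")),
        ("Exam findings", case.get("exam")),
        ("Labs", case.get("labs")),
        ("Imaging", case.get("imaging")),
    ]
    # Include only the fields actually present for this case.
    lines = [f"{label}: {value}" for label, value in sections if value]
    lines.append("Give a final diagnosis, then a numbered reasoning chain "
                 "linking each finding to that diagnosis.")
    return "\n".join(lines)

print(build_case_prompt({
    "chief_complaint": "productive cough, fatigue, chest congestion",
    "history": "prior TB treatment",
    "imaging": "thin-walled cavity in the right lower lobe",
}))
```
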
### Example

**Input:**
> A patient presents with productive cough, fatigue, and chest congestion. History of prior TB treatment. Chest CT shows a thin-walled cavity in the right lower lobe with adjacent calcified granulomas and bronchiectasis.

**Output:**
> FINAL DIAGNOSIS: Tuberculosis
>
> Step 1: Cavities in the lungs are common in active tuberculosis, especially when the walls of the cavities are thin, indicating the possibility of active disease or reactivation of infection.
> Supporting evidence: Superior segment right lower lobe relatively thin-walled cavity
>
> Step 2: The patient had been treated for tuberculosis several years earlier, which is important background information because tuberculosis can recur.
> Supporting evidence: TB treated years ago
>
> Step 3: In patients with a history of tuberculosis, these symptoms may indicate activity or recurrence of tuberculosis.
> Supporting evidence: symptoms of URI including fatigue, productive cough, runny nose, and chest congestion

## Training Details

### Dataset

Trained on the [DiReCT (Diagnostic Reasoning for Clinical Notes)](https://physionet.org/content/mimic-iv-ext-direct/1.0.0/) dataset, which contains 511 clinical notes sourced from MIMIC-IV. Each note was annotated by physicians with structured diagnostic reasoning trees mapping clinical observations to final diagnoses.

The dataset covers 25 disease categories and 73 unique diagnoses, including:

- Acute Coronary Syndrome (NSTEMI, Unstable Angina)
- Heart Failure (HFrEF, HFpEF)
- Stroke (Hemorrhagic, Ischemic)
- Pulmonary Embolism
- Pneumonia
- COPD
- Multiple Sclerosis
- Tuberculosis
- Hypertension
- And many more

### Training Configuration

| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Method | SFT with LoRA (PEFT) |
| Quantization | 4-bit (NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 3e-5 |
| Epochs | 3 |
| Batch size | 1 (effective 8 with gradient accumulation) |
| Precision | FP16 |
| Hardware | NVIDIA T4 (Google Colab) |

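The tags above mention `trl`, `sft`, and `lora`, so a run with these hyperparameters could be configured roughly as follows. This is a sketch, not the original training script: the argument names come from the public `transformers`/`peft`/`trl` APIs, the values are taken from the table, and `output_dir` is a placeholder.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig

# 4-bit NF4 quantization with FP16 compute, per the table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter settings from the table.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# SFT hyperparameters: batch size 1 with 8 accumulation steps gives the
# effective batch size of 8 noted in the table.
training_args = SFTConfig(
    output_dir="clinical-reasoning-test1",  # placeholder name
    learning_rate=3e-5,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
)
```
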
### Training Results

The model trained for 3 epochs, with training loss falling from 22.38 at step 10 to 13.71 at step 180:

| Step | Training Loss |
|---|---|
| 10 | 22.38 |
| 30 | 19.23 |
| 50 | 17.03 |
| 70 | 15.23 |
| 90 | 15.08 |
| 110 | 15.07 |
| 130 | 14.57 |
| 150 | 13.90 |
| 170 | 14.35 |
| 180 | 13.71 |

## Limitations

- **Not for clinical use.** This model is an educational experiment and should never be used for actual patient care or medical decision-making.
- **Small training set.** 511 cases is a modest dataset for fine-tuning. The model may not generalize well to diseases or presentations not represented in the training data.
- **Small base model.** Llama 3.2 3B is a relatively small model. Larger models would likely produce better reasoning.
- **Biases.** The training data comes from a single institution (MIMIC-IV / Beth Israel Deaconess Medical Center), so the model may reflect that institution's patient population and clinical practices.
- **Hallucination risk.** Like all language models, this model can generate plausible-sounding but incorrect medical reasoning.

## Citation

If you use this model, please cite the DiReCT dataset:

```bibtex
@article{wang2024direct,
  title={DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models},
  author={Wang, Bowen and Chang, Jiuyang and Qian, Yiming and others},
  journal={arXiv preprint arXiv:2408.01933},
  year={2024}
}
```

```bibtex
@article{PhysioNet-mimic-iv-ext-direct-1.0.0,
  author = {Wang, Bowen and Chang, Jiuyang and Qian, Yiming},
  title = {{MIMIC-IV-Ext-DiReCT}},
  journal = {{PhysioNet}},
  year = {2025},
  doi = {10.13026/yf96-kc87}
}
```

## Contact

This model was created as a learning exercise in fine-tuning language models for medical education applications.

Created by Arman Yalcin
www.linkedin.com/in/arman8514581