---
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- medical
- clinical-reasoning
- diagnostic
- education
- fine-tuned
- lora
- sft
- trl
datasets:
- mimic-iv-ext-direct
language:
- en
pipeline_tag: text-generation
---

# Clinical Reasoning Model (Test 1)

A fine-tuned version of [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) trained to produce step-by-step diagnostic reasoning chains from clinical patient cases.

## Purpose

This model was created for **educational purposes only**. It is designed to demonstrate how a language model can walk through the clinical reasoning process, connecting patient findings (history, physical exam, labs, imaging) to a final diagnosis in a structured, step-by-step format.

**This model is NOT intended for clinical use, patient care, or medical decision-making.**

## What It Does

Given a patient case (chief complaint, history, exam findings, labs, and imaging), the model produces:

1. A final diagnosis
2. A numbered reasoning chain that explains how each piece of clinical evidence supports or leads to that diagnosis

### Example

**Input:**

> A patient presents with productive cough, fatigue, and chest congestion. History of prior TB treatment. Chest CT shows a thin-walled cavity in the right lower lobe with adjacent calcified granulomas and bronchiectasis.

**Output:**

> FINAL DIAGNOSIS: Tuberculosis
>
> Step 1: Cavities in the lungs are common in active tuberculosis, especially when the walls of the cavities are thin, indicating the possibility of active disease or reactivation of infection.
> Supporting evidence: Superior segment right lower lobe relatively thin-walled cavity
>
> Step 2: The patient had been treated for tuberculosis several years earlier, which is important background information because tuberculosis can recur.
> Supporting evidence: TB treated years ago
>
> Step 3: In patients with a history of tuberculosis, these symptoms may indicate activity or recurrence of tuberculosis.
> Supporting evidence: symptoms of URI including fatigue, productive cough, runny nose, and chest congestion

## Training Details

### Dataset

Trained on the [DiReCT (Diagnostic Reasoning for Clinical Notes)](https://physionet.org/content/mimic-iv-ext-direct/1.0.0/) dataset, which contains 511 clinical notes sourced from MIMIC-IV. Each note was annotated by physicians with structured diagnostic reasoning trees mapping clinical observations to final diagnoses.

The dataset covers 25 disease categories and 73 unique diagnoses, including:

- Acute Coronary Syndrome (NSTEMI, Unstable Angina)
- Heart Failure (HFrEF, HFpEF)
- Stroke (Hemorrhagic, Ischemic)
- Pulmonary Embolism
- Pneumonia
- COPD
- Multiple Sclerosis
- Tuberculosis
- Hypertension
- And many more

### Training Configuration

| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Method | SFT with LoRA (PEFT) |
| Quantization | 4-bit (NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 3e-5 |
| Epochs | 3 |
| Batch size | 1 (effective 8 with gradient accumulation) |
| Precision | FP16 |
| Hardware | NVIDIA T4 (Google Colab) |

### Training Results

The model trained for 3 epochs with a generally downward trend in loss:

| Step | Training Loss |
|---|---|
| 10 | 22.38 |
| 30 | 19.23 |
| 50 | 17.03 |
| 70 | 15.23 |
| 90 | 15.08 |
| 110 | 15.07 |
| 130 | 14.57 |
| 150 | 13.90 |
| 170 | 14.35 |
| 180 | 13.71 |

## Limitations

- **Not for clinical use.** This model is an educational experiment and should never be used for actual patient care or medical decision-making.
- **Small training set.** 511 cases is a modest dataset for fine-tuning, and the model may not generalize to diseases or presentations not represented in the training data.
- **Small base model.** Llama 3.2 3B is a relatively small model; larger models would likely produce better reasoning.
- **Biases.** The training data comes from a single institution (MIMIC-IV / Beth Israel Deaconess Medical Center), so the model may reflect that institution's patient population and clinical practices.
- **Hallucination risk.** Like all language models, this model can generate plausible-sounding but incorrect medical reasoning.

## Citation

If you use this model, please cite the DiReCT dataset:

```bibtex
@article{wang2024direct,
  title={DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models},
  author={Wang, Bowen and Chang, Jiuyang and Qian, Yiming and others},
  journal={arXiv preprint arXiv:2408.01933},
  year={2024}
}
```

```bibtex
@article{PhysioNet-mimic-iv-ext-direct-1.0.0,
  author={Wang, Bowen and Chang, Jiuyang and Qian, Yiming},
  title={{MIMIC-IV-Ext-DiReCT}},
  journal={{PhysioNet}},
  year={2025},
  doi={10.13026/yf96-kc87}
}
```

## Contact

This model was created as a learning exercise in fine-tuning language models for medical education applications.

Created by Arman Yalcin
www.linkedin.com/in/arman8514581
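## Usage

Since this is a Llama 3.2 Instruct fine-tune, it can be run with the standard `transformers` chat-template workflow. The sketch below is a minimal example, not the authoritative interface: the repo id is a placeholder, and the system prompt is an assumption that should be replaced with the exact instruction format used during fine-tuning.

```python
def build_case_prompt(case: str) -> list:
    """Wrap a patient case in a chat-message list.

    The system prompt here is illustrative only (an assumption, not the
    training prompt); swap in the exact wording from your SFT run.
    """
    return [
        {
            "role": "system",
            "content": (
                "Given a clinical case, state a final diagnosis and a "
                "numbered reasoning chain citing supporting evidence."
            ),
        },
        {"role": "user", "content": case},
    ]


def generate_reasoning(case: str, model_id: str = "<your-username>/<this-model>") -> str:
    """Run one greedy generation. Heavy imports are kept local so the
    prompt helper above stays dependency-free."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        build_case_prompt(case), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

If the LoRA adapter is published separately rather than merged into the base weights, load the base model first and attach the adapter with `peft.PeftModel.from_pretrained` before generating.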