---
license: mit
language:
- en
pipeline_tag: text-generation
---

# DeepSeek-R1-Distill-Llama-8B - Fine-Tuned for Medical Chain-of-Thought Reasoning

## Model Overview
The **DeepSeek-R1-Distill-Llama-8B** model has been fine-tuned for medical chain-of-thought (CoT) reasoning, improving its ability to generate structured, concise, and accurate medical reasoning outputs. It was trained on a 500-sample subset of the **medical-o1-reasoning-SFT** dataset, with **4-bit quantization** and **LoRA adapters** used to speed up training and reduce memory usage.

### Key Features
- **Base Model:** [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B)
- **Fine-Tuning Objective:** Structured, step-by-step medical reasoning.
- **Training Dataset:** 500 samples from the **medical-o1-reasoning-SFT** dataset.
- **Tools Used:**
  - **Unsloth:** Accelerates training by roughly 2x.
  - **4-bit Quantization:** Reduces model memory usage.
  - **LoRA Adapters:** Parameter-efficient fine-tuning.
- **Training Time:** 44 minutes.
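For a quick start, the checkpoint can be loaded in 4-bit with Unsloth, matching the settings used during fine-tuning. This is a minimal sketch: the repository id below is a placeholder for wherever the fine-tuned weights are hosted, and a CUDA-capable GPU is assumed.

```python
from unsloth import FastLanguageModel

# Load the fine-tuned checkpoint in 4-bit with the same 2048-token context
# length used during fine-tuning. The repo id is a placeholder.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your-username/DeepSeek-R1-Medical-CoT",  # placeholder repo id
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to its faster inference mode
```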
### Performance Improvements
- **Response Length:** Average responses shortened from roughly 450 words to roughly 150 words, improving conciseness.
- **Reasoning Style:** Shifted from verbose explanations to focused, structured reasoning.
- **Answer Format:** Moved from bulleted lists to paragraph-style answers for readability.

## Intended Use
This model is intended for:
- **Medical professionals** who need structured diagnostic reasoning.
- **Researchers** extracting and organizing medical knowledge.
- **Developers** building medical CoT applications for clinical settings, treatment planning, and education.

Typical use cases include:
- Clinical diagnostics
- Treatment planning
- Medical education and training
- Research assistance
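Continuing from the loading sketch above, the example below shows one way to prompt the model with a medical question. The prompt wording is illustrative; it approximates, but may not exactly match, the template used during fine-tuning.

```python
# Assumes `model` and `tokenizer` from the loading sketch above.
prompt = (
    "Below is a medical question. Think through the problem step by step, "
    "then give a concise final answer.\n\n"
    "### Question:\n"
    "A 58-year-old patient presents with sudden chest pain radiating to the left arm. "
    "What is the most likely diagnosis and the first diagnostic step?\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```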
## Training Details

### Key Components
- **Model:** [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B)
- **Dataset:** **medical-o1-reasoning-SFT** (500 samples)
- **Training Tools:**
  - **Unsloth:** Optimized training loop (roughly 2x speedup).
  - **4-bit Quantization:** Lower memory footprint during training.
  - **LoRA Adapters:** Lightweight fine-tuning at reduced computational cost.
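As context for the fine-tuning process described below, the sketch shows how the base model can be initialized in 4-bit with Unsloth and how LoRA adapters can be attached. The rank, alpha, and target modules are typical Unsloth settings, not necessarily the exact values used for this model.

```python
from unsloth import FastLanguageModel

# Initialize the base model in 4-bit with a 2048-token maximum sequence length.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters for parameter-efficient fine-tuning.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # LoRA rank (assumed)
    lora_alpha=16,             # LoRA scaling factor (assumed)
    lora_dropout=0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",  # memory-efficient checkpointing
    random_state=3407,
)
```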
### Fine-Tuning Process
1. **Install Required Packages:**
   Installed the necessary libraries, including **unsloth** and **kaggle**.

2. **Authentication:**
   Authenticated with the **Hugging Face Hub** and **Weights & Biases** for experiment tracking and versioning.

3. **Model Initialization:**
   Loaded the base model with **4-bit quantization** and a maximum sequence length of 2048 tokens.

4. **Pre-Fine-Tuning Inference:**
   Ran an initial inference on a medical question to establish the model's baseline behavior.

5. **Dataset Preparation:**
   Structured and formatted the training data with a custom prompt template tailored to medical CoT reasoning.

6. **Application of LoRA Adapters:**
   Attached **LoRA adapters** for parameter-efficient fine-tuning.

7. **Supervised Fine-Tuning:**
   Fine-tuned the model with **SFTTrainer** and tuned hyperparameters; the run took about 44 minutes (see the training sketch after this list).

8. **Post-Fine-Tuning Inference:**
   Re-ran the same medical question to verify the improvement after fine-tuning.

9. **Saving and Loading:**
   Saved the fine-tuned model, including the **LoRA adapters**, for later reuse and deployment.

10. **Model Deployment:**
    Pushed the fine-tuned model to the **Hugging Face Hub** in **GGUF format** with 4-bit quantization (see the export sketch below).
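The sketch below strings the middle steps together: it formats the dataset with a CoT prompt template (step 5) and runs supervised fine-tuning with TRL's `SFTTrainer` (step 7). It assumes the `model` and `tokenizer` from the LoRA sketch under *Key Components*, the `FreedomIntelligence/medical-o1-reasoning-SFT` dataset with `Question` / `Complex_CoT` / `Response` fields, and illustrative hyperparameters; exact argument names can vary across `trl` versions, and the Hub / Weights & Biases logins from steps 1-2 are assumed to have been done beforehand.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Step 5: load a 500-sample slice and render it with a CoT prompt template.
# Dataset id and field names are assumptions about medical-o1-reasoning-SFT.
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[:500]"
)

PROMPT_TEMPLATE = """Below is a medical question. Think through it step by step, then answer.

### Question:
{question}

### Response:
<think>
{cot}
</think>
{answer}"""

def format_example(example):
    text = PROMPT_TEMPLATE.format(
        question=example["Question"],
        cot=example["Complex_CoT"],
        answer=example["Response"],
    ) + tokenizer.eos_token  # append end-of-text so the model learns to stop
    return {"text": text}

dataset = dataset.map(format_example)

# Step 7: supervised fine-tuning (hyperparameters are illustrative, not the exact run).
trainer = SFTTrainer(
    model=model,                 # 4-bit base model with LoRA adapters attached
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        logging_steps=10,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```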
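Finally, a sketch of saving and GGUF export (steps 9-10). The directory name and Hub repository id are placeholders, and the `push_to_hub_gguf` call and its `quantization_method` values depend on the installed Unsloth version.

```python
# Step 9: save the LoRA adapters and tokenizer locally for later reuse.
model.save_pretrained("DeepSeek-R1-Medical-CoT-lora")      # placeholder directory
tokenizer.save_pretrained("DeepSeek-R1-Medical-CoT-lora")

# Step 10: export to GGUF with 4-bit quantization and push to the Hugging Face Hub.
model.push_to_hub_gguf(
    "your-username/DeepSeek-R1-Medical-CoT-GGUF",  # placeholder repo id
    tokenizer,
    quantization_method="q4_k_m",   # common 4-bit GGUF quantization
    token="hf_...",                 # Hugging Face write token
)
```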