---
license: mit
language:
- en
pipeline_tag: text-generation
---

# DeepSeek-R1-Distill-Llama-8B - Fine-Tuned for Medical Chain-of-Thought Reasoning

## Model Overview
The **DeepSeek-R1-Distill-Llama-8B** model has been fine-tuned for medical chain-of-thought (CoT) reasoning, improving its ability to generate structured, concise, and accurate medical reasoning outputs. It was trained on a 500-sample subset of the **medical-o1-reasoning-SFT** dataset, with **4-bit quantization** and **LoRA adapters** used to speed up training and reduce memory usage.

### Key Features
- **Base Model:** [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B)
- **Fine-Tuning Objective:** Structured, step-by-step medical reasoning.
- **Training Dataset:** 500 samples from the **medical-o1-reasoning-SFT** dataset.
- **Tools Used:**
  - **Unsloth:** Accelerates training by roughly 2x.
  - **4-bit Quantization:** Reduces model memory usage.
  - **LoRA Adapters:** Parameter-efficient fine-tuning.
- **Training Time:** 44 minutes.
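For a quick start, the checkpoint can be loaded in 4-bit with Unsloth, matching the settings used during fine-tuning. This is a minimal sketch: the repository id below is a placeholder for wherever the fine-tuned weights are hosted, and a CUDA-capable GPU is assumed.

```python
from unsloth import FastLanguageModel

# Load the fine-tuned checkpoint in 4-bit with the same 2048-token context
# length used during fine-tuning. The repo id is a placeholder.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your-username/DeepSeek-R1-Medical-CoT",  # placeholder repo id
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to its faster inference mode
```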
### Performance Improvements
- **Response Length:** Average responses shortened from roughly 450 words to roughly 150 words, improving conciseness.
- **Reasoning Style:** Shifted from verbose explanations to focused, structured reasoning.
- **Answer Format:** Moved from bulleted lists to paragraph-style answers for readability.

## Intended Use
This model is intended for:
- **Medical professionals** who need structured diagnostic reasoning.
- **Researchers** extracting and organizing medical knowledge.
- **Developers** building medical CoT applications for clinical settings, treatment planning, and education.

Typical use cases include:
- Clinical diagnostics
- Treatment planning
- Medical education and training
- Research assistance
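Continuing from the loading sketch above, the example below shows one way to prompt the model with a medical question. The prompt wording is illustrative; it approximates, but may not exactly match, the template used during fine-tuning.

```python
# Assumes `model` and `tokenizer` from the loading sketch above.
prompt = (
    "Below is a medical question. Think through the problem step by step, "
    "then give a concise final answer.\n\n"
    "### Question:\n"
    "A 58-year-old patient presents with sudden chest pain radiating to the left arm. "
    "What is the most likely diagnosis and the first diagnostic step?\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```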
## Training Details

### Key Components
- **Model:** [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B)
- **Dataset:** **medical-o1-reasoning-SFT** (500 samples)
- **Training Tools:**
  - **Unsloth:** Optimized training loop (roughly 2x speedup).
  - **4-bit Quantization:** Lower memory footprint during training.
  - **LoRA Adapters:** Lightweight fine-tuning at reduced computational cost.
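As context for the fine-tuning process described below, the sketch shows how the base model can be initialized in 4-bit with Unsloth and how LoRA adapters can be attached. The rank, alpha, and target modules are typical Unsloth settings, not necessarily the exact values used for this model.

```python
from unsloth import FastLanguageModel

# Initialize the base model in 4-bit with a 2048-token maximum sequence length.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters for parameter-efficient fine-tuning.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # LoRA rank (assumed)
    lora_alpha=16,             # LoRA scaling factor (assumed)
    lora_dropout=0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",  # memory-efficient checkpointing
    random_state=3407,
)
```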
### Fine-Tuning Process
1. **Install Required Packages:**
   Installed the necessary libraries, including **unsloth** and **kaggle**.

2. **Authentication:**
   Authenticated with the **Hugging Face Hub** and **Weights & Biases** for experiment tracking and versioning.

3. **Model Initialization:**
   Loaded the base model with **4-bit quantization** and a maximum sequence length of 2048 tokens.

4. **Pre-Fine-Tuning Inference:**
   Ran an initial inference on a medical question to establish the model's baseline behavior.

5. **Dataset Preparation:**
   Structured and formatted the training data with a custom prompt template tailored to medical CoT reasoning.

6. **Application of LoRA Adapters:**
   Attached **LoRA adapters** for parameter-efficient fine-tuning.

7. **Supervised Fine-Tuning:**
   Fine-tuned the model with **SFTTrainer** and tuned hyperparameters; the run took about 44 minutes (see the training sketch after this list).

8. **Post-Fine-Tuning Inference:**
   Re-ran the same medical question to verify the improvement after fine-tuning.

9. **Saving and Loading:**
   Saved the fine-tuned model, including the **LoRA adapters**, for later reuse and deployment.

10. **Model Deployment:**
    Pushed the fine-tuned model to the **Hugging Face Hub** in **GGUF format** with 4-bit quantization (see the export sketch below).
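The sketch below strings the middle steps together: it formats the dataset with a CoT prompt template (step 5) and runs supervised fine-tuning with TRL's `SFTTrainer` (step 7). It assumes the `model` and `tokenizer` from the LoRA sketch under *Key Components*, the `FreedomIntelligence/medical-o1-reasoning-SFT` dataset with `Question` / `Complex_CoT` / `Response` fields, and illustrative hyperparameters; exact argument names can vary across `trl` versions, and the Hub / Weights & Biases logins from steps 1-2 are assumed to have been done beforehand.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Step 5: load a 500-sample slice and render it with a CoT prompt template.
# Dataset id and field names are assumptions about medical-o1-reasoning-SFT.
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[:500]"
)

PROMPT_TEMPLATE = """Below is a medical question. Think through it step by step, then answer.

### Question:
{question}

### Response:
<think>
{cot}
</think>
{answer}"""

def format_example(example):
    text = PROMPT_TEMPLATE.format(
        question=example["Question"],
        cot=example["Complex_CoT"],
        answer=example["Response"],
    ) + tokenizer.eos_token  # append end-of-text so the model learns to stop
    return {"text": text}

dataset = dataset.map(format_example)

# Step 7: supervised fine-tuning (hyperparameters are illustrative, not the exact run).
trainer = SFTTrainer(
    model=model,                 # 4-bit base model with LoRA adapters attached
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        logging_steps=10,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```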
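Finally, a sketch of saving and GGUF export (steps 9-10). The directory name and Hub repository id are placeholders, and the `push_to_hub_gguf` call and its `quantization_method` values depend on the installed Unsloth version.

```python
# Step 9: save the LoRA adapters and tokenizer locally for later reuse.
model.save_pretrained("DeepSeek-R1-Medical-CoT-lora")      # placeholder directory
tokenizer.save_pretrained("DeepSeek-R1-Medical-CoT-lora")

# Step 10: export to GGUF with 4-bit quantization and push to the Hugging Face Hub.
model.push_to_hub_gguf(
    "your-username/DeepSeek-R1-Medical-CoT-GGUF",  # placeholder repo id
    tokenizer,
    quantization_method="q4_k_m",   # common 4-bit GGUF quantization
    token="hf_...",                 # Hugging Face write token
)
```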