SURESHBEEKHANI committed
Commit 30c94c5 · verified · Parent: c2975b9

Update README.md

Files changed (1): README.md (+78, −3)

README.md CHANGED
---
license: mit
language:
- en
pipeline_tag: text-generation
---
# DeepSeek-R1-Distill-Llama-8B - Fine-Tuned for Medical Chain-of-Thought Reasoning

## Model Overview
The **DeepSeek-R1-Distill-Llama-8B** model has been fine-tuned for medical chain-of-thought (CoT) reasoning. Fine-tuning improves the model's ability to generate structured, concise, and accurate medical reasoning outputs. The model was trained on a 500-sample subset of the **medical-o1-reasoning-SFT** dataset, using **4-bit quantization** and **LoRA adapters** to improve efficiency and reduce memory usage.

### Key Features
- **Base Model:** [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B)
- **Fine-Tuning Objective:** Adaptation to structured, step-by-step medical reasoning tasks.
- **Training Dataset:** 500 samples from the **medical-o1-reasoning-SFT** dataset.
- **Tools Used:**
  - **Unsloth:** Accelerates training by 2x.
  - **4-bit Quantization:** Reduces model memory usage.
  - **LoRA Adapters:** Enable parameter-efficient fine-tuning.
- **Training Time:** 44 minutes.

### Performance Improvements
- **Response Length:** Reduced from an average of 450 words to 150 words, improving conciseness.
- **Reasoning Style:** Shifted from verbose explanations to more focused, structured reasoning.
- **Answer Format:** Transitioned from bulleted lists to paragraph-style answers for clarity.

## Intended Use
This model is designed for use by:
- **Medical professionals** who need structured diagnostic reasoning.
- **Researchers** seeking assistance with medical knowledge extraction.
- **Developers** building medical CoT applications for clinical settings, treatment planning, and education.

Typical use cases include:
- Clinical diagnostics
- Treatment planning
- Medical education and training
- Research assistance

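The snippet below is a minimal inference sketch. The repository id and the prompt layout are placeholders (the exact CoT prompt template used during training is not reproduced in this card), so adjust both to match the published model.

```python
# Minimal inference sketch. "your-username/DeepSeek-R1-Medical-CoT" is a
# placeholder repository id, and the prompt layout is an assumed format,
# not the exact template used during fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/DeepSeek-R1-Medical-CoT"  # replace with the real repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

question = "What are the common causes of acute chest pain in adults?"
prompt = f"### Question:\n{question}\n\n### Response:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Print only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Since the deployment step pushes a GGUF build, the model can also be run with llama.cpp-compatible tooling instead of transformers.
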
## Training Details

### Key Components:
- **Model:** [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B)
- **Dataset:** **medical-o1-reasoning-SFT** (500 samples)
- **Training Tools:**
  - **Unsloth:** Optimized training for faster results (2x speedup).
  - **4-bit Quantization:** Optimized memory usage for efficient training.
  - **LoRA Adapters:** Enable lightweight fine-tuning with reduced computational cost.

### Fine-Tuning Process:
1. **Install Required Packages:**
   Installed the necessary libraries, including **unsloth** and **kaggle**.

2. **Authentication:**
   Authenticated with the **Hugging Face Hub** and **Weights & Biases** for experiment tracking and versioning.

3. **Model Initialization:**
   Initialized the base model with **4-bit quantization** and a maximum sequence length of 2048 tokens (see the loading sketch after this list).

4. **Pre-Fine-Tuning Inference:**
   Ran an initial inference pass to establish the model's baseline performance on a medical question.

5. **Dataset Preparation:**
   Structured and formatted the training data with a custom prompt template tailored to medical CoT reasoning tasks (see the formatting sketch below).

6. **Application of LoRA Adapters:**
   Attached **LoRA adapters** for parameter-efficient fine-tuning.

7. **Supervised Fine-Tuning:**
   Used **SFTTrainer** to fine-tune the model with optimized hyperparameters; training took 44 minutes (see the trainer sketch below).

8. **Post-Fine-Tuning Inference:**
   Evaluated the improvement by re-running the same medical question after fine-tuning.

9. **Saving and Loading:**
   Saved the fine-tuned model, including the **LoRA adapters**, for later use and deployment.

10. **Model Deployment:**
    Pushed the fine-tuned model to the **Hugging Face Hub** in **GGUF format** with 4-bit quantization for efficient use (see the export sketch below).
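
To make steps 1, 3, and 6 concrete, here is a sketch of loading the base model in 4-bit with Unsloth and attaching LoRA adapters. The LoRA rank, alpha, and target modules are typical values, not necessarily the exact configuration used for this run.

```python
# Sketch of steps 1, 3, and 6. Install first: pip install unsloth
# The LoRA hyperparameters below are illustrative, not the exact training config.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,   # sequence length stated in step 3
    load_in_4bit=True,     # 4-bit quantization to cut memory usage
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # LoRA rank (assumed, common default)
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing=True,
)
```
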
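Step 5 depends on a prompt template. The sketch below shows one plausible layout: the dataset path and column names follow the public medical-o1-reasoning-SFT release, while the template text itself is an assumption rather than the exact one used here.

```python
# Sketch of step 5: formatting the 500-sample subset with a CoT prompt template.
# The dataset path, column names, and template are assumptions; substitute the
# actual template used for this fine-tune.
from datasets import load_dataset

prompt_template = """### Question:
{question}

### Reasoning:
{cot}

### Answer:
{answer}"""

def format_example(example):
    return {
        "text": prompt_template.format(
            question=example["Question"],
            cot=example["Complex_CoT"],
            answer=example["Response"],
        )
    }

# 500-sample subset, as described above; in practice an EOS token is appended
# to each example so the model learns to stop generating.
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[:500]"
)
dataset = dataset.map(format_example)
```
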
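Step 7 can then be run with TRL's SFTTrainer. This follows the older TRL API used in the Unsloth notebooks (newer TRL versions take an SFTConfig instead of TrainingArguments), and all hyperparameter values are illustrative, since the card only records the total training time of 44 minutes.

```python
# Sketch of step 7: supervised fine-tuning. Hyperparameters are illustrative.
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,               # LoRA-wrapped model from the loading sketch
    tokenizer=tokenizer,
    train_dataset=dataset,     # formatted dataset from the previous sketch
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        fp16=True,
        logging_steps=10,
        output_dir="outputs",
        report_to="wandb",     # Weights & Biases tracking from step 2
    ),
)
trainer.train()
```
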
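Finally, steps 9 and 10 map onto Unsloth's save and upload helpers. The local directory and Hub repository names below are placeholders, not the actual destinations used for this model.

```python
# Sketch of steps 9 and 10: save the LoRA adapters locally, then push a
# 4-bit GGUF build to the Hugging Face Hub. Names are placeholders.
model.save_pretrained("medical-cot-lora")       # LoRA adapters + config
tokenizer.save_pretrained("medical-cot-lora")

model.push_to_hub_gguf(
    "your-username/your-model-name",            # placeholder repo id
    tokenizer,
    quantization_method="q4_k_m",               # 4-bit GGUF quantization
)
```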