---
base_model: unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit
library_name: peft
license: mit
datasets:
- FreedomIntelligence/medical-o1-reasoning-SFT
language:
- en
tags:
- medical
---

# Model Card for DeepSeek-R1-Medical-COT

## Model Details

### Model Description

DeepSeek-R1-Medical-COT is a fine-tuned version of the DeepSeek-R1-Distill-Llama-8B model, optimized for medical chain-of-thought (CoT) reasoning. It is designed to assist with medical tasks such as question answering, reasoning, and decision support, and is particularly useful for applications that require structured reasoning in the medical domain.

- **Developed by:** Mohamed Mahmoud
- **Funded by:** Independent project
- **Shared by:** Mohamed Mahmoud
- **Model type:** Transformer-based Large Language Model (LLM)
- **Language(s) (NLP):** English (en)
- **License:** MIT
- **Finetuned from model:** unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit

### Model Sources

- **Repository:** [Hugging Face Model Repo](https://huggingface.co/thesnak/DeepSeek-R1-Medical-COT)
- **LinkedIn:** [Mohamed Mahmoud](https://www.linkedin.com/in/mohamed-thesnak)

## Uses

### Direct Use

The model can be used directly for medical reasoning tasks, including:

- Answering medical questions
- Assisting in medical decision-making
- Supporting clinical research and literature review

### Downstream Use

- Fine-tuning for specialized medical applications
- Integration into chatbots and virtual assistants for medical advice
- Educational tools for medical students

### Out-of-Scope Use

- This model is not a replacement for professional medical advice.
- It should not be used for clinical decision-making without expert validation.
- It may not perform well in languages other than English.

## Bias, Risks, and Limitations

While fine-tuned for medical reasoning, the model may still have biases due to the limitations of its training data. Users should exercise caution and validate critical outputs with medical professionals.

### Recommendations

Users should verify outputs, particularly in sensitive medical contexts. The model is best used as an assistive tool rather than a primary decision-making system.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "thesnak/DeepSeek-R1-Medical-COT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

input_text = "What are the symptoms of pneumonia?"
# Send inputs to whichever device the model was dispatched to,
# rather than hard-coding "cuda".
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

The model was fine-tuned on the **FreedomIntelligence/medical-o1-reasoning-SFT** dataset, which contains medical question-answer pairs designed to improve reasoning capabilities.

### Training Procedure

#### Preprocessing

- Tokenization using the LLaMA tokenizer
- Text cleaning and normalization
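
The exact cleaning steps are not documented here; below is a minimal sketch of the kind of normalization typically applied at this stage. The specific operations (Unicode NFKC normalization and whitespace collapsing) are assumptions for illustration, not the author's recorded pipeline.

```python
import re
import unicodedata

def clean_text(text: str) -> str:
    """Illustrative cleaning: Unicode normalization plus whitespace
    collapsing. The actual preprocessing used for this model is not
    documented in detail."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(clean_text("  Chest   pain\nand  fever "))  # → "Chest pain and fever"
```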

#### Training Hyperparameters

- **Precision:** bf16 mixed precision
- **Optimizer:** AdamW
- **Batch size:** 16
- **Learning rate:** 2e-5
- **Epochs:** 3
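
These hyperparameters map naturally onto `transformers.TrainingArguments` fields; the sketch below shows one such mapping. The field names follow the Transformers API, but the mapping itself is an assumption, not the author's actual training configuration.

```python
# Values mirror the hyperparameters listed above; keys follow the
# transformers.TrainingArguments naming convention (assumed mapping).
training_config = {
    "bf16": True,                       # bf16 mixed precision
    "optim": "adamw_torch",             # AdamW optimizer
    "per_device_train_batch_size": 16,  # batch size
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
}
```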

#### Speeds, Sizes, Times

- **Training time:** Approximately 12 hours on a P100 GPU (Kaggle)
- **Model size:** 8B parameters (bnb 4-bit quantized)

#### Training Loss

| Step | Training Loss |
| ---- | ------------- |
| 10   | 1.9190        |
| 20   | 1.4618        |
| 30   | 1.4025        |
| 40   | 1.3090        |
| 50   | 1.3444        |
| 60   | 1.3141        |
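
The logged values can be summarized programmatically; the small sketch below (values copied from the table) computes the overall relative decrease from the first to the last logged step.

```python
# Step/loss pairs copied from the training-loss table.
steps = [10, 20, 30, 40, 50, 60]
losses = [1.9190, 1.4618, 1.4025, 1.3090, 1.3444, 1.3141]

# Overall relative decrease, first logged step to last.
relative_drop = (losses[0] - losses[-1]) / losses[0]
print(f"{relative_drop:.1%}")  # → 31.5%
```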

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- The model was evaluated on held-out samples from **FreedomIntelligence/medical-o1-reasoning-SFT**.

#### Factors

- Performance was assessed on medical reasoning tasks.

#### Metrics

- **Perplexity:** Measured for general coherence.
- **Accuracy:** Evaluated against expert-verified responses.
- **BLEU Score:** Used to assess response relevance.
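
For reference, perplexity is the exponential of the mean per-token negative log-likelihood; a minimal sketch of that computation (the token NLL values below are illustrative, not measured results):

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood),
    using the natural logarithm."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Illustrative values only — not results from this model's evaluation.
print(perplexity([1.2, 0.8, 1.0]))  # mean NLL = 1.0, so PPL = e ≈ 2.718
```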

### Results

- **Perplexity:**
- **Accuracy:**
- **BLEU Score:**

## Model Examination

Further interpretability analyses can be conducted with tools such as Captum and SHAP to examine how the model derives its medical reasoning responses.

## Environmental Impact

- **Hardware Type:** P100 GPU (Kaggle)
- **Hours used:** 2 hours
- **Cloud Provider:** Kaggle
- **Compute Region:** N/A
- **Carbon Emitted:** Estimated at 9.5 kg CO2eq
- **Notebook:** [Kaggle Notebook](https://www.kaggle.com/code/thesnak/fine-tune-deepseek)
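
Such estimates typically follow energy (kWh) × grid carbon intensity (kg CO2eq/kWh); a back-of-the-envelope sketch is below. The assumed wattage (P100 TDP ≈ 250 W) and grid intensity (0.475 kg CO2eq/kWh, a commonly cited global average) are illustrative and will not necessarily reproduce the figure above.

```python
def co2_estimate_kg(gpu_power_w: float, hours: float,
                    carbon_intensity_kg_per_kwh: float) -> float:
    """Energy consumed (kWh) times grid carbon intensity (kg CO2eq/kWh)."""
    return gpu_power_w / 1000 * hours * carbon_intensity_kg_per_kwh

# Assumed values: 250 W GPU, 12 h of training, 0.475 kg CO2eq/kWh.
print(co2_estimate_kg(250, 12, 0.475))  # → 1.425
```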

## Technical Specifications

### Compute Infrastructure

#### Hardware

- P100 GPU (16 GB VRAM) on Kaggle

## Citation

**BibTeX:**

```bibtex
@misc{mahmoud2025deepseekmedcot,
  title={DeepSeek-R1-Medical-COT},
  author={Mohamed Mahmoud},
  year={2025},
  url={https://huggingface.co/thesnak/DeepSeek-R1-Medical-COT}
}
```

## Model Card Authors

- Mohamed Mahmoud

## Model Card Contact

- [LinkedIn](https://www.linkedin.com/in/mohamed-thesnak)