Instructions for using SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT with libraries and local apps.

Libraries

How to use SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT",
	filename="unsloth.Q4_K_M.gguf",
)

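# Ask a question and print only the assistant's reply text.
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])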

Local Apps

llama.cpp

How to use SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M

Use Docker

docker model run hf.co/SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M


vLLM

How to use SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
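
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on its default address, http://localhost:8000. The helper names `build_chat_payload` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT"
# Assumption: the server started by `vllm serve` uses the default port 8000.
SERVER_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_payload(prompt: str, model: str = MODEL_ID) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def ask(prompt: str, url: str = SERVER_URL) -> str:
    """POST the prompt to the server and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is the capital of France?"))` prints the model's reply. Nothing is sent until `ask` is called, so the block can be imported safely.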


Ollama
How to use SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT with Ollama:
```
ollama run hf.co/SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M
```

Unsloth Studio

How to use SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT to start chatting

Use Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT to start chatting

Docker Model Runner
How to use SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT with Docker Model Runner:
```
docker model run hf.co/SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M
```

Lemonade

How to use SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT:Q4_K_M

Run and chat with the model

lemonade run user.Deep-seek-R1-Medical-reasoning-SFT-Q4_K_M

List all available models

lemonade list

DeepSeek-R1-Distill-Llama-8B - Fine-Tuned for Medical Chain-of-Thought Reasoning

Model Overview

The DeepSeek-R1-Distill-Llama-8B model has been fine-tuned for medical chain-of-thought (CoT) reasoning. This fine-tuning process enhances the model's ability to generate structured, concise, and accurate medical reasoning outputs. The model was trained using a 500-sample subset of the medical-o1-reasoning-SFT dataset, with optimizations including 4-bit quantization and LoRA adapters to improve efficiency and reduce memory usage.

Key Features

Base Model: unsloth/DeepSeek-R1-Distill-Llama-8B
Fine-Tuning Objective: Adaptation for structured, step-by-step medical reasoning tasks.
Training Dataset: 500 samples from medical-o1-reasoning-SFT dataset.
Tools Used:
- Unsloth: Accelerates training by 2x.
- 4-bit Quantization: Reduces model memory usage.
- LoRA Adapters: Enables parameter-efficient fine-tuning.
Training Time: 44 minutes.
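
As a rough illustration of why 4-bit quantization reduces memory usage, the sketch below estimates the storage needed for the weights alone. It assumes a nominal 8 billion parameters and ignores quantization metadata, activations, and the KV cache, so real figures will be somewhat higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9  # assumption: nominal "8B" parameter count

fp16_gb = weight_memory_gb(N_PARAMS, 16)
int4_gb = weight_memory_gb(N_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")  # 16-bit weights: 16.0 GB
print(f"4-bit weights:  {int4_gb:.1f} GB")  # 4-bit weights:  4.0 GB
```

Under these assumptions, 4-bit weights take about a quarter of the space of 16-bit weights.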

Performance Improvements

Response Length: Reduced from an average of 450 words to 150 words, improving conciseness.
Reasoning Style: Shifted from verbose explanations to more focused, structured reasoning.
Answer Format: Transitioned from bulleted lists to paragraph-style answers for clarity.
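
The length figures above can be turned into a relative reduction, and the same kind of average can be measured on your own outputs. This is a minimal sketch: `mean_word_count` splits on whitespace, which is an assumption about how words were counted, and the sample strings are placeholders rather than model outputs.

```python
def mean_word_count(responses: list[str]) -> float:
    """Average number of whitespace-separated words per response."""
    return sum(len(r.split()) for r in responses) / len(responses)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank, e.g. 0.5 for a 50% reduction."""
    return (before - after) / before


# Averages reported in this model card: 450 words before, 150 words after.
reduction = relative_reduction(450, 150)
print(f"Response length reduced by {reduction:.0%}")  # Response length reduced by 67%

# Placeholder strings, only to show how the average is computed.
print(mean_word_count(["one two three", "four five"]))  # 2.5
```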

Intended Use

This model is designed for use by:

Medical professionals requiring structured diagnostic reasoning.
Researchers seeking assistance in medical knowledge extraction.
Developers integrating the model for medical CoT tasks in clinical settings, treatment planning, and education.

Typical use cases include:

Clinical diagnostics
Treatment planning
Medical education and training
Research assistance

Training Details

Key Components:

Model: unsloth/DeepSeek-R1-Distill-Llama-8B
Dataset: medical-o1-reasoning-SFT (500 samples)
Training Tools:
- Unsloth: Optimized training for faster results (2x speedup).
- 4-bit Quantization: Optimized memory usage for efficient training.
- LoRA Adapters: Enables lightweight fine-tuning with reduced computational costs.
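
To see why LoRA adapters keep fine-tuning lightweight, consider one weight matrix of shape d_out x d_in. Full fine-tuning updates all d_out * d_in entries, while a rank-r LoRA adapter trains two small matrices with r * (d_in + d_out) entries in total. The sketch below compares the two. The 4096 x 4096 shape matches an attention projection in Llama-style 8B models, and the rank of 16 is a hypothetical value, since this card does not state the rank that was used.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable entries when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable entries for a LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


D = 4096   # illustrative projection size
RANK = 16  # hypothetical LoRA rank, not taken from this card

full = full_params(D, D)
lora = lora_params(D, D, RANK)
ratio = lora / full

print(full)            # 16777216
print(lora)            # 131072
print(f"{ratio:.2%}")  # 0.78%
```

Under these assumptions the adapter trains fewer than 1% of the entries of the matrix it modifies.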

Fine-Tuning Process:

Install Required Packages: Installed necessary libraries, including unsloth and kaggle.
Authentication: Authenticated with Hugging Face Hub and Weights & Biases for tracking experiments and versioning.
Model Initialization: Initialized the base model with 4-bit quantization and a sequence length of up to 2048 tokens.
Pre-Fine-Tuning Inference: Conducted an initial inference to establish the model’s baseline performance on a medical question.
Dataset Preparation: Structured and formatted the training data using a custom template tailored to medical CoT reasoning tasks.
Application of LoRA Adapters: Incorporated LoRA adapters for efficient parameter tuning during fine-tuning.
Supervised Fine-Tuning: Utilized SFTTrainer to fine-tune the model with optimized hyperparameters for 44 minutes.
Post-Fine-Tuning Inference: Evaluated the model’s improved performance by testing it on the same medical question after fine-tuning.
Saving and Loading: Stored the fine-tuned model, including LoRA adapters, for easy future use and deployment.
Model Deployment: Pushed the fine-tuned model to Hugging Face Hub in GGUF format with 4-bit quantization enabled for efficient use.
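
The dataset preparation step can be sketched as follows. The template text is hypothetical, since this card does not reproduce the author's actual template. The column names `Question`, `Complex_CoT`, and `Response` are assumed to match the medical-o1-reasoning-SFT dataset, and the sample record contains placeholder strings rather than real data.

```python
# Hypothetical prompt template for medical chain-of-thought fine-tuning.
TEMPLATE = """Below is a medical question. Reason step by step, then give a final answer.

### Question:
{question}

### Reasoning:
{reasoning}

### Answer:
{answer}"""


def format_example(example: dict, eos_token: str = "") -> str:
    """Render one dataset record as a single training string.

    Assumes the record has "Question", "Complex_CoT", and "Response" keys.
    The tokenizer's end-of-sequence token is appended so the model learns
    where an answer stops.
    """
    text = TEMPLATE.format(
        question=example["Question"],
        reasoning=example["Complex_CoT"],
        answer=example["Response"],
    )
    return text + eos_token


# Placeholder record, only to show the resulting layout.
sample = {
    "Question": "<question text>",
    "Complex_CoT": "<step-by-step reasoning>",
    "Response": "<final answer>",
}
print(format_example(sample, eos_token="<eos>"))
```

In a real run, `eos_token` would come from the tokenizer loaded with the base model, and the function would be applied to each of the 500 training samples before they are passed to the trainer.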

Notebook

Access the implementation notebook for this model here. This notebook provides detailed steps for fine-tuning and deploying the model.


GGUF

Model size: 8B params
Architecture: llama
Available quantizations: 4-bit, 5-bit, 8-bit

Model tree for SURESHBEEKHANI/Deep-seek-R1-Medical-reasoning-SFT

Base model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
Finetuned: unsloth/DeepSeek-R1-Distill-Llama-8B
Quantized: this model