
Libraries

How to use develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT")
model = AutoModelForCausalLM.from_pretrained("develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
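
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server from the commands above is listening on `localhost:8000`; the helper names `build_chat_request` and `ask` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT"


def build_chat_request(question, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:8000", timeout=120):
    """Send the question to a running server and return the reply text."""
    request = build_chat_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses put the generated text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the generated reply as a string. Passing `base_url="http://localhost:30000"` targets a server on that port instead.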

Use Docker

docker model run hf.co/develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT

SGLang

How to use develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT to start chatting

Load model with FastModel

# Install first (shell command, not Python): pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT",
    max_seq_length=2048,
)

Docker Model Runner
How to use develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT with Docker Model Runner:
```
docker model run hf.co/develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT
```

DeepSeek-R1-Distill-Llama-8B-Medical-COT

🏥 Fine-tuned Medical Model

This is a fine-tuned version of DeepSeek-R1-Distill-Llama-8B, optimized for medical reasoning and clinical case analysis using LoRA (Low-Rank Adaptation) with Unsloth.

Base Model: DeepSeek-R1-Distill-Llama-8B
Fine-Tuning Framework: Unsloth
Dataset: FreedomIntelligence/medical-o1-reasoning-SFT
Quantization: 4-bit (bitsandbytes)
Task: Clinical reasoning, medical question-answering, diagnosis assistance
Pipeline Tag: text-generation
Metrics: loss, accuracy
Library Name: transformers

📖 Model Details

| Feature | Value |
|---|---|
| Architecture | Llama-8B (Distilled) |
| Language | English |
| Training Steps | 60 |
| Batch Size | 2 (with gradient accumulation) |
| Gradient Accumulation Steps | 4 |
| Precision | Mixed (FP16/BF16 based on GPU support) |
| Optimizer | AdamW 8-bit |
| Fine-Tuned With | PEFT + LoRA (Unsloth) |
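
To make the batch settings concrete, the short sketch below works out the effective batch size and the number of training examples processed over the run. The values come from the table; the single-GPU setting (`num_devices = 1`) is an assumption, since the card does not state how many devices were used.

```python
per_device_batch_size = 2        # "Batch Size" from the table
gradient_accumulation_steps = 4  # from the table
training_steps = 60              # from the table
num_devices = 1                  # assumption: not stated in the card

# Gradients are accumulated over several small batches before each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
examples_seen = effective_batch_size * training_steps

print(effective_batch_size)  # 8
print(examples_seen)         # 480
```

Under that assumption, each optimizer step sees 8 examples, so the 60-step run covers roughly 480 training examples.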

📊 Training Summary

Loss Trend During Fine-Tuning:

| Step | Training Loss |
|---|---|
| 10 | 1.9188 |
| 20 | 1.4615 |
| 30 | 1.4023 |
| 40 | 1.3088 |
| 50 | 1.3443 |
| 60 | 1.3140 |
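
As a quick way to read the trend, the sketch below summarizes the logged values. It uses only the numbers reported above; the variable names are illustrative.

```python
# Training loss logged every 10 steps, copied from the table above.
loss_by_step = {10: 1.9188, 20: 1.4615, 30: 1.4023, 40: 1.3088, 50: 1.3443, 60: 1.3140}

first_loss = loss_by_step[10]
final_loss = loss_by_step[60]

# Relative reduction between the first and last logged values.
relative_drop = (first_loss - final_loss) / first_loss

# Step with the lowest logged loss (the curve is not strictly decreasing).
best_step = min(loss_by_step, key=loss_by_step.get)

print(f"Relative drop: {relative_drop:.1%}")
print(f"Lowest logged loss at step {best_step}: {loss_by_step[best_step]}")
```

The loss falls by about a third over the run, with most of the drop in the first 20 steps and a small fluctuation after step 40.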

🚀 How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT"

# Load model and tokenizer; device_map="auto" places the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Run inference
def ask_model(question):
    # Move inputs to wherever the model was placed instead of hard-coding "cuda"
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    # Pass the full encoding so the attention mask is supplied along with input_ids
    outputs = model.generate(**inputs, max_new_tokens=512)
    # The decoded text contains the prompt followed by the generated continuation
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

question = "A 61-year-old woman has involuntary urine loss when coughing. What would cystometry likely reveal?"
print(ask_model(question))
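
DeepSeek-R1 distilled models typically write their chain of thought before the final answer. The sketch below separates the two, assuming the reasoning is wrapped in `<think>` and `</think>` tags that survive decoding; the card does not confirm this output format for the fine-tuned model, and `split_reasoning` is a hypothetical helper.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Split generated text into (reasoning, answer).

    Assumes the reasoning sits between open_tag and close_tag; if no
    closing tag is found, the whole text is treated as the answer.
    """
    if close_tag not in text:
        return "", text.strip()
    before, answer = text.rsplit(close_tag, 1)
    # Keep only what follows the last opening tag, which drops any echoed prompt.
    reasoning = before.rsplit(open_tag, 1)[-1].strip()
    return reasoning, answer.strip()


sample = "Question text <think>Leakage with coughing suggests stress incontinence.</think> Final answer here."
reasoning, answer = split_reasoning(sample)
print(reasoning)  # Leakage with coughing suggests stress incontinence.
print(answer)     # Final answer here.
```

Applied to the string returned by `ask_model`, this would yield the reasoning trace and the final answer as separate strings whenever the tags are present.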

Example Outputs
Q: "A 59-year-old man presents with fever, night sweats, and a 12mm aortic valve vegetation. What is the most likely predisposing factor?"
Model's Answer: "The most likely predisposing factor for this patient’s infective endocarditis is a history of valvular heart disease or prosthetic valves, given the presence of an aortic valve vegetation. The causative organism is likely Enterococcus species, which does not grow in high salt concentrations."
