DeepSeek-R1-Distill-Llama-8B-Medical-COT
🏥 Fine-tuned Medical Model
This is a fine-tuned version of DeepSeek-R1-Distill-Llama-8B, optimized for medical reasoning and clinical case analysis using LoRA (Low-Rank Adaptation) with Unsloth.
- Base Model: DeepSeek-R1-Distill-Llama-8B
- Fine-Tuning Framework: Unsloth
- Dataset: FreedomIntelligence/medical-o1-reasoning-SFT
- Quantization: 4-bit (bitsandbytes)
- Task: Clinical reasoning, medical question-answering, diagnosis assistance
- Pipeline Tag: text-generation
- Metrics: loss, accuracy
- Library Name: transformers
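The quantization entry above mentions 4-bit bitsandbytes. As a rough sketch of how a 4-bit load is commonly configured in Transformers: the NF4 quant type, double quantization, and float16 compute dtype below are assumptions rather than settings confirmed by this card, and `four_bit_settings` and `load_in_4bit` are hypothetical helper names. It also assumes the repository holds weights that `AutoModelForCausalLM` can load directly, and that `bitsandbytes` and a CUDA GPU are available.

```python
def four_bit_settings(compute_dtype="float16"):
    """Plain-dict description of a 4-bit setup (assumed values, not taken from the card)."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",           # assumption: NF4 quantization
        "bnb_4bit_use_double_quant": True,      # assumption: double quantization enabled
        "bnb_4bit_compute_dtype": compute_dtype,
    }


def load_in_4bit(model_name):
    """Load model and tokenizer in 4-bit; heavy imports are deferred until called."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    settings = four_bit_settings()
    # Turn the dtype name into the actual torch dtype object.
    settings["bnb_4bit_compute_dtype"] = getattr(torch, settings["bnb_4bit_compute_dtype"])
    quant_config = BitsAndBytesConfig(**settings)

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quant_config,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer
```

Calling `load_in_4bit("develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT")` would then return the quantized model and its tokenizer, if the assumptions above hold.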
📖 Model Details
| Feature | Value |
|---|---|
| Architecture | Llama-8B (Distilled) |
| Language | English |
| Training Steps | 60 |
| Batch Size | 2 (with gradient accumulation) |
| Gradient Accumulation Steps | 4 |
| Precision | Mixed (FP16/BF16 based on GPU support) |
| Optimizer | AdamW 8-bit |
| Fine-Tuned With | PEFT + LoRA (Unsloth) |
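The batch size, gradient accumulation, and step count in the table combine into an effective batch size and a total number of training examples processed. The arithmetic can be sketched as follows; `num_devices = 1` is an assumption, since the card does not say how many GPUs were used.

```python
# Values taken from the Model Details table.
per_device_batch_size = 2
gradient_accumulation_steps = 4
training_steps = 60

# Assumption: training ran on a single GPU; scale by the device count otherwise.
num_devices = 1

# Each optimizer step sees batch_size * accumulation_steps * devices examples.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
examples_seen = effective_batch_size * training_steps

print(f"Effective batch size: {effective_batch_size}")
print(f"Training examples processed: {examples_seen}")
```

Under that single-GPU assumption, the run covers only a few hundred examples, so it is a short fine-tune rather than a full pass over the dataset.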
📊 Training Summary
Loss Trend During Fine-Tuning:
| Step | Training Loss |
|---|---|
| 10 | 1.9188 |
| 20 | 1.4615 |
| 30 | 1.4023 |
| 40 | 1.3088 |
| 50 | 1.3443 |
| 60 | 1.3140 |
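To read the trend more precisely, the logged values can be summarized with a few lines of Python. This is only a sketch over the six values reported above; the variable names are illustrative.

```python
# Training loss at each logged step, copied from the table above.
loss_by_step = {10: 1.9188, 20: 1.4615, 30: 1.4023, 40: 1.3088, 50: 1.3443, 60: 1.3140}

first_step, last_step = min(loss_by_step), max(loss_by_step)

# Relative reduction between the first and last logged steps.
relative_drop = (loss_by_step[first_step] - loss_by_step[last_step]) / loss_by_step[first_step]

# Logged step with the lowest training loss.
lowest_step = min(loss_by_step, key=loss_by_step.get)

print(f"Relative loss reduction, step {first_step} to {last_step}: {relative_drop:.1%}")
print(f"Lowest logged loss: {loss_by_step[lowest_step]} at step {lowest_step}")
```

Most of the reduction happens by step 20; after that the loss moves within a narrow band, with the lowest logged value at step 40 rather than at the final step.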
🚀 How to Use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "develops20/DeepSeek-R1-Distill-Llama-8B-Medical-COT"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Run inference
def ask_model(question):
    # Send inputs to the model's device instead of hard-coding "cuda"
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    # Pass the attention mask together with the input IDs
    outputs = model.generate(**inputs, max_new_tokens=512)
    # The decoded text includes the prompt followed by the completion
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

question = "A 61-year-old woman has involuntary urine loss when coughing. What would cystometry likely reveal?"
print(ask_model(question))
```
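DeepSeek-R1 distilled models typically wrap their chain of thought in `<think>...</think>` before the final answer. Assuming this fine-tune keeps that convention and the tags survive decoding (neither is confirmed by this card), a small helper can separate the reasoning from the answer. `split_reasoning` is a hypothetical name, and the sample string is a made-up placeholder, not real model output.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is '' when no <think> block is found."""
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Placeholder text standing in for a decoded completion.
sample = "<think>step-by-step notes</think>Final answer text."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Text without a `<think>` block is returned unchanged as the answer, so the helper is safe to apply even if the model skips the reasoning section.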
Example Outputs
Q: "A 59-year-old man presents with fever, night sweats, and a 12mm aortic valve vegetation. What is the most likely predisposing factor?"
Model's Answer: "The most likely predisposing factor for this patient’s infective endocarditis is a history of valvular heart disease or prosthetic valves, given the presence of an aortic valve vegetation. The causative organism is likely Enterococcus species, which does not grow in high salt concentrations."