Instructions for using BEncoderRT/medical_inference with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- llama-cpp-python
How to use BEncoderRT/medical_inference with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="BEncoderRT/medical_inference",
    filename="unsloth.Q8_0.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use BEncoderRT/medical_inference with llama.cpp:
Install from Homebrew (macOS, Linux)
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BEncoderRT/medical_inference:Q8_0

# Run inference directly in the terminal:
llama-cli -hf BEncoderRT/medical_inference:Q8_0
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BEncoderRT/medical_inference:Q8_0

# Run inference directly in the terminal:
llama-cli -hf BEncoderRT/medical_inference:Q8_0
Use pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf BEncoderRT/medical_inference:Q8_0

# Run inference directly in the terminal:
./llama-cli -hf BEncoderRT/medical_inference:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf BEncoderRT/medical_inference:Q8_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf BEncoderRT/medical_inference:Q8_0
Use Docker
docker model run hf.co/BEncoderRT/medical_inference:Q8_0
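Whichever install route you choose, llama-server exposes an OpenAI-compatible API (port 8080 by default), so the model can also be called programmatically. A minimal Python sketch, assuming the server is already running and the openai package is installed:

# pip install openai
from openai import OpenAI

# llama-server listens on port 8080 by default and speaks the OpenAI API
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="BEncoderRT/medical_inference",  # informational for llama-server
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)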
- LM Studio
- Jan
- vLLM
How to use BEncoderRT/medical_inference with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "BEncoderRT/medical_inference"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "BEncoderRT/medical_inference",
        "messages": [
            { "role": "user", "content": "What is the capital of France?" }
        ]
    }'
Use Docker
docker model run hf.co/BEncoderRT/medical_inference:Q8_0
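The same vLLM server can also be called from Python via the openai client; a minimal sketch, assuming the server is running on the default port 8000:

# pip install openai
from openai import OpenAI

# vLLM serves an OpenAI-compatible API on port 8000 by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="BEncoderRT/medical_inference",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)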
- Ollama
How to use BEncoderRT/medical_inference with Ollama:
ollama run hf.co/BEncoderRT/medical_inference:Q8_0
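Ollama also exposes a local REST API (port 11434 by default), so the same model can be queried from code; a minimal sketch using the requests package:

# pip install requests
import requests

# Ollama's chat endpoint; the model name matches the `ollama run` argument
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/BEncoderRT/medical_inference:Q8_0",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])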
- Unsloth Studio
How to use BEncoderRT/medical_inference with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for BEncoderRT/medical_inference to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for BEncoderRT/medical_inference to start chatting
Use Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for BEncoderRT/medical_inference to start chatting
- Docker Model Runner
How to use BEncoderRT/medical_inference with Docker Model Runner:
docker model run hf.co/BEncoderRT/medical_inference:Q8_0
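Docker Model Runner also publishes an OpenAI-compatible endpoint. The host, port, and path below follow Docker's documentation at the time of writing and require host TCP access to be enabled in Docker Desktop (e.g. on port 12434); treat them as assumptions and check your own Docker settings:

# pip install openai
from openai import OpenAI

# Assumed endpoint; enable host TCP access in Docker Desktop first
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="hf.co/BEncoderRT/medical_inference:Q8_0",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)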
- Lemonade
How to use BEncoderRT/medical_inference with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull BEncoderRT/medical_inference:Q8_0
Run and chat with the model
lemonade run user.medical_inference-Q8_0
List all available models
lemonade list
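Lemonade likewise runs an OpenAI-compatible local server, so the pulled model can be called from code. A minimal sketch; the base URL below (http://localhost:8000/api/v1) is an assumption, so verify the port and path in the Lemonade documentation for your install:

# pip install openai
from openai import OpenAI

# Assumed base URL; check the Lemonade docs for the actual port and path
client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="user.medical_inference-Q8_0",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)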
Use Docker
docker model run hf.co/BEncoderRT/medical_inference:Q8_0
DeepSeek-R1 Medical Reasoning Model
This repository contains a fine-tuned medical reasoning model based on DeepSeek-R1-Distill-Llama-8B and trained on the medical-o1-reasoning-SFT dataset.
⚠️ The uploaded file (unsloth.Q8_0.gguf) contains quantized weights for efficient inference.
🔍 Model Overview
- Base Model: unsloth/DeepSeek-R1-Distill-Llama-8B
- Training Method: SFT (Supervised Fine-Tuning)
- Domain: Medical reasoning and clinical knowledge
- Language: English
- Quantization: Q8_0 (GGUF format for efficient inference)
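To verify what the GGUF file actually records (architecture, quantization type, and so on), the gguf Python package published alongside llama.cpp can read the file header. A minimal sketch; the local filename is assumed to be the downloaded unsloth.Q8_0.gguf:

# pip install gguf
from gguf import GGUFReader

reader = GGUFReader("unsloth.Q8_0.gguf")  # path to the downloaded file

# Print the metadata keys stored in the GGUF header
for key in reader.fields:
    print(key)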
📚 Training Data
The model was fine-tuned on:
- Dataset: FreedomIntelligence/medical-o1-reasoning-SFT
- Language: English
- Task: Medical reasoning, clinical question-answering
🚀 Usage Example
Note: the model is stored in .gguf format (quantized). You can load it using the unsloth library.
from unsloth import FastLanguageModel
import torch
# Load the quantized GGUF model
model, tokenizer = FastLanguageModel.from_pretrained(
    "./unsloth.Q8_0.gguf",
    max_seq_length=2048,
    load_in_8bit=True,  # optional depending on quantization
)
FastLanguageModel.for_inference(model)
def generate(model, prompt, max_new_tokens=200):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Example prompt
prompt = """### Instruction:
"一个患有急性阑尾炎的病人已经发病5天,腹痛稍有减轻但仍然发热,在体检时发现右下腹有压痛的包块,此时应如何处理?"
### Response:
"""
print(generate(model, prompt))
<think>
This patient has had acute appendicitis for five days now; the abdominal pain has eased somewhat, but he is still febrile, which worries me a little. Then looking at the exam findings, there is a tender mass in the right lower abdomen, which gives me even more reason for concern. Hmm, with a mass present and the patient still febrile, this looks like an acute manifestation of acute appendicitis.
With that in mind, I think this needs to be dealt with promptly. The best course is an immediate abdominal ultrasound to see exactly what the mass is. Ultrasound can tell us the mass's location, size, and shape and the condition of the surrounding tissue, and that information matters a great deal for treatment.
Oh, and if the mass is fluid, surgery may not be needed at all. Ultrasound can settle that point, and we could then avoid unnecessary surgical trauma.
But if the mass is solid, especially if it is compressing the surrounding tissue, surgery becomes necessary. We have to decide whether to operate based on the specifics of the mass.
Also, the shape and location of the mass will affect the choice of operation, so the concrete surgical plan has to be drawn up around those factors.
Hmm, to sum up: the first step must be an abdominal ultrasound to establish the type and details of the mass. Surgery should be considered only once the mass is confirmed to be solid and compressing the surrounding tissue.
Good, this line of reasoning seems sound. Now to turn these thoughts into concrete management steps.
</think>
In this situation, an abdominal ultrasound should be performed first to determine the type and details of the mass. Based on the findings:
1. **If the mass is a fluid collection**: surgery is usually not needed. Symptoms can be managed with medication, followed by observation of the patient's course.
2. **If the mass is solid and is compressing surrounding tissue**: surgery becomes necessary. Resection of the mass is usually required to relieve the compression and the symptoms.
Please arrange an abdominal ultrasound as soon as possible and plan the most appropriate surgical approach based on the results.<|end▁of▁sentence|>
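Because the model wraps its chain of thought in <think> tags, as in the transcript above, downstream code will often want to separate the reasoning from the final answer. A minimal sketch; the helper name is illustrative:

import re

def split_reasoning(text):
    # Extract the <think>...</think> block, if any; the remainder is the answer
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return reasoning, answer

reasoning, answer = split_reasoning(generate(model, prompt))
print(answer)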