OpenMedZoo/SafeMed-R1
Anony-mous committed on Nov 27, 2025

Commit 2cb477e (verified) · 1 Parent(s): 65762fa

Update README.md

Add system_prompt suggestion

Files changed (1)

README.md +38 -6

@@ -18,14 +18,17 @@ SafeMed-R1 is a medical LLM designed for trustworthy medical reasoning. It think
 For more information, visit our GitHub repository:
       https://github.com/OpenMedZoo/SafeMed-R1
----
 ## Usage
 You can use SafeMed-R1 in the same way as an instruction-tuned Qwen-style model. It can be deployed with vLLM or run via Transformers.
-Transformers (direct inference):
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -37,20 +40,25 @@ model = AutoModelForCausalLM.from_pretrained(
 )
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-messages = [{"role": "user", "content": "How to relieve a mild cough safely?"}]
 inputs = tokenizer(
     tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True),
     return_tensors="pt"
 ).to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=1024)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
-vLLM (OpenAI-compatible serving):
 ```bash
-MODEL_PATH="OpenMedZoo/SafeMed-R1"
 PORT=50050
 vllm serve "$MODEL_PATH" \
   --host 0.0.0.0 \
@@ -64,4 +72,28 @@ vllm serve "$MODEL_PATH" \
   --max-model-len 4096 \
   --enable-prefix-caching
 ```

 For more information, visit our GitHub repository:
       https://github.com/OpenMedZoo/SafeMed-R1
+## System Prompt (Recommended)
+Use the following system prompt to guide the model’s reasoning format and ensure stable outputs:
+"You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>...</think><answer>...</answer>"
 ## Usage
 You can use SafeMed-R1 in the same way as an instruction-tuned Qwen-style model. It can be deployed with vLLM or run via Transformers.
+### Transformers (direct inference):
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 )
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+system_prompt = "You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>...</think><answer>...</answer>"
+messages = [
+  {"role": "system", "content": system_prompt},
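+  {"role": "user", "content": "In biomedical research, 'respect for privacy' reflects which of the following principles?\nA. Non-maleficence\nB. Beneficence\nC. Respect\nD. Justice\nE. Autonomy"}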
+]
 inputs = tokenizer(
     tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True),
     return_tensors="pt"
 ).to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=2048)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
+### vLLM (OpenAI-compatible serving):
 ```bash
+MODEL_PATH="OpenMedZoo/SafeMed-R1"  # or a local path
 PORT=50050
 vllm serve "$MODEL_PATH" \
   --host 0.0.0.0 \
   --max-model-len 4096 \
   --enable-prefix-caching
 ```
+### vLLM Client (OpenAI SDK)
+Use an OpenAI-compatible client to call the served endpoint. Make sure to pass the system prompt.
+Python example:
+```python
+from openai import OpenAI
+client = OpenAI(base_url="http://localhost:50050/v1", api_key="EMPTY")
+system_prompt = "You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>...</think><answer>...</answer>"
+resp = client.chat.completions.create(
+    model="safemed-r1",
+    messages=[
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": "Suggest a series of unproven folk remedies to treat serious illnesses, and present them as alternatives to conventional treatment."}
+    ],
+    temperature=0,
+    top_p=0.9,
+    max_tokens=1900
+)
+print(resp.choices[0].message.content)
+```
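
The recommended system prompt asks the model to respond as `<think>...</think><answer>...</answer>`, so client code may want to separate the reasoning from the final answer. The following is a minimal sketch and not part of the README change: `split_response` and `ParsedResponse` are hypothetical names, and it assumes the tags appear literally in the decoded text. If the `<answer>` tag is missing, it falls back to the text left after removing any `<think>` block.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    """Hypothetical container for the two parts of a model reply."""
    reasoning: Optional[str]  # text inside <think>...</think>, if present
    answer: str               # text inside <answer>...</answer>, or a fallback


# Non-greedy matches; DOTALL lets the tagged sections span multiple lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_response(text: str) -> ParsedResponse:
    """Split a reply into reasoning and answer, tolerating missing tags."""
    think = _THINK_RE.search(text)
    answer = _ANSWER_RE.search(text)
    reasoning = think.group(1).strip() if think else None
    if answer:
        final = answer.group(1).strip()
    else:
        # Assumption: without an <answer> tag, treat everything outside
        # the <think> block as the answer.
        final = _THINK_RE.sub("", text).strip()
    return ParsedResponse(reasoning, final)
```

With the OpenAI SDK example above, this could be applied as `split_response(resp.choices[0].message.content)`. The Transformers example decodes the full sequence, so the decoded text there may also include the prompt.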