How to use hongzhouyu/FineMedLM with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="hongzhouyu/FineMedLM")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("hongzhouyu/FineMedLM")
model = AutoModelForCausalLM.from_pretrained("hongzhouyu/FineMedLM")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use hongzhouyu/FineMedLM with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "hongzhouyu/FineMedLM"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "hongzhouyu/FineMedLM",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, run the model with Docker:
docker model run hf.co/hongzhouyu/FineMedLM
How to use hongzhouyu/FineMedLM with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "hongzhouyu/FineMedLM" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "hongzhouyu/FineMedLM",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "hongzhouyu/FineMedLM" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "hongzhouyu/FineMedLM",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use hongzhouyu/FineMedLM with Docker Model Runner:
docker model run hf.co/hongzhouyu/FineMedLM
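The curl calls to the vLLM and SGLang servers above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response parsing assumes the usual OpenAI-compatible layout (`choices[0].message.content`), which is not shown in the snippets above.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text, system_prompt=None):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint.

    Hypothetical helper: mirrors the JSON body used in the curl examples.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant text.

    Assumes the server answers with the OpenAI-style response layout.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Port 8000 matches the vLLM example; use 30000 for the SGLang example.
    request = build_chat_request(
        "http://localhost:8000",
        "hongzhouyu/FineMedLM",
        "What is the capital of France?",
    )
    print(send_chat_request(request))
```

Building the request separately from sending it makes it possible to inspect the payload without a running server. The optional `system_prompt` argument can carry a system message when one is wanted.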
FineMedLM is a medical chat LLM trained with supervised fine-tuning (SFT) on carefully constructed synthetic data. Applying Direct Preference Optimization (DPO) on top of it further strengthens its deep reasoning ability and yields FineMedLM-o1.
For more information, visit our GitHub repository.
You can use FineMedLM in the same way as Llama-3.1-8B-Instruct:
(⚠️ Note: use the system prompt provided in the example below to get better inference results.)
from transformers import AutoModelForCausalLM, AutoTokenizer
main_model_name = "hongzhouyu/FineMedLM"
model = AutoModelForCausalLM.from_pretrained(main_model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(main_model_name)
prompt = (
"""The following are multiple choice questions (with answers) about health. Think step by step and then finish your answer with "the answer is (X)" where X is the correct letter choice.
Question:
Polio can be eradicated by which of the following?
Options:
A. Herbal remedies
B. Use of antibiotics
C. Regular intake of vitamins
D. Administration of tetanus vaccine
E. Attention to sewage control and hygiene
F. Natural immunity acquired through exposure
G. Use of antiviral drugs
Answer: Let's think step by step.
"""
)
messages = [
{"role": "system", "content": "You are a helpful professional doctor. The user will give you a medical question, and you should answer it in a professional way."},
{"role": "user", "content": prompt}
]
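# Render the chat messages into the model's prompt format (as a plain string).
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)
# Tokenize the rendered prompt and move the tensors to the model's device.
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
print("-----start generate-----")
# Pass input_ids together with attention_mask so generate() does not have to infer the mask.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    eos_token_id=tokenizer.eos_token_id
)
# The decoded string contains the prompt followed by the completion, special tokens included.
answer = tokenizer.decode(generated_ids[0], skip_special_tokens=False)
print(answer)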
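Because the prompt asks the model to finish with "the answer is (X)", the chosen option can be pulled out of the decoded text with a regular expression. The following is a minimal sketch, not part of the original card; `extract_answer_letter` and its `valid_letters` parameter are hypothetical names. Restricting the match to the actual option letters keeps the literal "(X)" in the instruction from being picked up, since the decoded string above still contains the prompt.

```python
import re


def extract_answer_letter(generated_text, valid_letters="ABCDEFG"):
    """Return the last option letter stated as 'the answer is (X)', or None.

    Hypothetical helper: only letters in `valid_letters` are accepted, so the
    placeholder '(X)' from the instruction text is ignored.
    """
    pattern = re.compile(
        r"[Tt]he answer is \(([" + re.escape(valid_letters) + r"])\)"
    )
    matches = pattern.findall(generated_text)
    # Take the last match so a final statement wins over earlier mentions.
    return matches[-1] if matches else None
```

With the example above, this would be called as `extract_answer_letter(answer)`. A return value of `None` means the model did not end with the requested phrase.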
@misc{yu2025finemedlmo1enhancingmedicalreasoning,
title={FineMedLM-o1: Enhancing the Medical Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training},
author={Hongzhou Yu and Tianhao Cheng and Ying Cheng and Rui Feng},
year={2025},
eprint={2501.09213},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.09213},
}