MinKeonKim/PRO-STEP-Preference-Data
How to use MinKeonKim/PRO-STEP-Policy-7B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="MinKeonKim/PRO-STEP-Policy-7B")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("MinKeonKim/PRO-STEP-Policy-7B")
model = AutoModelForCausalLM.from_pretrained("MinKeonKim/PRO-STEP-Policy-7B")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use MinKeonKim/PRO-STEP-Policy-7B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MinKeonKim/PRO-STEP-Policy-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "MinKeonKim/PRO-STEP-Policy-7B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or use Docker:
docker model run hf.co/MinKeonKim/PRO-STEP-Policy-7B
How to use MinKeonKim/PRO-STEP-Policy-7B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "MinKeonKim/PRO-STEP-Policy-7B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "MinKeonKim/PRO-STEP-Policy-7B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "MinKeonKim/PRO-STEP-Policy-7B" \
--host 0.0.0.0 \
--port 30000
How to use MinKeonKim/PRO-STEP-Policy-7B with Docker Model Runner:
docker model run hf.co/MinKeonKim/PRO-STEP-Policy-7B
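The vLLM and SGLang servers above expose the same OpenAI-compatible chat route, so the curl calls can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of either project, and the default port assumes the vLLM command shown above (pass `base_url="http://localhost:30000"` for SGLang).

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="MinKeonKim/PRO-STEP-Policy-7B"):
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` returns the generated reply as a string.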
This is the main policy model for PRO-STEP, a self-improving framework for agentic Retrieval-Augmented Generation (RAG). The policy is trained with step-level Direct Preference Optimization (DPO) on its own Monte Carlo Tree Search (MCTS) trajectories, which are scored by an open-source 8B process reward model (PRM).
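To make the training signal concrete, here is a rough sketch of how a step-level preference pair and its DPO loss could be computed. It assumes that sibling candidate steps at one tree node share the same prefix and are ranked by PRM score, and it uses the standard DPO objective on the step's summed token log-probabilities. The `StepCandidate` type, `make_step_pair`, and the `min_margin` threshold are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class StepCandidate:
    text: str         # one candidate step proposed at a tree node
    prm_score: float  # score assigned to that step by the process reward model


def make_step_pair(candidates, min_margin=0.1):
    """Return (chosen, rejected) sibling steps by PRM score, or None if too close.

    min_margin is an illustrative threshold, not a value from the paper.
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: c.prm_score, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best.prm_score - worst.prm_score < min_margin:
        return None
    return best, worst


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair of steps under the policy and a frozen reference."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), evaluated in a stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree, the loss is log 2; it falls as the policy assigns relatively more probability to the chosen step than the reference does.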
| Method | Train data | HotpotQA | PopQA | 2Wiki | Bamboogle | MuSiQue | AVG |
|---|---|---|---|---|---|---|---|
| Search-R1 | ~90,000 | 37.88 / 49.56 | 40.65 / 46.78 | 34.87 / 42.50 | 33.60 / 43.55 | 12.99 / 21.23 | 32.00 / 40.72 |
| ReasonRAG | ~5,000 | 36.37 / 47.51 | 37.78 / 44.87 | 39.80 / 46.32 | 38.40 / 46.86 | 10.59 / 19.22 | 32.59 / 40.96 |
| StepSearch | ~19,000 | 38.72 / 50.67 | 39.24 / 44.97 | 40.38 / 47.12 | 33.60 / 44.16 | 13.82 / 23.06 | 33.15 / 42.00 |
| PRO-STEP (ours) ★ | 5,000 | 38.73 / 51.63 | 40.47 / 47.37 | 44.07 / 51.43 | 36.80 / 47.63 | 12.49 / 22.41 | 34.51 / 44.09 |
Each cell reports EM / F1 (strict exact match, token-level F1). Bootstrap 95% confidence intervals for the average EM gain: +2.51 over Search-R1 [+1.01, +4.06] and +1.93 over ReasonRAG [+0.46, +3.36].
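For readers who want to reproduce numbers of this kind, the sketch below shows one common way to compute the two metrics and a bootstrap interval. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and articles) and a paired percentile bootstrap over questions. The evaluation code behind the table may differ in both respects, and all function names here are illustrative.

```python
import random
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def token_f1(pred, gold):
    """Harmonic mean of token precision and recall after normalization."""
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(scores_a) - mean(scores_b), resampling questions."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

An interval that excludes zero, as both intervals reported above do, indicates that the EM gain is unlikely to be an artifact of which questions were sampled.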
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the policy weights and tokenizer from the Hub
repo_id = "MinKeonKim/PRO-STEP-Policy-7B"
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# Use with FlashRAG SearchR1Pipeline or any agentic-RAG inference loop
# System prompt: see paper Appendix A
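
For orientation, a generic agentic-RAG inference loop can be sketched as below. This is not the FlashRAG API and not the prompt format from the paper: the `<search>`, `<information>`, and `<answer>` tags, the `agentic_rag` function, and the stub `generate` and `retrieve` callables are all placeholders chosen for illustration. In practice `generate` would wrap the model loaded above and `retrieve` would query a real retriever.

```python
import re

# Hypothetical tag format; the actual format is set by the system prompt.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def agentic_rag(question, generate, retrieve, max_turns=4):
    """Alternate between generation and retrieval until an answer appears.

    generate(context) -> str   continues the running context by one step
    retrieve(query) -> list    returns passages (strings) for a search query
    """
    context = f"Question: {question}\n"
    for _ in range(max_turns):
        step = generate(context)
        context += step
        answer = ANSWER_RE.search(step)
        if answer:
            return answer.group(1).strip(), context
        query = SEARCH_RE.search(step)
        if query is None:
            break  # the model neither searched nor answered
        passages = retrieve(query.group(1).strip())
        context += "\n<information>\n" + "\n".join(passages) + "\n</information>\n"
    return None, context


# Stubs standing in for the policy model and the retriever.
def fake_generate(context):
    if "<information>" in context:
        return "<answer>Paris</answer>"
    return "<search>capital of France</search>"


def fake_retrieve(query):
    return ["Paris is the capital of France."]


answer, trace = agentic_rag("What is the capital of France?", fake_generate, fake_retrieve)
```

Keeping `generate` and `retrieve` as plain callables lets the same loop run against a local model, a served endpoint, or test stubs.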
@article{prostep2026,
title={PRO-STEP: Step-level Process Reward Optimization for Retrieval-Augmented Generation},
author={...},
year={2026}
}