CE-RM
Paper: CE-RM: A Pointwise Generative Reward Model Optimized via Two-Stage Rollout and Unified Criteria
Introduction
Automatic evaluation is crucial yet challenging for open-ended natural language generation, especially when rule-based metrics are infeasible. Compared with traditional methods, the recent LLM-as-a-Judge paradigms enable better and more flexible evaluation, and show promise as generative reward models for reinforcement learning. However, prior work has revealed a notable gap between their seemingly impressive benchmark performance and their actual effectiveness in RL practice. We attribute this gap to several limitations of existing studies, including the dominance of pairwise evaluation and the inadequate optimization of evaluation criteria. We therefore propose CE-RM-4B, a pointwise generative reward model trained with a dedicated two-stage rollout method and unified query-based criteria. Using only about 5.7K high-quality examples curated from open-source preference data, CE-RM-4B achieves superior performance on diverse reward model benchmarks, especially in Best-of-N scenarios, and delivers more effective improvements in downstream RL practice.
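Because CE-RM-4B produces pointwise scores, Best-of-N selection reduces to scoring each candidate independently and keeping the highest-scoring one. A minimal sketch, where the `score` callable is a hypothetical stand-in for the full criteria-generation and evaluation pipeline shown under Usage:

```python
def best_of_n(query, candidates, score):
    """Return the candidate with the highest pointwise reward.

    `score(query, response) -> float` is a stand-in for the model's
    two-stage criteria + evaluation pipeline (hypothetical helper).
    """
    return max(candidates, key=lambda c: score(query, c))
```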
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PKU-ONELab/CE-RM-4B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

criteria_prompt = """Your task is to produce a minimal set of criteria for evaluating the quality of potential responses to the user query given below.
Begin by carefully analyzing the query to fully understand the user's intent and requirements, and then take into account all common and tangible factors that can indicate the response quality.
From these considerations, derive the final evaluation criteria list, which **must adhere to the following requirements:**
- Each criterion should consist of a concise term as well as its unambiguous description.
- The number of criteria is not necessarily the more the better; Fewer yet comprehensive is more desired.
- The criteria should be sufficient and complete, ensuring that no essential aspects or key signals of response quality are omitted.
- The criteria should be necessary and non-overlapping, with each one indispensable, distinct in perspective, and strictly orthogonal to others.
Provide the relevant analysis first, followed by the numbered list of criteria between [Start of Criteria] and [End of Criteria], with one criterion per line and the more important ones coming first.
Below is the user query:
[Start of Query]
{query}
[End of Query]
"""

evaluation_prompt = """Now that you have a response to the previous user query, your new task is to evaluate it using the criteria list you have produced.
For each criterion, focus on its concerns and carefully evaluate the corresponding specific quality of the response, providing the detailed analysis as well as relevant arguments, followed by the corresponding quality score from 0 to 5 within $\\boxed{}$.
Moreover, if the response demonstrates strengths or weaknesses beyond the scope of your criteria list, introduce an additional criterion titled \"Other Point(s),\" discussing them and considering them as bonus points or deductions as appropriate.
Finally, based on the analyses of these criteria, including their relative importance and scores, **conduct a comprehensive evaluation of the response's overall quality with sufficient and explicit evidence**, and then provide a corresponding overall quality score from 0 to 10 within $\\boxed{}$.
Use integers or half-point increments for all scores, with higher numbers representing higher quality.
Below is the response:
[Start of Response]
{response}
[End of Response]
"""

# Fill in the user query and the candidate response to be scored.
query = "..."     # placeholder: your user query
response = "..."  # placeholder: the response to evaluate

# Stage 1: generate query-specific evaluation criteria.
criteria_conversation = [
    {"role": "user", "content": criteria_prompt.replace("{query}", query)}
]
input_ids = tokenizer.apply_chat_template(
    criteria_conversation,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)
output = model.generate(
    input_ids=input_ids,
    max_new_tokens=4096,
    do_sample=False,  # greedy decoding (temperature=0 is not a valid sampling setting)
)
criteria = tokenizer.decode(output[0][len(input_ids[0]):], skip_special_tokens=True)
print(criteria)

# Stage 2: evaluate the response against the generated criteria.
evaluation_conversation = [
    {"role": "user", "content": criteria_prompt.replace("{query}", query)},
    {"role": "assistant", "content": criteria},
    {"role": "user", "content": evaluation_prompt.replace("{response}", response)},
]
input_ids = tokenizer.apply_chat_template(
    evaluation_conversation,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)
output = model.generate(
    input_ids=input_ids,
    max_new_tokens=8192,
    do_sample=False,
)
evaluation = tokenizer.decode(output[0][len(input_ids[0]):], skip_special_tokens=True)
print(evaluation)
```
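To use the evaluation as a scalar reward, the overall 0-10 score must be extracted from the generated text. A minimal sketch, assuming the model follows the prompt format and places the overall score in the final `$\boxed{}$` (`extract_overall_score` is a hypothetical helper, not part of the released code):

```python
import re

def extract_overall_score(evaluation):
    """Return the last \\boxed{...} score in the evaluation text, or None.

    Assumes the model follows the prompt and emits the overall 0-10
    score as the final boxed value; this format is requested by the
    prompt but not guaranteed, so callers should handle None.
    """
    matches = re.findall(r"\\boxed\{\s*([0-9]+(?:\.[0-9]+)?)\s*\}", evaluation)
    return float(matches[-1]) if matches else None
```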
Citation
@article{hu2026rm,
title={CE-RM: A Pointwise Generative Reward Model Optimized via Two-Stage Rollout and Unified Criteria},
author={Hu, Xinyu and He, Yancheng and Wang, Weixun and Feng, Tao and Lin, Li and Liu, Jiashun and Su, Wenbo and Zheng, Bo and Wan, Xiaojun},
journal={arXiv preprint arXiv:2601.20327},
year={2026}
}