Elvis-t9 committed on Apr 9

Commit

9050771

verified ·

1 Parent(s): 75eae10

update model card

add model intro, quick start, citation, etc.

Files changed (1)

README.md +221 -1

@@ -9,4 +9,224 @@ library_name: transformers
 tags:
 - rm
 - cr
----
+---
+# SWE-CARE-RM
+This model is a custom reward model built on top of **Qwen3-8B** with:
+- a merged **LoRA** adapter
+- an additional **projector head**
+- a scalar reward output in **[0, 1]**
+The model is designed to score the quality of a review conditioned on:
+1. an issue / problem statement
+2. a code patch
+3. a candidate review
+A higher score means the model considers the review better under the given issue and patch.
+## Model Architecture
+The model consists of:
+- base model: **Qwen3-8B**
+- adaptation: **LoRA**
+- reward head: a custom **MLP projector**
+- final score: `sigmoid(projector(last_hidden_state[:, -1]))`
+This repository contains the **merged decoder weights** together with `projector.pth`.
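To make the final-score formula concrete, here is a minimal pure-Python sketch of an MLP head followed by a sigmoid. The two-layer depth, the toy weights, and the names `linear`, `relu`, and `reward_from_last_hidden` are illustrative assumptions, not the contents of `projector.pth`; the real head runs on the decoder's last-token hidden state.

```python
import math


def linear(weights, bias, vec):
    """Dense layer: one output per weight row, computed as row . vec + bias."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def relu(vec):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vec]


def reward_from_last_hidden(last_hidden, w1, b1, w2, b2):
    """sigmoid(Linear(ReLU(Linear(h)))) for a single last-token hidden vector."""
    hidden = relu(linear(w1, b1, last_hidden))
    logit = linear(w2, b2, hidden)[0]  # the final layer has a single output unit
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 2-dimensional "hidden state" with hand-picked weights (hypothetical values).
score = reward_from_last_hidden(
    last_hidden=[1.0, -2.0],
    w1=[[1.0, 0.0], [0.0, 1.0]],
    b1=[0.0, 0.0],
    w2=[[2.0, 3.0]],
    b2=[-2.0],
)
print(score)
```

Because the sigmoid is applied last, the score always lies strictly between 0 and 1, whatever the head's weights are.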
+## Input Format
+The model expects three text fields:
+- `issue`
+- `patch`
+- `review`
+During inference, the input is formatted as:
+```text
+<issue>{issue}</issue><patch>{patch}</patch><review>{review}<review>
+```
+The score is computed from the last token hidden state.
+## Quick Start
+```python
+from pathlib import Path
+import json
+import torch
+import torch.nn as nn
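+from huggingface_hub import snapshot_download
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# open() and Path() below read files from disk, so fetch a local snapshot of the repo first.
+MODEL_DIR = snapshot_download("codefuse-ai/SWE-CARE-RM")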
+MAX_SEQ_LEN = 51200
+MIN_REVIEW_LEN = 4096
+TRUST_REMOTE_CODE = True
+with open(f"{MODEL_DIR}/data_sample.jsonl", "r") as fr:
+    for line in fr:
+        json_data = json.loads(line)
+        break
+SAMPLE = {
+    "issue": json_data['problem_statement'],
+    "patch": json_data['patch_to_review'],
+    "review": json_data['pos_review'][0]
+}
+class Projector(nn.Module):
+    def __init__(self, arch, input_size, hidden_size, use_bf16):
+        super().__init__()
+        depth = int(arch[len("mlp"): arch.index("x_relu")])
+        # First layer maps the decoder hidden state into the projector's hidden size.
+        first = nn.Linear(input_size, hidden_size)
+        layers = [first.bfloat16() if use_bf16 else first]
+        for _ in range(1, depth):
+            layers.append(nn.ReLU())
+            # Output layer produces a single reward logit.
+            out = nn.Linear(hidden_size, 1)
+            layers.append(out.bfloat16() if use_bf16 else out)
+        self.model = nn.Sequential(*layers)
+    def forward(self, x):
+        return self.model(x)
+def resolve_dtype(dtype_name):
+    if dtype_name in {"bf16", "bfloat16"}:
+        return torch.bfloat16
+    if dtype_name in {"fp16", "float16"}:
+        return torch.float16
+    return torch.float32
+def infer_proj_arch(projector_state_dict):
+    linear_weight_keys = [
+        k for k in projector_state_dict
+        if k.startswith("model.") and k.endswith(".weight")
+    ]
+    return f"mlp{len(linear_weight_keys)}x_relu"
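+def process_one(issue_ids, issue_masks, patch_ids, patch_masks,
+                review_ids, review_masks, max_len, min_review_len):
+    # Keep the full issue, the last `min_review_len` review tokens, and as much of
+    # the patch head as fits; the result is left-padded to `max_len`.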
+    review_keep = min(min_review_len, len(review_ids))
+    remain_for_patch = max(max_len - len(issue_ids) - review_keep, 0)
+    patch_keep = min(len(patch_ids), remain_for_patch)
+    ids_all = issue_ids + patch_ids[:patch_keep] + review_ids[-review_keep:]
+    masks_all = issue_masks + patch_masks[:patch_keep] + review_masks[-review_keep:]
+    if len(ids_all) < max_len:
+        pad_len = max_len - len(ids_all)
+        ids_all = [0] * pad_len + ids_all
+        masks_all = [0] * pad_len + masks_all
+    return ids_all[:max_len], masks_all[:max_len]
+reward_config = {}
+reward_config_path = Path(MODEL_DIR) / "reward_config.json"
+if reward_config_path.exists():
+    with open(reward_config_path, "r", encoding="utf-8") as f:
+        reward_config = json.load(f)
+projector_path = Path(MODEL_DIR) / "projector.pth"
+projector_state_dict = torch.load(projector_path, map_location="cpu")
+proj_arch = reward_config.get("proj_arch") or infer_proj_arch(projector_state_dict)
+torch_dtype = resolve_dtype(reward_config.get("torch_dtype") or "bfloat16")
+attn_implementation = reward_config.get("attn_implementation")
+tokenizer = AutoTokenizer.from_pretrained(
+    MODEL_DIR, trust_remote_code=TRUST_REMOTE_CODE, padding_side="left"
+)
+model_kwargs = {"trust_remote_code": TRUST_REMOTE_CODE, "torch_dtype": torch_dtype}
+if attn_implementation:
+    model_kwargs["attn_implementation"] = attn_implementation
+decoder = AutoModelForCausalLM.from_pretrained(MODEL_DIR, **model_kwargs)
+projector = Projector(
+    proj_arch,
+    decoder.config.hidden_size,
+    decoder.config.hidden_size,
+    torch_dtype == torch.bfloat16,
+)
+projector.load_state_dict(projector_state_dict)
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+decoder.to(device).eval()
+projector.to(device).eval()
+issue_inputs = tokenizer(
+    f"<issue>{SAMPLE['issue']}</issue>", padding=False, truncation="longest_first"
+)
+patch_inputs = tokenizer(
+    f"<patch>{SAMPLE['patch']}</patch>", padding=False, truncation="longest_first"
+)
+review_inputs = tokenizer(SAMPLE["review"], padding=False, truncation="longest_first")
+input_ids, attention_mask = process_one(
+    issue_inputs["input_ids"],
+    issue_inputs["attention_mask"],
+    patch_inputs["input_ids"],
+    patch_inputs["attention_mask"],
+    review_inputs["input_ids"],
+    review_inputs["attention_mask"],
+    max_len=MAX_SEQ_LEN,
+    min_review_len=MIN_REVIEW_LEN,
+)
+inputs = {
+    "input_ids": torch.tensor([input_ids], dtype=torch.long, device=device),
+    "attention_mask": torch.tensor([attention_mask], dtype=torch.long, device=device),
+}
+with torch.no_grad():
+    hidden_state = decoder(**inputs, output_hidden_states=True).hidden_states[-1]
+    reward = torch.sigmoid(projector(hidden_state).squeeze(-1)[:, -1]).item()
+print(reward)
+```
+## Output
+The model outputs a single scalar reward score in [0, 1].
+Typical interpretation:
+- higher score: better review quality
+- lower score: worse review quality
+This score is best used for:
+- ranking candidate reviews
+- pairwise comparison
+- reward modeling in downstream training or reranking
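A minimal sketch of the ranking and pairwise-comparison uses listed above, assuming each candidate review for one issue and patch has already been scored. The helper names `rank_reviews` and `prefer` and the scores are hypothetical placeholders, not model outputs.

```python
def rank_reviews(scored):
    """Sort (review, score) pairs from highest to lowest score."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def prefer(first, second):
    """Pairwise comparison: return the (review, score) pair with the higher score."""
    return first if first[1] >= second[1] else second


# Hypothetical scores for three candidate reviews of the same issue and patch.
candidates = [
    ("LGTM", 0.12),
    ("Off-by-one in the loop bound; please add a boundary test.", 0.87),
    ("Consider renaming the variable.", 0.45),
]

ranked = rank_reviews(candidates)
best_review, best_score = ranked[0]
winner = prefer(candidates[0], candidates[2])
print(best_review, best_score)
```

Only the ordering of the scores is used here, which fits the note under Limitations that the score is relative.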
+## Intended Use
+This model is intended for:
+- code review quality scoring
+- reward modeling for review generation
+- reranking multiple candidate reviews for the same issue and patch
+## Limitations
+- The score is relative, not an absolute guarantee of correctness.
+- Long-input truncation may affect results.
+- The model should not be used as the only signal for production-critical review
+  decisions.
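To illustrate the truncation caveat, the sketch below reproduces only the length arithmetic of `process_one` from the Quick Start, using its default `MAX_SEQ_LEN` and `MIN_REVIEW_LEN`. The function name `token_budget` and the token counts in the examples are hypothetical.

```python
def token_budget(issue_len, patch_len, review_len, max_len=51200, min_review_len=4096):
    """Return how many tokens of each segment survive, mirroring process_one."""
    # At most `min_review_len` tokens are taken from the end of the review.
    review_keep = min(min_review_len, review_len)
    # The patch gets whatever room is left after the issue and the review tail.
    patch_keep = min(patch_len, max(max_len - issue_len - review_keep, 0))
    # The final slice to `max_len` cuts from the end, so an oversized issue
    # eats into the review tail.
    issue_kept = min(issue_len, max_len)
    remaining = max_len - issue_kept
    patch_kept = min(patch_keep, remaining)
    remaining -= patch_kept
    review_kept = min(review_keep, remaining)
    return {"issue": issue_kept, "patch": patch_kept, "review": review_kept}


# A long patch is cut so the issue and the review tail still fit.
long_patch = token_budget(issue_len=1000, patch_len=60000, review_len=5000)
# A very long issue crowds out the patch entirely and part of the review.
long_issue = token_budget(issue_len=50000, patch_len=100, review_len=5000)
print(long_patch)
print(long_issue)
```

In the second case the patch is dropped completely, so the score is computed without any patch context.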
+## Citation
+If you use this model, please cite SWE-CARE as appropriate.
+```bibtex
+@misc{guo2025codefusecrbenchcomprehensivenessawarebenchmarkendtoend,
+      title={CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects},
+      author={Hanyang Guo and Xunjin Zheng and Zihan Liao and Hang Yu and Peng DI and Ziyin Zhang and Hong-Ning Dai},
+      year={2025},
+      eprint={2509.14856},
+      archivePrefix={arXiv},
+      primaryClass={cs.SE},
+      url={https://arxiv.org/abs/2509.14856},
+}
+```