# Model Deployment & Evaluation Guide
## 1. Deploy Model (vLLM OpenAI-Compatible Server)
Use vLLM to deploy a HuggingFace model as an OpenAI-compatible API server.
### Basic Command
```bash
python -m vllm.entrypoints.openai.api_server \
  --model <path_to_PRDJudge_model> \
  --served-model-name PRDJudge \
  --port 8004 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --tensor-parallel-size <number_of_GPUs>
```

`--port` can be changed to any free port; `--tensor-parallel-size` should match the number of GPUs available.
Once deployed, the service endpoint will be:

- Local access: `http://localhost:8004/v1`
- Cross-machine access: `http://<server_ip>:8004/v1`
You can verify the deployment with the following command:

```bash
curl http://localhost:8004/v1/models
```
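The same check can be scripted from Python. The sketch below (stdlib only) queries the `/v1/models` endpoint and lists the model IDs the server reports; the host and port are assumptions matching the deployment command above.

```python
import json
import urllib.request


def models_url(host: str = "localhost", port: int = 8004) -> str:
    """Build the OpenAI-compatible model-listing URL for a vLLM server."""
    return f"http://{host}:{port}/v1/models"


def list_served_models(host: str = "localhost", port: int = 8004) -> list[str]:
    """Return the model IDs the server reports (e.g. ['PRDJudge'])."""
    with urllib.request.urlopen(models_url(host, port), timeout=10) as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]


# Usage (requires the vLLM server to be running):
# print(list_served_models())
```

If `PRDJudge` does not appear in the output, re-check the `--served-model-name` flag in the deployment command.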
## 2. Configure the Deployed Model in Minimal Agent-Eval
Follow our ADK-based Agent config.
Edit `EvalAgent/code_eval_agent/config.py` and add your model configuration:
```python
"your_model_name": LiteLlmWithSleep(
    model="openai/PRDJudge",                # model name served by vLLM; must match --served-model-name
    api_base="http://<server_ip>:8004/v1",  # vLLM service URL; use localhost when deployed locally
    api_key="EMPTY",                        # vLLM requires no API key by default, so "EMPTY" works
    max_tokens_threshold=64000,
    enable_compression=True,
    temperature=0.1,
),
```
Note: the `model` field must include the `openai/` prefix; this is the LiteLLM routing format for OpenAI-compatible endpoints. The name after the prefix (`<your_vllm_model_name>`) should match the model name served by vLLM, which is set by `--served-model-name` (here `PRDJudge`). You can verify it via `curl http://<server_ip>:8004/v1/models`.
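Before running a full evaluation, it can help to send one request directly to the deployed endpoint. The sketch below is a hypothetical smoke test using only the standard library; `API_BASE` is an assumption that should mirror the `api_base` in your config.

```python
import json
import urllib.request

API_BASE = "http://localhost:8004/v1"  # assumption: matches api_base in config.py


def chat_payload(model: str = "PRDJudge", prompt: str = "Say hello.") -> bytes:
    """Build the JSON body for an OpenAI-compatible chat completion."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,
    }).encode("utf-8")


def smoke_test() -> str:
    """Send one chat request to the vLLM server and return the reply text."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=chat_payload(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",  # vLLM accepts any key by default
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires the vLLM server to be running):
# print(smoke_test())
```

A non-empty reply confirms the endpoint, model name, and port are all consistent with the config entry above.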
Model tree for AGI-Eval/PRDjudge: base model Qwen/Qwen3-Coder-30B-A3B-Instruct.