# Model Deployment & Evaluation Guide
## 1. Deploy Model (vLLM OpenAI-Compatible Server)
Use vLLM to deploy a HuggingFace model as an OpenAI-compatible API server.
### Basic Command
```bash
python -m vllm.entrypoints.openai.api_server \
  --model <path_to_PRDJudge_model> \
  --served-model-name PRDJudge \
  --port 8004 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --tensor-parallel-size <number_of_GPUs>
```

`--port` can be changed to any free port; `--tensor-parallel-size` should match the number of GPUs available.
Once deployed, the service endpoint will be:

- Local access: `http://localhost:8004/v1`
- Cross-machine access: `http://<server_ip>:8004/v1`
You can verify the deployment with the following command:

```bash
curl http://localhost:8004/v1/models
```
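The same check can be scripted from Python. The sketch below (stdlib only) queries the `/v1/models` endpoint and lists the model IDs the server reports; the host and port are assumptions matching the deployment command above.

```python
import json
import urllib.request


def models_url(host: str = "localhost", port: int = 8004) -> str:
    """Build the OpenAI-compatible model-listing URL for a vLLM server."""
    return f"http://{host}:{port}/v1/models"


def list_served_models(host: str = "localhost", port: int = 8004) -> list[str]:
    """Return the model IDs the server reports (e.g. ['PRDJudge'])."""
    with urllib.request.urlopen(models_url(host, port), timeout=10) as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]


# Usage (requires the vLLM server to be running):
# print(list_served_models())
```

If `PRDJudge` does not appear in the output, re-check the `--served-model-name` flag in the deployment command.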
## 2. Configure the Deployed Model in Minimal Agent-Eval
Follow our ADK-based Agent config.
Edit `EvalAgent/code_eval_agent/config.py` and add your model configuration:
```python
"your_model_name": LiteLlmWithSleep(
    model="openai/PRDJudge",                # model name served by vLLM; must match --served-model-name
    api_base="http://<server_ip>:8004/v1",  # vLLM service URL; use localhost when deployed locally
    api_key="EMPTY",                        # vLLM requires no API key by default, so "EMPTY" works
    max_tokens_threshold=64000,
    enable_compression=True,
    temperature=0.1,
),
```
Note: the `model` field must include the `openai/` prefix; this is the LiteLLM routing format for OpenAI-compatible endpoints. The name after the prefix (`<your_vllm_model_name>`) should match the model name served by vLLM, which is set by `--served-model-name` (here `PRDJudge`). You can verify it via `curl http://<server_ip>:8004/v1/models`.
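Before running a full evaluation, it can help to send one request directly to the deployed endpoint. The sketch below is a hypothetical smoke test using only the standard library; `API_BASE` is an assumption that should mirror the `api_base` in your config.

```python
import json
import urllib.request

API_BASE = "http://localhost:8004/v1"  # assumption: matches api_base in config.py


def chat_payload(model: str = "PRDJudge", prompt: str = "Say hello.") -> bytes:
    """Build the JSON body for an OpenAI-compatible chat completion."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,
    }).encode("utf-8")


def smoke_test() -> str:
    """Send one chat request to the vLLM server and return the reply text."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=chat_payload(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",  # vLLM accepts any key by default
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires the vLLM server to be running):
# print(smoke_test())
```

A non-empty reply confirms the endpoint, model name, and port are all consistent with the config entry above.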
Model tree for AGI-Eval/PRDjudge: base model Qwen/Qwen3-Coder-30B-A3B-Instruct.