Upload serve.py with huggingface_hub
serve.py
CHANGED
@@ -20,7 +20,7 @@ Environment variables:
     NO_PREFIX_CACHING — set to 1 to disable prefix caching
     VLLM_ENFORCE_EAGER — set to 1 to disable CUDA graphs (default 0)
     REASONING_PARSER — set to "qwen3" to enable <think>/</think> parsing
-        (splits
+        (splits `reasoning` from `content` in API responses)
 
 Example:
     VLLM_MODEL=./model_dir python serve.py
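As a rough illustration of the docstring above, here is a minimal sketch of how serve.py might consume these environment variables and split `<think>` output. The variable names (`VLLM_MODEL`, `NO_PREFIX_CACHING`, `VLLM_ENFORCE_EAGER`, `REASONING_PARSER`) come from the diff; the helper functions `load_config` and `split_reasoning` are hypothetical and not taken from the actual script.

```python
import os
import re

def load_config(env=None):
    # Read settings from the environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    return {
        # Model path or repo id, as in the docstring example.
        "model": env.get("VLLM_MODEL", "./model_dir"),
        # NO_PREFIX_CACHING=1 disables prefix caching.
        "enable_prefix_caching": env.get("NO_PREFIX_CACHING") != "1",
        # VLLM_ENFORCE_EAGER=1 disables CUDA graphs (default 0).
        "enforce_eager": env.get("VLLM_ENFORCE_EAGER", "0") == "1",
        # REASONING_PARSER="qwen3" enables <think>/</think> parsing.
        "reasoning_parser": env.get("REASONING_PARSER"),
    }

def split_reasoning(text):
    """Split a leading <think>...</think> block from the visible answer,
    mirroring the `reasoning` / `content` split the diff describes."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if m:
        return {"reasoning": m.group(1).strip(), "content": m.group(2).strip()}
    return {"reasoning": None, "content": text}

cfg = load_config({"VLLM_MODEL": "./model_dir", "REASONING_PARSER": "qwen3"})
resp = split_reasoning("<think>check the units first</think>The answer is 42.")
print(cfg, resp)
```

With `REASONING_PARSER` unset, `split_reasoning` would simply not be applied and the full text would land in `content`; the sketch above only shows the shape of the split, not the real parser.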