---
title: llama32-3b-instruct VLLM
emoji: 🐢
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Regenerate `requirements.txt` from the Poetry lockfile with:

```shell
poetry export -f requirements.txt --output requirements.txt --without-hashes
```
* The `HUGGING_FACE_HUB_TOKEN` and `HF_TOKEN` environment variables must exist at runtime (use the same value for both; the token must have read permission for the model).
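For local testing, a minimal sketch of setting both variables is shown below (on a Space they are set as secrets in the Space settings instead); the token value is a placeholder:

```shell
# Placeholder token: substitute a real read-scoped token.
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"  # same value for both variables
```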
## VLLM OpenAI Compatible API Server

> Reference: https://huggingface.co/spaces/sofianhw/ai/tree/c6527a750644a849b6705bb6fe2fcea4e54a8196
This `api_server.py` file is copied verbatim from https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/api_server.py and then modified as follows (use a diff tool to see the exact changes to the file):
* [x] Change every route in `api_server.py` that starts with `/v1/xxx` to `/api/v1/xxx` (see the sketch after the next paragraph).
Then just run `python api_server.py` with the desired arguments (see https://discuss.huggingface.co/t/run-vllm-docker-on-space/70228/5?u=yusufs).
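A minimal launch sketch follows. The model name, port, and context length are illustrative assumptions (the Space title suggests Llama 3.2 3B Instruct, and 7860 is the default Spaces port), not values taken from this repo:

```shell
# All values below are illustrative assumptions.
python api_server.py \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --host 0.0.0.0 \
  --port 7860 \
  --max-model-len 4096

# Exercise one of the rewritten routes (note the /api prefix):
curl http://localhost:7860/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```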
## Documentation about config

* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/utils.py#L1207-L1221
The linked snippet shows how the CLI args look before the `--config` file is expanded into individual flags:

```python
# From the example in FlexibleArgumentParser: a --config entry is
# pulled from the args list and expanded into individual flags.
args = [
    "serve,chat,complete",
    "facebook/opt-12B",
    '--config', 'config.yaml',
    '-tp', '2',
]
```
The YAML config file is equivalent to the flag arguments (a sketch of the equivalence follows the links below). Consider passing the flags defined here instead, for better self-documentation:
https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/cli_args.py#L77-L237

Other arguments are the same as those of the `LLM` class, such as `--max-model-len`, `--dtype`, or `--otlp-traces-endpoint`.
* https://github.com/vllm-project/vllm/blob/v0.6.4/vllm/config.py#L1061-L1086
* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/engine/arg_utils.py#L221-L913
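A minimal sketch of the flags-vs-YAML equivalence, reusing the illustrative model and settings from above; the YAML keys are assumed to mirror the CLI flag names (vLLM's config-file convention):

```shell
# Explicit flags (self-documenting); values are illustrative assumptions:
python api_server.py \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --max-model-len 4096 \
  --dtype bfloat16

# The same settings via a config file; keys mirror the flag names:
cat > config.yaml <<'EOF'
model: meta-llama/Llama-3.2-3B-Instruct
max-model-len: 4096
dtype: bfloat16
EOF
python api_server.py --config config.yaml
```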