Spaces: binary1ne/vllm-llama2

binary1ne committed on Aug 14, 2025

Commit 9dcedc3 (verified) · 1 Parent(s): 3ac3871

Update Dockerfile

Files changed (1)

Dockerfile CHANGED

@@ -12,10 +12,10 @@ WORKDIR /app
 # You might need to adjust this depending on how you're providing the model
 COPY ./model /app/model
-# Set the environment variable for Hugging Face token if you're using gated models
-# Replace <YOUR_HUGGINGFACE_TOKEN> with your actual token
-ENV HUGGING_FACE_HUB_TOKEN="<YOUR_HUGGINGFACE_TOKEN>"
+# Set the environment variable for Hugging Face token (not strictly needed as it's not a gated model, but good practice)
+# You can uncomment this and set it if you prefer.
+# ENV HUGGING_FACE_HUB_TOKEN="<YOUR_HUGGINGFACE_TOKEN>"
 # Command to run the vLLM OpenAI-compatible server with your model
 # Replace "your-model-name" with the actual model ID from Hugging Face
-CMD ["python", "-m", "vllm.entrypoints.openai.api_server", "--model", "your-model-name", "--host", "0.0.0.0", "--port", "8000"]
+CMD ["python", "-m", "vllm.entrypoints.openai.api_server", "--model", "unsloth/Llama-3.2-3B-bnb-4bit", "--host", "0.0.0.0", "--port", "7860"]
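
Note on the new CMD: the server now listens on port 7860 (the default app port for Docker Spaces) and serves unsloth/Llama-3.2-3B-bnb-4bit. A minimal client sketch for checking the endpoint might look like the following. The base URL and the helper names build_completion_request and complete are assumptions for illustration and are not part of this commit; /v1/completions is the standard OpenAI-compatible completions route that vLLM exposes.

```python
import json
import urllib.request

# Assumptions (not in the commit): the container is reachable at BASE_URL,
# and the served model name equals the ID passed to --model in the CMD line.
BASE_URL = "http://localhost:7860"
MODEL_ID = "unsloth/Llama-3.2-3B-bnb-4bit"


def build_completion_request(prompt, max_tokens=32, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /v1/completions route."""
    payload = {"model": MODEL_ID, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the container running, complete("The capital of France is") should return the model's continuation as a string. The request builder is kept separate so the URL and payload can be inspected without a live server.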