Spaces:

waddie
/

cloudmini-api

Sleeping

waddie commited on 4 days ago

Commit

ca6c646

verified ·

1 Parent(s): bbe7501

Create Dockerfile

Files changed (1) hide show

Dockerfile ADDED Viewed

+FROM python:3.10-slim
+# Install system dependencies needed to compile llama.cpp
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    python3-dev \
+    wget \
+    && rm -rf /var/lib/apt/lists/*
+WORKDIR /app
+# Optimize build configurations specifically for standard CPU execution
+ENV LLAMA_GGML_BACKEND=cpu
+RUN pip install --no-cache-dir "llama-cpp-python[server]"
+# Download the optimal Q4_K_M variant directly from the waddie repo
+RUN wget -O model.gguf "https://huggingface.co/waddie/mini-2.0-GGUF/resolve/main/mini-2.0-Q4_K_M.gguf"
+# Expose the default port for Hugging Face Spaces
+EXPOSE 7860
+# Run the API server with 2 context threads to play nice with the shared CPU limits
+CMD ["python3", "-m", "llama_cpp.server", \
+     "--model", "model.gguf", \
+     "--host", "0.0.0.0", \
+     "--port", "7860", \
+     "--n_threads", "2"]