williyam committed on
Commit feac028 · 1 Parent(s): 0ffbde7

fix(docker): remove hardcoded LLM vars, pre-warm embeddings, add .dockerignore

- Remove hardcoded API_BASE_URL and MODEL_NAME (use .env or HF Space secrets)
- Add embedding model pre-warm step to avoid cold-start timeouts
- Replace 'COPY . .' with explicit COPY of remaining files to avoid duplication
- Increase HEALTHCHECK start-period from 60s to 120s
- Create .dockerignore to exclude .venv, .git, data, etc.
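
With API_BASE_URL and MODEL_NAME no longer baked into the image, the application has to pick them up at runtime. The following is only a minimal sketch of how that lookup might be done; `LLMSettings` and `load_llm_settings` are hypothetical names, not code from this repository, and the fail-fast behaviour is an assumption about what is desirable here. Because `.env` is excluded by the new .dockerignore, the values would need to arrive as real environment variables (for example via HF Space secrets or `docker run --env-file .env`).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the two values removed from the Dockerfile."""
    api_base_url: str
    model_name: str


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Read API_BASE_URL and MODEL_NAME from the environment.

    Raises RuntimeError listing every missing variable, so a misconfigured
    container fails at startup instead of on the first LLM call.
    """
    env = os.environ if env is None else env
    required = ("API_BASE_URL", "MODEL_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return LLMSettings(
        api_base_url=env["API_BASE_URL"],
        model_name=env["MODEL_NAME"],
    )
```

Passing a plain dict as `env` makes the function easy to exercise in tests without touching the real process environment.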

Files changed (2)
  1. .dockerignore +18 -0
  2. Dockerfile +5 -4
.dockerignore ADDED
@@ -0,0 +1,18 @@
+ .venv/
+ __pycache__/
+ *.pyc
+ .git/
+ .github/
+ .env
+ .env.*
+ *.egg-info/
+ dist/
+ build/
+ data/
+ *.pkl
+ *.faiss
+ .mypy_cache/
+ .pytest_cache/
+ .ruff_cache/
+ node_modules/
+ baseline_results.json
Dockerfile CHANGED
@@ -16,20 +16,21 @@ COPY domains/ domains/
  RUN pip install --no-cache-dir --upgrade pip && \
      pip install --no-cache-dir .

- COPY . .
+ # Pre-warm embedding model so first /reset doesn't time out
+ RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
+
+ COPY inference.py main.py openenv.yaml ./

  RUN mkdir -p /app/data/faiss_indices

  ENV SERVER_HOST=0.0.0.0
  ENV SERVER_PORT=7860
- ENV API_BASE_URL=https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.2-3B-Instruct/v1
- ENV MODEL_NAME=meta-llama/Llama-3.2-3B-Instruct
  ENV PYTHONUNBUFFERED=1
  ENV PYTHONDONTWRITEBYTECODE=1

  EXPOSE 7860

- HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
      CMD curl -f http://localhost:7860/health || exit 1

  CMD ["python", "main.py"]
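
The inline `python -c` pre-warm step in the Dockerfile could also be written as a small script, which is easier to read and to test. This is a sketch under assumptions, not the repository's code: the `prewarm` function and its `loader` parameter are hypothetical, and only the model ID and the `SentenceTransformer(model_id)` call come from the diff above.

```python
import time
from typing import Callable, Optional

# Model ID taken from the Dockerfile's pre-warm step.
MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"


def prewarm(
    model_id: str = MODEL_ID,
    loader: Optional[Callable[[str], object]] = None,
) -> float:
    """Instantiate the embedding model once and return the seconds it took.

    Constructing the model at build time downloads its weights into the
    local cache, so the running container does not fetch them on the
    first request. `loader` is a hypothetical hook that lets tests pass
    a stub instead of the real model class.
    """
    if loader is None:
        # Imported lazily so this module can be imported without the dependency.
        from sentence_transformers import SentenceTransformer

        loader = SentenceTransformer
    start = time.perf_counter()
    loader(model_id)
    return time.perf_counter() - start
```

Assuming the script were saved as a hypothetical `prewarm.py` and copied into the image before the step, the build could call `python -c "import prewarm; prewarm.prewarm()"` in place of the one-liner. Any exception from the loader propagates, so a failed download would fail the build rather than surface later as a cold-start timeout.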