File size: 4,806 Bytes
65fe514
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
# Dockerfile for Clawdbot Dev Assistant on HuggingFace Spaces
#
# CHANGELOG [2025-01-30 - Josh]
# REBUILD: Updated to Gradio 5.0+ for type="messages" support
# Added translation layer for Kimi K2.5 tool calling
# Added multimodal file upload support
#
# CHANGELOG [2025-01-31 - Claude]
# FIXED: Permissions for HF Spaces runtime user (UID 1000).
# PROBLEM: HF Spaces run containers as user 1000, not root. Directories
#   created during build (as root) weren't writable at runtime, causing
#   ChromaDB to silently fail when trying to create SQLite files.
# FIX: chown all writable directories to 1000:1000, then switch to USER 1000.
#
# CHANGELOG [2025-01-31 - Claude + Gemini]
# FIXED: /.cache PermissionError for ChromaDB ONNX embedding model download.
# PROBLEM: ChromaDB's ONNXMiniLM_L6_V2 ignores XDG_CACHE_HOME and tries to
#   write to ~/.cache. In containers, HOME=/ so it writes to /.cache (root-owned).
# FIX: Set HOME=/tmp so fallback cache paths resolve to /tmp/.cache (writable).
#   Also create /tmp/.cache subdirs during build and chown to UID 1000.
#   The actual fix is in recursive_context.py (DOWNLOAD_PATH override), but
#   HOME=/tmp catches any other library that might try the same trick.
#
# FEATURES:
# - Python 3.11 for Gradio 6.5+
# - ChromaDB with ONNX MiniLM for vector search
# - Git for repo cloning
# - Correct permissions for HF Spaces (UID 1000)
# - Cache dirs pre-created and owned by runtime user

FROM python:3.11-slim

# CACHE BUSTER: Update this date to invalidate Docker cache for everything below
ENV REBUILD_DATE=2025-01-31-v3

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for Docker layer caching)
COPY requirements.txt .

# Install Python dependencies
# FORCE CLEAN: Uninstall cached Gradio versions to avoid version conflicts
RUN pip uninstall -y gradio gradio-client 2>/dev/null; \
    pip install --no-cache-dir -r requirements.txt

# Create all directories the app needs to write to at runtime
# /workspace/e-t-systems - repo clone target
# /workspace/chroma_db - ChromaDB fallback if /data isn't available
# /data/chroma_db - ChromaDB primary (persistent if storage enabled)
# /tmp/.cache/* - embedding model downloads and HF cache
RUN mkdir -p /workspace/e-t-systems \
             /workspace/chroma_db \
             /data/chroma_db \
             /tmp/.cache/huggingface \
             /tmp/.cache/chroma

# =============================================================================
# ENVIRONMENT VARIABLES
# =============================================================================
# CHANGELOG [2025-01-31 - Claude + Gemini]
# HOME=/tmp is the critical one. Many Python libraries (including ChromaDB's
# ONNX embedding function) use ~ as fallback for cache paths. In Docker
# containers, HOME defaults to / if not explicitly set, so ~/.cache becomes
# /.cache which is root-owned. Setting HOME=/tmp ensures any library we
# didn't explicitly configure still has a writable fallback path.
#
# The HF_HOME and XDG_CACHE_HOME vars are belt-and-suspenders for HuggingFace
# Hub downloads (model weights, tokenizers, etc).
#
# NOTE: The actual ChromaDB embedding model path is overridden in Python code
# (recursive_context.py) via DOWNLOAD_PATH attribute. These env vars catch
# everything else.
# =============================================================================
ENV HF_HOME=/tmp/.cache/huggingface
ENV XDG_CACHE_HOME=/tmp/.cache
ENV CHROMA_CACHE_DIR=/tmp/.cache/chroma
ENV HOME=/tmp
ENV PYTHONUNBUFFERED=1
ENV REPO_PATH=/workspace/e-t-systems

# Copy application files
COPY recursive_context.py .
COPY app.py .
COPY entrypoint.sh .

# =============================================================================
# PERMISSIONS FOR HF SPACES (UID 1000)
# =============================================================================
# CHANGELOG [2025-01-31 - Claude + Gemini]
# HF Spaces run as UID 1000, not root. All directories the app writes to
# must be owned by 1000:1000, otherwise operations fail silently.
# This includes:
#   /app - application directory (runtime-generated files)
#   /workspace - repo clones and ephemeral ChromaDB fallback
#   /tmp/.cache - embedding model downloads, HF cache
# =============================================================================
RUN chmod +x entrypoint.sh && \
    chown -R 1000:1000 /app /workspace /tmp/.cache

# Expose Gradio port (HF Spaces standard)
EXPOSE 7860

# Switch to non-root user
# CHANGELOG [2025-01-31 - Claude]
# HF Spaces expect UID 1000 at runtime. Setting this explicitly ensures
# consistent behavior between local testing and deployed Spaces.
USER 1000

# Launch via entrypoint script
CMD ["./entrypoint.sh"]