Spaces: nexusbert/DSN

nexusbert committed 9 days ago

Commit 3cdb77e (1 parent: 10bc91f)

Update Dockerfile and README.md to set default for DOCKER_BUILD_SKIP_LLM_WARM

- Changed the default value of DOCKER_BUILD_SKIP_LLM_WARM to 1 in the Dockerfile to prevent out-of-memory issues during build.
- Updated README.md to clarify the behavior of the DOCKER_BUILD_SKIP_LLM_WARM variable and its impact on model warming during the build process.

Files changed (3)

Dockerfile +1 -1
README.md +1 -1
scripts/docker_build_assets.py +10 -5

Dockerfile CHANGED

@@ -35,7 +35,7 @@ ENV OMP_NUM_THREADS=2 \
 ARG HF_TOKEN=
 ARG HUGGING_FACE_HUB_TOKEN=
-ARG DOCKER_BUILD_SKIP_LLM_WARM=
+ARG DOCKER_BUILD_SKIP_LLM_WARM=1
 ENV HF_TOKEN=${HF_TOKEN}
 ENV HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}
 ENV DOCKER_BUILD_SKIP_LLM_WARM=${DOCKER_BUILD_SKIP_LLM_WARM}
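A minimal Python model of how this build-arg reaches the asset script, assuming standard `docker build` ARG semantics: the declared default (`1` after this commit) is used only when `--build-arg DOCKER_BUILD_SKIP_LLM_WARM=...` is omitted, and the `ENV` line then exposes the resolved value to later `RUN` steps in the same stage (assumed here to include the step that runs `scripts/docker_build_assets.py`). The function names are illustrative and are not part of the repository.

```python
from typing import Dict, Optional

# Default declared in the Dockerfile after this commit: ARG DOCKER_BUILD_SKIP_LLM_WARM=1
DOCKERFILE_DEFAULT = "1"


def resolve_build_arg(cli_value: Optional[str], default: str = DOCKERFILE_DEFAULT) -> str:
    """Model of how `ARG NAME=default` resolves during `docker build`.

    cli_value is the value passed via `--build-arg NAME=value`, or None when the
    flag is omitted. An explicitly passed empty string still overrides the default.
    """
    return default if cli_value is None else cli_value


def build_env(cli_value: Optional[str]) -> Dict[str, str]:
    """`ENV DOCKER_BUILD_SKIP_LLM_WARM=${DOCKER_BUILD_SKIP_LLM_WARM}` copies the
    resolved ARG into the environment that later RUN steps (and Python's os.environ) see."""
    return {"DOCKER_BUILD_SKIP_LLM_WARM": resolve_build_arg(cli_value)}


# Omitted flag, explicit opt-in to the full warm, and an explicit empty value.
for passed in (None, "0", ""):
    print(repr(passed), "->", build_env(passed))
```

The third case is the subtle one: passing an empty value replaces the default `1` with an empty string instead of falling back to it, so the outcome depends on how `warm_runtime_models()` parses that string.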

README.md CHANGED

@@ -103,7 +103,7 @@ docker compose up --build -d
 Default compose maps **`7860:7860`**. The image bakes **`/code/data/business_catalog_embedded.jsonl`** and **`/code/data/task_a_reviews_embedded.jsonl`** at build time (or stubs if Yelp JSON is missing). Override with a bind mount, e.g. `./data:/code/data`, if you rebuild those files locally.
-The Docker image sets **`HF_HUB_OFFLINE=1`** and **`TRANSFORMERS_OFFLINE=1`** so the running container does not call the Hugging Face Hub (models must be fully cached during `docker build`). `scripts/docker_build_assets.py` runs **`warm_runtime_models()`** after data JSONL: one SentenceTransformer forward and one causal LM forward on CPU (set build-arg **`DOCKER_BUILD_SKIP_LLM_WARM=1`** if the builder OOMs).
+The Docker image sets **`HF_HUB_OFFLINE=1`** and **`TRANSFORMERS_OFFLINE=1`** so the running container does not call the Hugging Face Hub (models must be fully present in `/models/huggingface` from **`snapshot_download` during build**). By default **`DOCKER_BUILD_SKIP_LLM_WARM=1`**: the image build does **not** load the causal LM into RAM (avoids **OOM / exit 137** on small HF builders). Weights are still downloaded to disk; **startup prewarm** loads them when the Space starts. Set build-arg **`DOCKER_BUILD_SKIP_LLM_WARM=0`** only on a machine with several GB of spare RAM if you want a full CPU warm during build.
 On startup, **`STARTUP_PREWARM`** (default **`user_modeling`**) loads that task’s embedder + optional RAG index + LLM before serving traffic (`all` = Task A and Task B, uses ~2× LLM RAM). Disable with **`SKIP_STARTUP_PREWARM=1`**.
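The startup prewarm selection described in the README can be sketched as a small resolver. This is an assumption-heavy illustration, not the app's code: only `STARTUP_PREWARM`, its default `user_modeling`, the value `all` and `SKIP_STARTUP_PREWARM=1` come from the text. The placeholder key `task_b`, the `KNOWN_TASKS` tuple, the function name and the exact check on `SKIP_STARTUP_PREWARM` (only the literal `1`) are hypothetical.

```python
import os
from typing import Mapping, Tuple

# Hypothetical task keys. "user_modeling" is the README's default; "task_b" is a
# placeholder for the other task's key, which the README does not name.
KNOWN_TASKS: Tuple[str, ...] = ("user_modeling", "task_b")


def tasks_to_prewarm(env: Mapping[str, str]) -> Tuple[str, ...]:
    """Return the tasks whose embedder, optional RAG index and LLM load before serving."""
    # Assumed check: SKIP_STARTUP_PREWARM=1 turns startup prewarm off entirely.
    if env.get("SKIP_STARTUP_PREWARM", "").strip() == "1":
        return ()
    choice = env.get("STARTUP_PREWARM", "user_modeling").strip()
    if choice == "all":
        # Task A and Task B together: roughly twice the LLM RAM of a single task.
        return KNOWN_TASKS
    return (choice,)


print(tasks_to_prewarm(os.environ))
```

Returning both tasks for `all` is consistent with the README's note that this mode needs about twice the LLM RAM, presumably because each task loads its own LLM.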

scripts/docker_build_assets.py CHANGED

@@ -77,7 +77,16 @@ def prefetch_hub_files_only() -> None:
 def warm_runtime_models() -> None:
-    print("docker_build_assets: warming models for runtime (CPU, one forward each)...")
+    raw = os.environ.get("DOCKER_BUILD_SKIP_LLM_WARM", "1").strip().lower()
+    skip = raw not in ("0", "false", "no")
+    if skip:
+        print(
+            "docker_build_assets: skipping in-RAM LLM warm (DOCKER_BUILD_SKIP_LLM_WARM default 1). "
+            "Weights are on disk from snapshot_download + stub encodes; uvicorn prewarm loads them at runtime."
+        )
+        return
+    print("docker_build_assets: full model warm (CPU) — DOCKER_BUILD_SKIP_LLM_WARM=0; needs several GB RAM.")
     import gc
     emb_key = os.environ.get("TASK_B_LOCAL_EMBEDDING_MODEL", "all-MiniLM-L6-v2")
@@ -88,10 +97,6 @@ def warm_runtime_models() -> None:
     del st
     gc.collect()
-    if os.environ.get("DOCKER_BUILD_SKIP_LLM_WARM", "").strip().lower() in ("1", "true", "yes"):
-        print("docker_build_assets: DOCKER_BUILD_SKIP_LLM_WARM set — skipping causal LM warm.")
-        return
     import torch  # type: ignore[import-untyped]
     from transformers import AutoModelForCausalLM, AutoTokenizer  # type: ignore[import-untyped]
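The behavioral change in `warm_runtime_models()` is easiest to see by pulling the two checks into pure functions. The function names below are hypothetical; the bodies copy the pre- and post-commit conditions from the diff, with `None` standing for an unset variable (the only case in which `os.environ.get` returns its default).

```python
from typing import Optional


def skip_llm_warm_old(value: Optional[str]) -> bool:
    """Pre-commit rule: skip only on an explicit truthy value; unset meant 'warm'."""
    raw = (value if value is not None else "").strip().lower()
    return raw in ("1", "true", "yes")


def skip_llm_warm_new(value: Optional[str]) -> bool:
    """Post-commit rule: skip unless explicitly disabled; unset now means 'skip'."""
    raw = (value if value is not None else "1").strip().lower()
    return raw not in ("0", "false", "no")


# Compare both rules on a few representative environment values.
for value in (None, "", "1", "0", "False", " no ", "off"):
    print(f"{value!r:>8}  old_skip={skip_llm_warm_old(value)!s:<5}  new_skip={skip_llm_warm_new(value)}")
```

Under the new rule only `0`, `false` or `no` (case-insensitive, surrounding whitespace ignored) enable the full in-RAM warm; any other value, including an empty string or `off`, skips it. The old rule was the mirror image: only `1`, `true` or `yes` skipped.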