---
title: DSN
emoji: 🏢
colorFrom: indigo
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: DSN HACKATHON
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

This Space is configured as sdk: docker. The image builds from Dockerfile and uses CPU-only PyTorch so that CUDA wheels do not OOM the builder. During docker build, models are fetched with **snapshot_download** into /models/huggingface without loading the full LLM into RAM. SentenceTransformer embeds a stub or Yelp-derived catalog plus data/task_a_reviews_embedded.jsonl (the review RAG index for Task A). See scripts/docker_build_assets.py.

Task A: persona + product → rating/review using a local causal LM (default Qwen2.5-1.5B-Instruct) and Yelp review snippets retrieved from the baked JSONL (semantic similarity plus an optional match on a user_id: line in the persona). Task B: local sentence-transformer retrieval over businesses, followed by local causal LM reranking.
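The Task A retrieval idea (semantic similarity plus an optional user_id: match) can be illustrated with a minimal, dependency-free sketch. It is not the code in app/task_a_rag.py: the function names, the row fields (user_id, text, embedding), and the additive user_boost value are all assumptions made for illustration.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_user_id(persona):
    """Return the value of an optional 'user_id: ...' line in the persona, else None."""
    match = re.search(r"^\s*user_id:\s*(\S+)", persona, flags=re.MULTILINE)
    return match.group(1) if match else None


def retrieve_snippets(query_vec, persona, rows, top_k=3, user_boost=0.2):
    """Rank review rows by cosine similarity, boosting rows written by the same user.

    Hypothetical row shape: {"user_id": str, "text": str, "embedding": list[float]}.
    The additive boost is an assumption, not the project's actual scoring rule.
    """
    user_id = extract_user_id(persona)
    scored = []
    for row in rows:
        score = cosine(query_vec, row["embedding"])
        if user_id is not None and row.get("user_id") == user_id:
            score += user_boost
        scored.append((score, row["text"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

In this sketch a persona without a user_id: line falls back to pure semantic ranking, which mirrors the "optional" wording above.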

Secrets: Optional HF_TOKEN / HUGGING_FACE_HUB_TOKEN for Hugging Face Hub during docker build (runtime secrets alone often do not reach docker build). On Spaces: add the token under Space secrets and enable build-time / Docker build args if your UI offers it; locally use docker build --build-arg HF_TOKEN=.... Never commit tokens.
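A build-time script could pick up either variable along the following lines. This is only a sketch: the helper name resolve_hub_token and the precedence (HF_TOKEN first) are assumptions, not behaviour confirmed by scripts/docker_build_assets.py.

```python
import os


def resolve_hub_token(env=None):
    """Return the first non-empty Hub token found in the environment, or None.

    Assumption: HF_TOKEN takes precedence over HUGGING_FACE_HUB_TOKEN.
    Returning None means the build would proceed anonymously.
    """
    env = os.environ if env is None else env
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = (env.get(name) or "").strip()
        if value:
            return value
    return None
```

Passing a plain dict instead of os.environ keeps the helper easy to test without touching real secrets.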


DSN × BCT LLM Agent Challenge — API package

Deadline: 24 May 2026 end of day (organiser time). Submit solution paper + repo + container link via the official form.

Step-by-step agent narrative (for judges and your paper): AGENT_WORKFLOW.md.

Deliverables checklist

  • Working URL or Docker image for this API (judges use POST endpoints below).
  • GitHub (or equivalent) with this repo; do not commit .env or Yelp raw JSON.
  • Solution paper PDF (4–8 pages): point to AGENT_WORKFLOW.md for architecture; add experiments (e.g. RAG on/off, Nigerian prompt on/off), limits, Nigerian English design note.
  • Disclosures in paper: base HF models, Yelp-derived data / RAG index, embedding catalog build.

Endpoints

| Method | Path |
| --- | --- |
| GET | /health, / |
| POST | /user-modeling (aliases: /task-1, /task_a) |
| POST | /recommendation (aliases: /task-2, /task_b) |

Request bodies

Task 1: {"persona": "<multiline user snapshot; optional line user_id: ...>", "product": "<business facts>", "include_raw": false} — response includes rag_snippets_used.

Task 2: {"persona": "...", "city": null, "state": null, "chat_history": [], "top_k_retrieval": 40, "top_n_final": 10}
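The two request bodies above can be built and sent from Python with only the standard library. The helper names below are illustrative assumptions; the field names and defaults are taken from the bodies shown above, and the base URL depends on how you run the API.

```python
import json
import urllib.request


def user_modeling_payload(persona, product, include_raw=False):
    """Body for POST /user-modeling (Task 1)."""
    return {"persona": persona, "product": product, "include_raw": include_raw}


def recommendation_payload(persona, city=None, state=None, chat_history=None,
                           top_k_retrieval=40, top_n_final=10):
    """Body for POST /recommendation (Task 2), using the defaults shown above."""
    return {
        "persona": persona,
        "city": city,
        "state": state,
        "chat_history": list(chat_history or []),
        "top_k_retrieval": top_k_retrieval,
        "top_n_final": top_n_final,
    }


def post_json(base_url, path, payload, timeout=120):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, post_json("http://localhost:7860", "/user-modeling", user_modeling_payload("...", "...")) would call Task 1 on a Docker run; the generous timeout is an assumption to allow for CPU generation.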

Local run (clone this repo)

From the repository root (this folder):

cp env.example .env
pip install -r requirements.txt

Task A review index (Yelp review.json + business.json):

python scripts/build_task_a_review_rag.py \
  --review-json path/to/yelp_academic_dataset_review.json \
  --business-json path/to/yelp_academic_dataset_business.json \
  --output data/task_a_reviews_embedded.jsonl \
  --max-rows 12000

Use the same TASK_B_LOCAL_EMBEDDING_MODEL (or TASK_A_EMBEDDING_MODEL) at build and runtime. Omit the file only for quick tests (generation runs without RAG).

Task B uses TASK_B_LOCAL_LLM_MODEL for reranking (default Qwen2.5-1.5B-Instruct; first run may download weights from Hugging Face).

Recommendation index (needs Yelp business.json on your machine, e.g. ../yelp_dataset/extracted/ from a parent workspace):

python scripts/build_business_catalog.py --max-rows 30000 --only-open
python scripts/embed_catalog.py --batch-size 64

Use the same TASK_B_LOCAL_EMBEDDING_MODEL for embed_catalog.py and at API runtime.

Start API:

uvicorn app.main:app --host 0.0.0.0 --port 8080
# or: PORT=8080 python -m app.main

Docker

Build with a Hub token available during the build (anonymous access works for public models but is more likely to hit rate limits):

# Plain Docker: build, then run
docker build -t dcn-llm-agent-challenge \
  --build-arg HF_TOKEN="$HF_TOKEN" \
  --build-arg HUGGING_FACE_HUB_TOKEN="$HUGGING_FACE_HUB_TOKEN" .
docker run --env-file .env -p 7860:7860 dcn-llm-agent-challenge

# Or with Docker Compose
export HF_TOKEN=hf_...   # optional; must be visible to `docker build`, not only the container
docker compose up --build -d

Default compose maps 7860:7860. The image bakes /code/data/business_catalog_embedded.jsonl and /code/data/task_a_reviews_embedded.jsonl at build time (or stubs if Yelp JSON is missing). Override with a bind mount, e.g. ./data:/code/data, if you rebuild those files locally.

The Docker image sets HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 so the running container does not call the Hugging Face Hub. During docker build, snapshot_download copies model files into /models/huggingface (and stub JSONL is embedded). Loading weights into RAM during build is disabled by default (DOCKER_BUILD_SKIP_LLM_WARM=1) because HF build VMs often OOM (exit 137) when loading Qwen, and anything held in RAM during the build would not persist in the final image anyway.

At container start, STARTUP_PREWARM=all (default) loads one shared embedding model and one shared causal LM (app/shared_models.py), then Task A RAG + Task B catalog — so /task-2 does not pay a second full Qwen load. Expect ~1–2 minutes on CPU after deploy while logs show Loading shared …; then both endpoints stay fast. Disable with SKIP_STARTUP_PREWARM=1 (not recommended on Spaces).
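One way the two prewarm variables could be interpreted at startup is sketched below. The function name, the accepted truthy strings, and the step labels are assumptions. This README does not describe STARTUP_PREWARM values other than all, so the sketch simply returns no steps for them.

```python
def prewarm_plan(env):
    """Return the ordered list of prewarm steps implied by the environment.

    Hypothetical interpretation:
    - SKIP_STARTUP_PREWARM=1 disables prewarming entirely.
    - STARTUP_PREWARM defaults to "all": shared embedding model, shared causal LM,
      then Task A RAG and the Task B catalog.
    - Any other STARTUP_PREWARM value yields no steps (assumption).
    """
    skip = env.get("SKIP_STARTUP_PREWARM", "").strip().lower() in {"1", "true", "yes"}
    if skip:
        return []
    mode = env.get("STARTUP_PREWARM", "all").strip().lower()
    if mode == "all":
        return ["embedding_model", "causal_lm", "task_a_rag", "task_b_catalog"]
    return []
```

Loading the shared models before the task-specific indexes matches the order described above, where both endpoints reuse one embedding model and one causal LM.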

Smoke checks

OpenAPI: http://localhost:7860/docs when using Docker (port 7860). Local uvicorn defaults to 8080 unless you set PORT.
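Because startup prewarming can take a minute or two on CPU, a smoke check may need to poll GET /health before calling the POST endpoints. The following is a minimal sketch using only the standard library; the function names, retry counts, and the assumption that a healthy server answers /health with HTTP 200 are illustrative, not taken from the app code.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(base_url, attempts=30, delay=5.0,
                       probe=http_probe, sleep=time.sleep):
    """Poll <base_url>/health until it succeeds.

    Returns the 1-based attempt number that succeeded, or None if all attempts fail.
    The probe and sleep arguments are injectable so the loop can be tested offline.
    """
    url = base_url.rstrip("/") + "/health"
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

For a Docker run, wait_until_healthy("http://localhost:7860") would be the natural call; for local uvicorn, substitute port 8080 or your PORT value.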

Layout

| Path | Role |
| --- | --- |
| app/main.py | FastAPI routes |
| AGENT_WORKFLOW.md | Agent steps, reproducibility, paper hooks (Nigerian English, fallbacks) |
| app/user_modeling.py, app/user_modeling_prompt.py, app/task_a_rag.py | Task 1 local LLM + Yelp review RAG |
| app/recommendation_pipeline.py | Task 2 retrieval + rerank |
| scripts/build_business_catalog.py | Yelp → catalog JSONL |
| scripts/embed_catalog.py | Embed catalog (local sentence-transformers) |
| scripts/build_task_a_review_rag.py | Yelp reviews (+ businesses) → Task A embedded RAG JSONL |
| scripts/docker_build_assets.py | Docker build: HF prefetch + catalog + Task A RAG |
| env.example | Copy to .env |
| NOTICES.txt | Data / cloud disclosures |

Optional: make Yelp review.json and business.json available at build time (for example through a bind mount) so that Docker bakes real Task A/B indexes instead of stubs.