	---
	title: DSN
	emoji: 🏢
	colorFrom: indigo
	colorTo: red
	sdk: docker
	pinned: false
	license: mit
	short_description: DSN HACKATHON
	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

	This Space is configured as `sdk: docker`. The image is built from `Dockerfile` with CPU-only PyTorch, so that CUDA wheels do not exhaust the builder's memory. During `docker build`, models are fetched with `snapshot_download` into `/models/huggingface` without loading the full LLM into RAM, and `SentenceTransformer` embeds a stub or Yelp-derived catalog plus `data/task_a_reviews_embedded.jsonl` (the review RAG index for Task A). See `scripts/docker_build_assets.py`.

	Task A: persona + product → rating/review via Gemini API (default) or optional local Qwen, plus retrieved Yelp review snippets from the baked JSONL. Task B: local sentence-transformer retrieval over businesses plus Gemini (or local) reranking.
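
The two-stage shape of Task B (embedding retrieval, then reranking) can be sketched as below. This is a minimal illustration, not the code in `app/recommendation_pipeline.py`: the function names, the `embedding` and `stars` fields, and the toy 2-d vectors are all assumptions, and the rerank score is a stand-in for whatever Gemini or the local model would return.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: Vector,
    catalog: list[dict],
    rerank_score: Callable[[dict], float],
    top_k_retrieval: int = 40,
    top_n_final: int = 10,
) -> list[dict]:
    """Stage 1: cosine top-k over embeddings. Stage 2: rerank the shortlist."""
    shortlist = sorted(
        catalog,
        key=lambda item: cosine(query_vec, item["embedding"]),
        reverse=True,
    )[:top_k_retrieval]
    # Hypothetical scorer: a real pipeline would ask the LLM to judge fit.
    return sorted(shortlist, key=rerank_score, reverse=True)[:top_n_final]


# Toy catalog with assumed field names and 2-d vectors instead of embeddings.
catalog = [
    {"name": "Suya Spot", "stars": 4.5, "embedding": [1.0, 0.0]},
    {"name": "Jollof House", "stars": 4.8, "embedding": [0.9, 0.1]},
    {"name": "Car Wash", "stars": 5.0, "embedding": [0.0, 1.0]},
]
top = retrieve_then_rerank(
    [1.0, 0.0],
    catalog,
    rerank_score=lambda business: business["stars"],
    top_k_retrieval=2,
    top_n_final=2,
)
print([business["name"] for business in top])  # ['Jollof House', 'Suya Spot']
```

The unrelated but highly rated "Car Wash" never reaches the reranker, which is the point of retrieving first: the expensive scoring step only sees candidates that are already close to the persona.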

	Secrets (Hugging Face Space): `GEMINI_API_KEY` (or `GOOGLE_API_KEY`) — required for generation when `GENERATION_BACKEND=gemini`. Optional `HF_TOKEN` for Docker build only (embedder download). Never commit keys in the repo.

	---

	## DSN × BCT LLM Agent Challenge — API package

	Deadline: 24 May 2026 end of day (organiser time). Submit solution paper + repo + container link via the official form.

	Step-by-step agent narrative (for judges and your paper): [`AGENT_WORKFLOW.md`](AGENT_WORKFLOW.md).

	### Deliverables checklist

	- [ ] Working URL or Docker image for this API (judges use POST endpoints below).
	- [ ] GitHub (or equivalent) with this repo; do not commit `.env` or Yelp raw JSON.
	- [ ] Solution paper PDF (4–8 pages): point to `AGENT_WORKFLOW.md` for architecture; add experiments (e.g. RAG on/off, Nigerian prompt on/off), limits, Nigerian English design note.
	- [ ] Disclosures in paper: base HF models, Yelp-derived data / RAG index, embedding catalog build.

	### Endpoints

	| Method | Path |
	|--------|------|
	| GET | `/health`, `/` |
	| POST | `/user-modeling` (aliases: `/task-1`, `/task_a`) |
	| POST | `/recommendation` (aliases: `/task-2`, `/task_b`) |

	### Request bodies

	Task 1: `{"persona": "<multiline user snapshot; optional line user_id: ...>", "product": "<business facts>", "include_raw": false}` — response includes `rag_snippets_used`.

	Task 2: `{"persona": "...", "city": null, "state": null, "chat_history": [], "top_k_retrieval": 40, "top_n_final": 10}`
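
As a rough client-side sketch, the two request bodies above can be built and posted with the Python standard library alone. The helper names (`task1_payload`, `task2_payload`, `post_json`) and the sample persona text are illustrative assumptions; only the JSON keys, defaults, and endpoint paths come from this README.

```python
import json
import urllib.request
from typing import Optional


def task1_payload(persona: str, product: str, include_raw: bool = False) -> dict:
    """Body for POST /user-modeling."""
    return {"persona": persona, "product": product, "include_raw": include_raw}


def task2_payload(
    persona: str,
    city: Optional[str] = None,
    state: Optional[str] = None,
    chat_history: Optional[list] = None,
    top_k_retrieval: int = 40,
    top_n_final: int = 10,
) -> dict:
    """Body for POST /recommendation, using the defaults shown above."""
    return {
        "persona": persona,
        "city": city,
        "state": state,
        "chat_history": list(chat_history or []),
        "top_k_retrieval": top_k_retrieval,
        "top_n_final": top_n_final,
    }


def post_json(base_url: str, path: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call against a running container (not executed here):
#   body = task1_payload("user_id: u123\nLikes spicy food.", "Grill, 4.5 stars")
#   result = post_json("http://localhost:7860", "/user-modeling", body)
#   print(result["rag_snippets_used"])
```

The long default timeout is deliberate in this sketch, since the first request after a cold start may wait on model loading.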

	### Local run (clone this repo)

	From the repository root (this folder):

	```bash
	cp env.example .env
	pip install -r requirements.txt
	```

	Task A review index (Yelp `review.json` + `business.json`):

	```bash
	python scripts/build_task_a_review_rag.py \
	  --review-json path/to/yelp_academic_dataset_review.json \
	  --business-json path/to/yelp_academic_dataset_business.json \
	  --output data/task_a_reviews_embedded.jsonl \
	  --max-rows 12000
	```

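	Use the same `TASK_B_LOCAL_EMBEDDING_MODEL` (or `TASK_A_EMBEDDING_MODEL`) when building this index and at API runtime; vectors produced by different models are not comparable. Omit `data/task_a_reviews_embedded.jsonl` only for quick tests, in which case generation runs without RAG.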
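
The "missing index means no RAG" behaviour and the same-model requirement can be illustrated with a small loader. This is a sketch under assumptions: the real row schema of `task_a_reviews_embedded.jsonl` is not documented here, so the `text` and `embedding` field names are hypothetical, as are the function names. A dimension check only catches obvious model mismatches; two different models can share a dimension.

```python
import json
import tempfile
from pathlib import Path


def load_review_index(path: str) -> list[dict]:
    """Return embedded review rows, or [] when the index file is absent."""
    index_path = Path(path)
    if not index_path.is_file():
        return []  # no index: generation would proceed without RAG snippets
    rows = []
    with index_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines in the JSONL file
                rows.append(json.loads(line))
    return rows


def dims_match(rows: list[dict], query_dim: int, field: str = "embedding") -> bool:
    """Cheap sanity check that stored vectors match the runtime embedder."""
    return all(len(row[field]) == query_dim for row in rows)


# Demo with a throwaway file; field names are assumed, not the real schema.
with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "task_a_reviews_embedded.jsonl"
    missing = load_review_index(str(index_file))  # file not written yet
    index_file.write_text(
        json.dumps({"text": "Great suya.", "embedding": [0.1, 0.2, 0.3]}) + "\n",
        encoding="utf-8",
    )
    loaded = load_review_index(str(index_file))

print(len(missing), len(loaded), dims_match(loaded, 3), dims_match(loaded, 4))
# 0 1 True False
```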

	Generation: set `GEMINI_API_KEY` in `.env` (see `env.example`). With `GENERATION_BACKEND=gemini` (default), Task A and Task B use `GEMINI_MODEL` (default `gemini-2.0-flash`). Set `GENERATION_BACKEND=local` to use on-device Qwen instead.
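
One way to picture how these variables interact is the resolver below. It is a hedged sketch, not the repo's actual configuration code: `resolve_generation_config` is a hypothetical helper, and only the variable names and defaults (`gemini` backend, `gemini-2.0-flash` model, `GEMINI_API_KEY` or `GOOGLE_API_KEY`) are taken from this README.

```python
import os
from typing import Mapping, Optional


def resolve_generation_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick the generation backend and validate what it needs."""
    env = os.environ if env is None else env
    backend = env.get("GENERATION_BACKEND", "gemini").strip().lower()
    if backend not in {"gemini", "local"}:
        raise ValueError(f"Unsupported GENERATION_BACKEND: {backend!r}")

    config = {"backend": backend}
    if backend == "gemini":
        # Either key name is accepted; GEMINI_API_KEY wins if both are set.
        api_key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
        if not api_key:
            raise RuntimeError(
                "GEMINI_API_KEY (or GOOGLE_API_KEY) is required "
                "when GENERATION_BACKEND=gemini"
            )
        config["api_key"] = api_key
        config["model"] = env.get("GEMINI_MODEL", "gemini-2.0-flash")
    return config


print(resolve_generation_config({"GEMINI_API_KEY": "example-key"})["model"])
# gemini-2.0-flash
print(resolve_generation_config({"GENERATION_BACKEND": "local"}))
# {'backend': 'local'}
```

Failing fast on a missing key at startup, rather than on the first request, makes a misconfigured Space visible in the build or boot logs.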

	Task B reranking uses Gemini when configured; embeddings stay local (`LOCAL_EMBEDDING_MODEL`).

	Recommendation index (needs Yelp `business.json` on your machine, e.g. `../yelp_dataset/extracted/` from a parent workspace):

	```bash
	python scripts/build_business_catalog.py --max-rows 30000 --only-open
	python scripts/embed_catalog.py --batch-size 64
	```

	Use the same `TASK_B_LOCAL_EMBEDDING_MODEL` for `embed_catalog.py` and at API runtime.

	Start API:

	```bash
	uvicorn app.main:app --host 0.0.0.0 --port 8080
	# or: PORT=8080 python -m app.main
	```

	### Docker

	Pass a Hub token as a build argument. Anonymous downloads work for public models but can hit rate limits:

	```bash
	docker build -t dcn-llm-agent-challenge \
	--build-arg HF_TOKEN="$HF_TOKEN" \
	--build-arg HUGGING_FACE_HUB_TOKEN="$HUGGING_FACE_HUB_TOKEN" .
	docker run --env-file .env -p 7860:7860 dcn-llm-agent-challenge
	```

	```bash
	export HF_TOKEN=hf_... # optional; must be visible to `docker build`, not only the container
	docker compose up --build -d
	```

	Default compose maps `7860:7860`. The image bakes `/code/data/business_catalog_embedded.jsonl` and `/code/data/task_a_reviews_embedded.jsonl` at build time (or stubs if Yelp JSON is missing). Override with a bind mount, e.g. `./data:/code/data`, if you rebuild those files locally.

	The Docker image sets `HF_HUB_OFFLINE=1` and `TRANSFORMERS_OFFLINE=1` so the running container does not call the Hugging Face Hub. During `docker build`, `snapshot_download` copies model files into `/models/huggingface` (and stub JSONL is embedded). Loading weights into RAM during build is disabled by default (`DOCKER_BUILD_SKIP_LLM_WARM=1`) because HF build VMs often run out of memory (exit 137) when loading Qwen; the loaded weights would not persist in the final image anyway.

	At container start, `STARTUP_PREWARM=all` (default) loads one shared embedding model and one shared causal LM (`app/shared_models.py`), then the Task A RAG index and the Task B catalog, so `/task-2` does not pay for a second full Qwen load. Expect roughly 1–2 minutes on CPU after deploy while logs show `Loading shared …`; after that, both endpoints respond without reloading models. Disable with `SKIP_STARTUP_PREWARM=1` (not recommended on Spaces).
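
The "load once, share everywhere" idea can be sketched with cached accessors. This is an illustration of the pattern only: the contents of `app/shared_models.py` are not shown in this README, so every name below is hypothetical, and the dictionaries stand in for real model objects.

```python
from functools import lru_cache

LOAD_LOG: list[str] = []  # records each expensive load, for demonstration


def _expensive_load(kind: str) -> dict:
    """Stand-in for loading model weights into RAM."""
    LOAD_LOG.append(kind)
    return {"kind": kind}


@lru_cache(maxsize=None)
def get_shared_embedder() -> dict:
    return _expensive_load("embedder")


@lru_cache(maxsize=None)
def get_shared_llm() -> dict:
    return _expensive_load("causal_lm")


def prewarm(mode: str = "all") -> None:
    """Load shared models at startup so the first request is not slow."""
    if mode == "all":
        get_shared_embedder()
        get_shared_llm()


prewarm("all")
task_a_llm = get_shared_llm()  # Task A reuses the cached instance
task_b_llm = get_shared_llm()  # Task B gets the same object, no second load
print(LOAD_LOG)  # ['embedder', 'causal_lm']
print(task_a_llm is task_b_llm)  # True
```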

	### Smoke checks

	OpenAPI: `http://localhost:7860/docs` when using Docker (port 7860). For a local run, use the port passed to `uvicorn` (8080 in the command above) or the `PORT` you set for `python -m app.main`.
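
A scripted smoke check could look like the sketch below. It assumes the GET routes listed under Endpoints plus the `/docs` page; `http_status` and `smoke_check` are hypothetical helpers, and the `fetch` parameter exists only so the check can be exercised without a running server.

```python
import urllib.error
import urllib.request
from typing import Callable, Sequence


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses still carry a status code


def smoke_check(
    base_url: str,
    paths: Sequence[str] = ("/health", "/", "/docs"),
    fetch: Callable[[str], int] = http_status,
) -> dict:
    """Map each path to the status code it returned."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) for path in paths}


# Against a running container (not executed here):
#   results = smoke_check("http://localhost:7860")
#   assert all(code == 200 for code in results.values()), results

# Offline demonstration with a fake fetcher:
print(smoke_check("http://localhost:7860/", fetch=lambda url: 200))
# {'/health': 200, '/': 200, '/docs': 200}
```

A connection failure (container not yet up) raises `urllib.error.URLError` rather than returning a code, which is usually what a deploy script wants during the startup prewarm window.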

	### Layout

	| Path | Role |
	|------|------|
	| `app/main.py` | FastAPI routes |
	| [`AGENT_WORKFLOW.md`](AGENT_WORKFLOW.md) | Agent steps, reproducibility, paper hooks (Nigerian English, fallbacks) |
	| `app/user_modeling.py`, `app/user_modeling_prompt.py`, `app/task_a_rag.py` | Task 1 local LLM + Yelp review RAG |
	| `app/recommendation_pipeline.py` | Task 2 retrieval + rerank |
	| `scripts/build_business_catalog.py` | Yelp → catalog JSONL |
	| `scripts/embed_catalog.py` | Embed catalog (local sentence-transformers) |
	| `scripts/build_task_a_review_rag.py` | Yelp reviews (+ businesses) → Task A embedded RAG JSONL |
	| `scripts/docker_build_assets.py` | Docker build: HF prefetch + catalog + Task A RAG |
	| `env.example` | Copy to `.env` |
	| `NOTICES.txt` | Data / cloud disclosures |

	Optional: make Yelp `review.json` + `business.json` available to the build (for example through a bind mount) so Docker bakes real Task A/B indexes instead of stubs.