MSGEncrypted committed 20 days ago

Commit

bf15bc3

1 Parent(s): b0f9e4b

usage wip

Files changed (2)

README.md +2 -0
USAGE.md +203 -0

README.md CHANGED

@@ -13,6 +13,8 @@ license: apache-2.0
 Gradio chat Space for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon). Runs local inference with **llama.cpp** (GGUF) by default; optional **transformers** backend via env.
+See **[USAGE.md](USAGE.md)** for local run, Docker smoke test, and HF Space deployment steps.
 ## Prerequisites
 - [uv](https://docs.astral.sh/uv/)

USAGE.md ADDED

	@@ -0,0 +1,203 @@

+# Usage
+How to run the Gradio chat app locally, test it in Docker, and deploy to a Hugging Face Space for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
+## Prerequisites
+- [uv](https://docs.astral.sh/uv/) installed
+- Python 3.12 (see `.python-version`)
+- For Docker testing: Docker installed locally
+- For HF Space deploy: Hugging Face account with access to the `build-small-hackathon` org
+## Local development
+### 1. Install dependencies
+```bash
+uv sync --all-packages
+```
+### 2. Configure environment (optional)
+```bash
+cp .env.example .env
+```
+Edit `.env` if you want a different model or local GGUF path. Defaults work out of the box.
+### 3. Pre-download the model (recommended)
+The app can download the GGUF on first chat, but pre-downloading avoids a long wait during your first message:
+```bash
+uv run python scripts/download_model.py
+```
+Then add the printed path to `.env`:
+```bash
+MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf
+```
+### 4. Run the Gradio app
+```bash
+uv run --package gradio-space python -m gradio_space.app
+```
+Open http://localhost:7860.
+The model loads on the **first chat message** unless you set `MODEL_PATH`. After code changes, restart the process to pick up updates.
+### 5. Quick sanity checks
+```bash
+# Inference package resolves
+uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)"
+# Gradio app module loads
+uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())"
+```
+### Local env reference
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` |
+| `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF |
+| `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename |
+| `MODEL_PATH` | — | Local GGUF path (skips Hub download) |
+| `N_CTX` | `4096` | Context window |
+| `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (`0` = CPU only) |
+| `PORT` | `7860` | Gradio listen port |
+| `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` |
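As a rough illustration of how the variables in this table could be read, here is a minimal Python sketch. The `Settings` dataclass and `load_settings` helper are hypothetical names invented for this example and are not taken from the repo; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the env vars listed in the table."""
    inference_backend: str
    model_repo: str
    model_file: str
    model_path: Optional[str]
    n_ctx: int
    n_gpu_layers: int
    port: int
    model_id: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, falling back to the documented defaults."""
    backend = env.get("INFERENCE_BACKEND", "llama_cpp")
    if backend not in ("llama_cpp", "transformers"):
        raise ValueError(f"unsupported INFERENCE_BACKEND: {backend!r}")
    return Settings(
        inference_backend=backend,
        model_repo=env.get("MODEL_REPO", "Qwen/Qwen2.5-3B-Instruct-GGUF"),
        model_file=env.get("MODEL_FILE", "qwen2.5-3b-instruct-q4_k_m.gguf"),
        # An empty or missing MODEL_PATH means "download from the Hub".
        model_path=env.get("MODEL_PATH") or None,
        n_ctx=int(env.get("N_CTX", "4096")),
        n_gpu_layers=int(env.get("N_GPU_LAYERS", "0")),
        port=int(env.get("PORT", "7860")),
        model_id=env.get("MODEL_ID", "Qwen/Qwen2.5-3B-Instruct"),
    )
```

Calling `load_settings({})` yields the defaults from the table, and passing a dict such as `{"PORT": "7861"}` overrides a single value. That makes the parsing easy to test without touching the real environment.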
+### Optional: transformers backend
+Heavier install; only needed if you switch away from llama.cpp:
+```bash
+uv sync --package inference --extra transformers
+INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
+  uv run --package gradio-space python -m gradio_space.app
+```
+---
+## Docker (local prod-like test)
+Run the same container image HF Spaces will build:
+```bash
+docker build -t hackathon-space .
+docker run --rm -p 7860:7860 \
+  -e MODEL_REPO=Qwen/Qwen2.5-3B-Instruct-GGUF \
+  -e MODEL_FILE=qwen2.5-3b-instruct-q4_k_m.gguf \
+  -e N_CTX=4096 \
+  -e N_GPU_LAYERS=0 \
+  hackathon-space
+```
+Open http://localhost:7860. Stop with `Ctrl+C`.
+To use a pre-downloaded local model inside Docker, mount it and set `MODEL_PATH`:
+```bash
+docker run --rm -p 7860:7860 \
+  -v "$(pwd)/models:/app/models:ro" \
+  -e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \
+  hackathon-space
+```
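When scripting the Docker smoke test, it can help to wait until the container answers on port 7860 before opening the browser or sending requests. The following is a minimal sketch using only the standard library. `wait_until_up` is a hypothetical helper, and treating an HTTP 200 on the root URL as "ready" is an assumption about the app, not something the repo documents.

```python
import time
import urllib.request


def wait_until_up(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and HTTP error
            # responses (urllib's URLError/HTTPError subclass OSError).
            pass
        time.sleep(interval_s)
    return False
```

For example, `wait_until_up("http://localhost:7860")` returns `True` once the app responds and `False` if nothing answers within two minutes.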
+---
+## Hugging Face Space deployment
+This repo uses the **Docker SDK**. The Space card metadata lives in the YAML frontmatter at the top of [README.md](README.md).
+### 1. Push code to GitHub
+Make sure `main` (or your deploy branch) contains at minimum:
+- `Dockerfile`
+- `README.md` (with `sdk: docker` and `app_port: 7860`)
+- `pyproject.toml`, `uv.lock`
+- `apps/gradio-space/` and `libs/inference/`
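Before pushing, the README frontmatter requirement in the list above can be checked with a small script. This is a minimal sketch: `read_frontmatter` and `check_space_card` are hypothetical helpers, and the parser only handles flat `key: value` lines rather than full YAML.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a leading `---` block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Only flat, unindented `key: value` lines are handled here.
        if line and not line[0].isspace() and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing delimiter: treat as missing frontmatter


def check_space_card(text: str) -> list[str]:
    """List deviations from the settings this guide expects."""
    fields = read_frontmatter(text)
    problems = []
    if fields.get("sdk") != "docker":
        problems.append("sdk must be 'docker'")
    if fields.get("app_port") != "7860":
        problems.append("app_port must be 7860")
    return problems
```

Running `check_space_card(open("README.md").read())` from the repo root should return an empty list when the frontmatter contains both `sdk: docker` and `app_port: 7860`.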
+### 2. Create the Space
+1. Go to [build-small-hackathon](https://huggingface.co/build-small-hackathon)
+2. **New Space**
+3. Name: e.g. `small-model-hackathon`
+4. SDK: **Docker**
+5. Link your GitHub repo, or push directly to the Space repo
+CLI alternative (if you have `hf` installed and org access):
+```bash
+hf repo create build-small-hackathon/<your-space-name> \
+  --repo-type space \
+  --space_sdk docker
+```
+### 3. Configure hardware
+| Setting | Recommendation |
+|---------|----------------|
+| Hardware | **CPU basic** to start (llama.cpp with `N_GPU_LAYERS=0`) |
+| Upgrade | GPU Space if you set `N_GPU_LAYERS > 0` for faster inference |
+### 4. Set Space environment variables
+In the Space **Settings → Variables and secrets**:
+| Variable | Value |
+|----------|-------|
+| `INFERENCE_BACKEND` | `llama_cpp` |
+| `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` |
+| `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` |
+| `N_CTX` | `4096` |
+| `N_GPU_LAYERS` | `0` (or higher on GPU hardware) |
+### 5. Build and verify
+HF builds from the root `Dockerfile` and runs:
+```bash
+uv run --package gradio-space python -m gradio_space.app
+```
+Check the **Logs** tab while the Space builds. Once running, open the Space URL and send a test chat message. The first message may take several minutes on CPU while the GGUF downloads.
+### 6. Optional: persistent model cache
+If cold starts are too slow, attach a **Storage Bucket** in Space settings so downloaded GGUF files survive restarts.
+---
+## Troubleshooting
+| Symptom | Likely cause | Fix |
+|---------|--------------|-----|
+| First chat hangs / slow | GGUF downloading from Hub | Pre-download locally; on Space, wait or use Storage Bucket |
+| `Failed to load model` in chat | Wrong `MODEL_REPO` / `MODEL_FILE` | Check env vars match a valid GGUF on Hub |
+| Docker build fails on `llama-cpp-python` | Missing build tools | The Dockerfile should already install `build-essential` and `cmake`; check that those lines are intact, then rebuild |
+| Space build fails | Missing `uv.lock` or README YAML | Commit `uv.lock` and ensure `sdk: docker` is in the root `README.md` frontmatter |
+| `transformers` backend error | Optional deps not installed | Run `uv sync --package inference --extra transformers` |
+| Port already in use locally | Another process on 7860 | `PORT=7861 uv run --package gradio-space python -m gradio_space.app` |
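For the "port already in use" row, a quick way to find a usable port before launching is sketched below. `port_is_free` and `first_free_port` are hypothetical helpers written for this example. They test by binding on `127.0.0.1`, which is an assumption about how the app listens locally.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # address already in use (or not permitted)
    return True


def first_free_port(start: int = 7860, attempts: int = 10) -> int:
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")
```

The result can be passed through the `PORT` variable, for example `PORT=7861 uv run --package gradio-space python -m gradio_space.app`.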
+---
+## Entrypoint summary
+All three environments ultimately start the app with the same command:
+```bash
+uv run --package gradio-space python -m gradio_space.app
+```
+| Environment | How to run |
+|-------------|------------|
+| Local dev | `uv run --package gradio-space python -m gradio_space.app` |
+| Docker | `docker run -p 7860:7860 hackathon-space` |
+| HF Space | Built and started automatically from `Dockerfile` `CMD` |