MSGEncrypted committed on
Commit bf15bc3 · 1 Parent(s): b0f9e4b

usage wip

Files changed (2)
  1. README.md +2 -0
  2. USAGE.md +203 -0
README.md CHANGED
@@ -13,6 +13,8 @@ license: apache-2.0
 
  Gradio chat Space for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon). Runs local inference with **llama.cpp** (GGUF) by default; optional **transformers** backend via env.
 
+ See **[USAGE.md](USAGE.md)** for local run, Docker smoke test, and HF Space deployment steps.
+
  ## Prerequisites
 
  - [uv](https://docs.astral.sh/uv/)
USAGE.md ADDED
@@ -0,0 +1,203 @@
+ # Usage
+
+ How to run the Gradio chat app locally, test it in Docker, and deploy to a Hugging Face Space for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
+
+ ## Prerequisites
+
+ - [uv](https://docs.astral.sh/uv/) installed
+ - Python 3.12 (see `.python-version`)
+ - For Docker testing: Docker installed locally
+ - For HF Space deploy: Hugging Face account with access to the `build-small-hackathon` org
+
+ ## Local development
+
+ ### 1. Install dependencies
+
+ ```bash
+ uv sync --all-packages
+ ```
+
+ ### 2. Configure environment (optional)
+
+ ```bash
+ cp .env.example .env
+ ```
+
+ Edit `.env` if you want a different model or local GGUF path. Defaults work out of the box.
+
+ ### 3. Pre-download the model (recommended)
+
+ The app can download the GGUF on first chat, but pre-downloading avoids a long wait during your first message:
+
+ ```bash
+ uv run python scripts/download_model.py
+ ```
+
+ Then add the printed path to `.env`:
+
+ ```bash
+ MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf
+ ```
+
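Editor's sketch: the choice between a pre-downloaded file and a Hub download described in step 3 might look roughly like the following. This is not the contents of `scripts/download_model.py`, which this diff does not show. `resolve_model_path` is a hypothetical helper; the only library call is `huggingface_hub.hf_hub_download`, imported lazily so the local-file branch works without the package installed. The default repo and filename are copied from USAGE.md.

```python
import os
from pathlib import Path


def resolve_model_path(env=None):
    """Return a local GGUF path, downloading from the Hub only when needed.

    Hypothetical helper: the real app may resolve the model differently.
    """
    env = os.environ if env is None else env

    local = env.get("MODEL_PATH")
    if local:
        path = Path(local)
        if not path.is_file():
            raise FileNotFoundError(f"MODEL_PATH does not exist: {path}")
        return path  # local file wins, no network access

    # Imported here so the MODEL_PATH branch has no dependency on the Hub client.
    from huggingface_hub import hf_hub_download

    downloaded = hf_hub_download(
        repo_id=env.get("MODEL_REPO", "Qwen/Qwen2.5-3B-Instruct-GGUF"),
        filename=env.get("MODEL_FILE", "qwen2.5-3b-instruct-q4_k_m.gguf"),
    )
    return Path(downloaded)
```

Passing a plain dict as `env` keeps the function easy to exercise without touching the real process environment.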
+ ### 4. Run the Gradio app
+
+ ```bash
+ uv run --package gradio-space python -m gradio_space.app
+ ```
+
+ Open http://localhost:7860.
+
+ The model loads on the **first chat message** unless you set `MODEL_PATH`. After code changes, restart the process to pick up updates.
+
+ ### 5. Quick sanity checks
+
+ ```bash
+ # Inference package resolves
+ uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)"
+
+ # Gradio app module loads
+ uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())"
+ ```
+
+ ### Local env reference
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` |
+ | `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF |
+ | `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename |
+ | `MODEL_PATH` | (unset) | Local GGUF path (skips Hub download) |
+ | `N_CTX` | `4096` | Context window |
+ | `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (`0` = CPU only) |
+ | `PORT` | `7860` | Gradio listen port |
+ | `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` |
+
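Editor's sketch: one way the variables in the table above could be read with their documented defaults. The `Settings` class and its `from_env` method are hypothetical names for illustration; the repo's actual configuration code is not part of this diff. Only the variable names and default values come from USAGE.md.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the 'Local env reference' table (hypothetical class)."""

    inference_backend: str = "llama_cpp"
    model_repo: str = "Qwen/Qwen2.5-3B-Instruct-GGUF"
    model_file: str = "qwen2.5-3b-instruct-q4_k_m.gguf"
    model_path: Optional[str] = None
    n_ctx: int = 4096
    n_gpu_layers: int = 0
    port: int = 7860
    model_id: str = "Qwen/Qwen2.5-3B-Instruct"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        env = os.environ if env is None else env
        d = cls()  # instance holding the documented defaults

        backend = env.get("INFERENCE_BACKEND", d.inference_backend)
        if backend not in ("llama_cpp", "transformers"):
            raise ValueError(f"Unknown INFERENCE_BACKEND: {backend!r}")

        return cls(
            inference_backend=backend,
            model_repo=env.get("MODEL_REPO", d.model_repo),
            model_file=env.get("MODEL_FILE", d.model_file),
            model_path=env.get("MODEL_PATH") or None,  # empty string counts as unset
            n_ctx=int(env.get("N_CTX", d.n_ctx)),
            n_gpu_layers=int(env.get("N_GPU_LAYERS", d.n_gpu_layers)),
            port=int(env.get("PORT", d.port)),
            model_id=env.get("MODEL_ID", d.model_id),
        )
```

Validating the backend name up front turns a typo in `.env` into an immediate error instead of a failure on the first chat message.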
+ ### Optional: transformers backend
+
+ Heavier install; only needed if you switch away from llama.cpp:
+
+ ```bash
+ uv sync --package inference --extra transformers
+ INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
+   uv run --package gradio-space python -m gradio_space.app
+ ```
+
+ ---
+
+ ## Docker (local prod-like test)
+
+ Run the same container image HF Spaces will build:
+
+ ```bash
+ docker build -t hackathon-space .
+ docker run --rm -p 7860:7860 \
+   -e MODEL_REPO=Qwen/Qwen2.5-3B-Instruct-GGUF \
+   -e MODEL_FILE=qwen2.5-3b-instruct-q4_k_m.gguf \
+   -e N_CTX=4096 \
+   -e N_GPU_LAYERS=0 \
+   hackathon-space
+ ```
+
+ Open http://localhost:7860. Stop with `Ctrl+C`.
+
+ To use a pre-downloaded local model inside Docker, mount it and set `MODEL_PATH`:
+
+ ```bash
+ docker run --rm -p 7860:7860 \
+   -v "$(pwd)/models:/app/models:ro" \
+   -e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \
+   hackathon-space
+ ```
+
+ ---
+
+ ## Hugging Face Space deployment
+
+ This repo uses the **Docker SDK**. The Space card metadata lives in the YAML frontmatter at the top of [README.md](README.md).
+
+ ### 1. Push code to GitHub
+
+ Make sure `main` (or your deploy branch) contains at minimum:
+
+ - `Dockerfile`
+ - `README.md` (with `sdk: docker` and `app_port: 7860`)
+ - `pyproject.toml`, `uv.lock`
+ - `apps/gradio-space/` and `libs/inference/`
+
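Editor's sketch: the `sdk: docker` and `app_port: 7860` requirement from the checklist above can be checked before pushing. The helpers `read_frontmatter` and `check_space_card` are hypothetical and not part of the repo. The parser handles only flat `key: value` pairs, which is enough for these two fields but is not a general YAML parser.

```python
def read_frontmatter(text):
    """Parse flat `key: value` pairs from a leading '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        key, sep, value = line.partition(":")
        # Skip nested or continuation lines; only top-level keys matter here.
        if sep and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing delimiter: treat as missing frontmatter


def check_space_card(text):
    """Return a list of problems; an empty list means both fields look right."""
    fields = read_frontmatter(text)
    problems = []
    if fields.get("sdk") != "docker":
        problems.append("sdk should be 'docker'")
    if fields.get("app_port") != "7860":
        problems.append("app_port should be 7860")
    return problems
```

A typical use would be `check_space_card(open("README.md", encoding="utf-8").read())` in a pre-push script, failing when the list is non-empty.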
+ ### 2. Create the Space
+
+ 1. Go to [build-small-hackathon](https://huggingface.co/build-small-hackathon)
+ 2. **New Space**
+ 3. Name: e.g. `small-model-hackathon`
+ 4. SDK: **Docker**
+ 5. Link your GitHub repo, or push directly to the Space repo
+
+ CLI alternative (if you have `hf` installed and org access):
+
+ ```bash
+ hf repo create build-small-hackathon/<your-space-name> \
+   --repo-type space \
+   --space_sdk docker
+ ```
+
+ ### 3. Configure hardware
+
+ | Setting | Recommendation |
+ |---------|----------------|
+ | Hardware | **CPU basic** to start (llama.cpp with `N_GPU_LAYERS=0`) |
+ | Upgrade | GPU Space if you set `N_GPU_LAYERS > 0` for faster inference |
+
+ ### 4. Set Space environment variables
+
+ In the Space **Settings → Variables and secrets**:
+
+ | Variable | Value |
+ |----------|-------|
+ | `INFERENCE_BACKEND` | `llama_cpp` |
+ | `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` |
+ | `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` |
+ | `N_CTX` | `4096` |
+ | `N_GPU_LAYERS` | `0` (or higher on GPU hardware) |
+
+ ### 5. Build and verify
+
+ HF builds from the root `Dockerfile` and runs:
+
+ ```bash
+ uv run --package gradio-space python -m gradio_space.app
+ ```
+
+ Check the **Logs** tab while the Space builds. Once running, open the Space URL and send a test chat message. The first message may take several minutes on CPU while the GGUF downloads.
+
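Editor's sketch: waiting for the app to come up, whether in Docker or on a Space, can be scripted instead of refreshing the page. `wait_until_up` is a hypothetical helper built only on the standard library. It checks that the URL answers with HTTP 200 and says nothing about whether the model has finished downloading.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url, timeout=600.0, interval=5.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, or the server returned an error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the local Docker run, `wait_until_up("http://localhost:7860")` would be the natural call; the Space URL depends on the name you chose and is not assumed here.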
+ ### 6. Optional: persistent model cache
+
+ If cold starts are too slow, attach a **Storage Bucket** in Space settings so downloaded GGUF files survive restarts.
+
+ ---
+
+ ## Troubleshooting
+
+ | Symptom | Likely cause | Fix |
+ |---------|--------------|-----|
+ | First chat hangs or is slow | GGUF downloading from Hub | Pre-download locally; on Space, wait or use Storage Bucket |
+ | `Failed to load model` in chat | Wrong `MODEL_REPO` / `MODEL_FILE` | Check env vars match a valid GGUF on Hub |
+ | Docker build fails on `llama-cpp-python` | Missing build tools | The Dockerfile already installs `build-essential` and `cmake`; check that those lines are intact, then rebuild |
+ | Space build fails | Missing `uv.lock` or README YAML | Commit `uv.lock` and ensure `sdk: docker` is in root `README.md` frontmatter |
+ | `transformers` backend error | Optional deps not installed | Run `uv sync --package inference --extra transformers` |
+ | Port already in use locally | Another process on 7860 | `PORT=7861 uv run --package gradio-space python -m gradio_space.app` |
+
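Editor's sketch: the "port already in use" row can be checked before launching. `port_is_free` and `first_free_port` are hypothetical helpers using only the standard `socket` module. They probe localhost by attempting a TCP connection, so a port is reported as taken only when something is actually listening on it.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def first_free_port(start=7860, attempts=20):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"No free port in {start}..{start + attempts - 1}")
```

The result can be exported as `PORT` before running the app, matching the fix shown in the table.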
+ ---
+
+ ## Entrypoint summary
+
+ All three environments ultimately run the same entrypoint command:
+
+ ```bash
+ uv run --package gradio-space python -m gradio_space.app
+ ```
+
+ | Environment | How to run |
+ |-------------|------------|
+ | Local dev | `uv run --package gradio-space python -m gradio_space.app` |
+ | Docker | `docker run -p 7860:7860 hackathon-space` |
+ | HF Space | Built and started automatically from `Dockerfile` `CMD` |