---
title: NeoHelper
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 6.1.0
app_file: app.py
pinned: false
---

# Neo Vision Space (Gemma 3 / Qwen2.5‑VL, CPU, Docker)

This Space runs **llama.cpp built from source** inside a **Custom Docker Space** and supports:

- ✅ Gemma 3 4B Vision (`ggml-org/gemma-3-4b-it-GGUF`)
- ✅ Qwen2.5‑VL 7B (`unsloth/Qwen2.5-VL-7B-Instruct-GGUF`)
- ✅ CPU‑only, no GPUs
- ✅ No prebuilt wheels — everything compiled in the Dockerfile
## Model selection

Set the environment variable:

- `MODEL_KIND=gemma` → use Gemma 3 4B Vision
- `MODEL_KIND=qwen` → use Qwen2.5‑VL 7B

The logic is implemented in `llama_runtime.py`.
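
A minimal sketch of what such a switch can look like. The repo IDs are the ones listed above; the function name and mapping are illustrative, not the actual `llama_runtime.py` API:

```python
import os

# Map MODEL_KIND values to the GGUF repos listed above (illustrative sketch).
MODEL_REPOS = {
    "gemma": "ggml-org/gemma-3-4b-it-GGUF",
    "qwen": "unsloth/Qwen2.5-VL-7B-Instruct-GGUF",
}

def select_model_repo() -> str:
    """Return the Hugging Face repo ID for the configured model (default: gemma)."""
    kind = os.environ.get("MODEL_KIND", "gemma").strip().lower()
    if kind not in MODEL_REPOS:
        raise ValueError(f"Unsupported MODEL_KIND: {kind!r}")
    return MODEL_REPOS[kind]
```

Unset or unknown values are worth handling explicitly, since a typo in a Space variable would otherwise fail much later, at model load time.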
## Files

- `Dockerfile`
  Builds llama.cpp and llama-cpp-python from source, installs dependencies, and runs `app.py`.

- `requirements.txt`
  Python dependencies (Gradio, Pillow, PDF handling, HF Hub).

- `llama_runtime.py`
  Shared runtime for Gemma and Qwen2.5‑VL:
  - Auto-downloads GGUF + mmproj to `/tmp`
  - Uses `vision_model_path` for Gemma
  - Uses `Qwen2VLChatHandler` for Qwen2.5‑VL if available, otherwise a safe fallback
  - Exposes `analyze_image(image, prompt)` and `health_check()`

- `model_auto_download.py`
  Optional script to prefetch all models at startup.

- `health.py`
  Minimal FastAPI app exposing a `/health` endpoint backed by `health_check()`.
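
The "if available, otherwise a safe fallback" behavior for the Qwen handler can follow the usual optional-import pattern. This is a sketch of that pattern, not the actual `llama_runtime.py` code, and it assumes the handler would live in `llama_cpp.llama_chat_format` if the installed llama-cpp-python build ships it:

```python
# Optional-import pattern (illustrative): prefer the dedicated Qwen2.5-VL chat
# handler when the installed llama-cpp-python provides it, else fall back.
try:
    from llama_cpp.llama_chat_format import Qwen2VLChatHandler  # may not exist
    HAS_QWEN_HANDLER = True
except ImportError:
    Qwen2VLChatHandler = None
    HAS_QWEN_HANDLER = False

def pick_handler_name() -> str:
    """Report which code path the runtime would take for Qwen2.5-VL."""
    return "Qwen2VLChatHandler" if HAS_QWEN_HANDLER else "generic-fallback"
```

Guarding the import this way keeps the Space bootable even when the compiled llama-cpp-python version predates the handler.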
## Health check

- Internal: call `health_check()` from `llama_runtime.py` to verify the model can be loaded.
- HTTP: run `health.py` (FastAPI) and query `GET /health` to get `{ "status": "ok" }` when the model is ready.
## Notes

- All heavy compilation happens in the Docker build stage.
- No GPUs are required; everything runs on CPU.
- You can extend `llama_runtime.py` to add more models or switch quantization.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference