Eyadddddddd committed
Commit b2a8bb2 · verified · 1 parent: a7b0e11

Update README.md

Files changed (1): README.md (+53, −3)
README.md CHANGED
@@ -1,12 +1,62 @@
  ---
- title: Chatbot
- emoji: 🏆
- colorFrom: yellow
  colorTo: purple
  sdk: docker
  sdk_version: 6.1.0
  app_file: app.py
  pinned: false
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
  ---
+ title: NeoHelper
+ emoji: 🚀
+ colorFrom: blue
  colorTo: purple
  sdk: docker
  sdk_version: 6.1.0
  app_file: app.py
  pinned: false
  ---
+ ---
+ # Neo Vision Space (Gemma 3 / Qwen2.5‑VL, CPU, Docker)
+
+ This Space runs **llama.cpp built from source** inside a **Custom Docker Space** and supports:
+
+ - ✅ Gemma 3 4B Vision (`ggml-org/gemma-3-4b-it-GGUF`)
+ - ✅ Qwen2.5‑VL 7B (`unsloth/Qwen2.5-VL-7B-Instruct-GGUF`)
+ - ✅ CPU‑only, no GPUs
+ - ✅ No prebuilt wheels — everything compiled in the Dockerfile
+
+ ## Model selection
+
+ Choose the model with the `MODEL_KIND` environment variable:
+
+ - `MODEL_KIND=gemma` → use Gemma 3 4B Vision
+ - `MODEL_KIND=qwen` → use Qwen2.5‑VL 7B
+
+ The logic is implemented in `llama_runtime.py`.
+
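As a hypothetical sketch (the actual switch lives in `llama_runtime.py` and is not shown in this commit), env-based model selection could look like this; `MODEL_REPOS` and `select_model_repo` are illustrative names, while the repo IDs are the ones listed above:

```python
import os

# Hypothetical sketch of the MODEL_KIND switch; the real logic lives in
# llama_runtime.py. Repo IDs are the ones listed in this README.
MODEL_REPOS = {
    "gemma": "ggml-org/gemma-3-4b-it-GGUF",
    "qwen": "unsloth/Qwen2.5-VL-7B-Instruct-GGUF",
}

def select_model_repo() -> str:
    """Return the GGUF repo ID selected by the MODEL_KIND env var."""
    kind = os.environ.get("MODEL_KIND", "gemma").strip().lower()
    if kind not in MODEL_REPOS:
        raise ValueError(f"MODEL_KIND must be 'gemma' or 'qwen', got {kind!r}")
    return MODEL_REPOS[kind]
```

Defaulting to `gemma` when the variable is unset keeps the Space bootable without any configuration.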
+ ## Files
+
+ - `Dockerfile`
+   Builds llama.cpp and llama-cpp-python from source, installs dependencies, and runs `app.py`.
+
+ - `requirements.txt`
+   Python dependencies (Gradio, Pillow, PDF handling, HF Hub).
+
+ - `llama_runtime.py`
+   Shared runtime for Gemma and Qwen2.5‑VL:
+   - Auto-downloads the GGUF and mmproj files to `/tmp`
+   - Uses `vision_model_path` for Gemma
+   - Uses `Qwen2VLChatHandler` for Qwen2.5‑VL if available, otherwise a safe fallback
+   - Exposes `analyze_image(image, prompt)` and `health_check()`
+
+ - `model_auto_download.py`
+   Optional script to prefetch all models at startup.
+
+ - `health.py`
+   Minimal FastAPI app exposing a `/health` endpoint backed by `health_check()`.
+
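The prefetch step can be sketched with `huggingface_hub` (illustrative only: the GGUF filenames below are hypothetical guesses, not taken from this commit; check each repo for the real ones):

```python
# Illustrative prefetch in the spirit of model_auto_download.py.
# Filenames are hypothetical examples; check each repo for the real ones.
MODEL_FILES = {
    "gemma": ("ggml-org/gemma-3-4b-it-GGUF", "gemma-3-4b-it-Q4_K_M.gguf"),
    "qwen": ("unsloth/Qwen2.5-VL-7B-Instruct-GGUF", "Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf"),
}

def prefetch(kind: str, target_dir: str = "/tmp") -> str:
    """Download the GGUF for `kind` into target_dir and return its local path."""
    # Imported lazily so the mapping can be inspected without the dependency.
    from huggingface_hub import hf_hub_download

    repo_id, filename = MODEL_FILES[kind]
    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=target_dir)
```

Downloading into `/tmp` matches the runtime's download location noted above and avoids filling the Space's persistent storage.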
+ ## Health check
+
+ - Internal: call `health_check()` from `llama_runtime.py` to verify that the model can be loaded.
+ - HTTP: run `health.py` (FastAPI) and query `GET /health`; it returns `{ "status": "ok" }` when the model is ready.
+
+ ## Notes
+
+ - All heavy compilation happens in the Docker build stage.
+ - No GPUs are required; everything runs on CPU.
+ - You can extend `llama_runtime.py` to add more models or switch quantization.
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference