Eyadddddddd committed
Commit b2a8bb2 · verified · 1 parent: a7b0e11

Update README.md

Files changed (1): README.md (+53, −3)
README.md CHANGED
@@ -1,12 +1,62 @@
  ---
- title: Chatbot
- emoji: 🏆
- colorFrom: yellow
  colorTo: purple
  sdk: docker
  sdk_version: 6.1.0
  app_file: app.py
  pinned: false
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
  ---
+ title: NeoHelper
+ emoji: 🚀
+ colorFrom: blue
  colorTo: purple
  sdk: docker
  sdk_version: 6.1.0
  app_file: app.py
  pinned: false
  ---
+ ---
+ # Neo Vision Space (Gemma 3 / Qwen2.5‑VL, CPU, Docker)
+
+ This Space runs **llama.cpp built from source** inside a **Custom Docker Space** and supports:
+
+ - ✅ Gemma 3 4B Vision (`ggml-org/gemma-3-4b-it-GGUF`)
+ - ✅ Qwen2.5‑VL 7B (`unsloth/Qwen2.5-VL-7B-Instruct-GGUF`)
+ - ✅ CPU‑only, no GPUs
+ - ✅ No prebuilt wheels — everything compiled in the Dockerfile
+
+ ## Model selection
+
+ Choose the model with the `MODEL_KIND` environment variable:
+
+ - `MODEL_KIND=gemma` → use Gemma 3 4B Vision
+ - `MODEL_KIND=qwen` → use Qwen2.5‑VL 7B
+
+ The logic is implemented in `llama_runtime.py`.
+
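As a hypothetical sketch (the actual switch lives in `llama_runtime.py` and is not shown in this commit), env-based model selection could look like this; `MODEL_REPOS` and `select_model_repo` are illustrative names, while the repo IDs are the ones listed above:

```python
import os

# Hypothetical sketch of the MODEL_KIND switch; the real logic lives in
# llama_runtime.py. Repo IDs are the ones listed in this README.
MODEL_REPOS = {
    "gemma": "ggml-org/gemma-3-4b-it-GGUF",
    "qwen": "unsloth/Qwen2.5-VL-7B-Instruct-GGUF",
}

def select_model_repo() -> str:
    """Return the GGUF repo ID selected by the MODEL_KIND env var."""
    kind = os.environ.get("MODEL_KIND", "gemma").strip().lower()
    if kind not in MODEL_REPOS:
        raise ValueError(f"MODEL_KIND must be 'gemma' or 'qwen', got {kind!r}")
    return MODEL_REPOS[kind]
```

Defaulting to `gemma` when the variable is unset keeps the Space bootable without any configuration.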
+ ## Files
+
+ - `Dockerfile`
+   Builds llama.cpp and llama-cpp-python from source, installs dependencies, and runs `app.py`.
+
+ - `requirements.txt`
+   Python dependencies (Gradio, Pillow, PDF handling, HF Hub).
+
+ - `llama_runtime.py`
+   Shared runtime for Gemma and Qwen2.5‑VL:
+   - Auto-downloads the GGUF and mmproj files to `/tmp`
+   - Uses `vision_model_path` for Gemma
+   - Uses `Qwen2VLChatHandler` for Qwen2.5‑VL if available, otherwise a safe fallback
+   - Exposes `analyze_image(image, prompt)` and `health_check()`
+
+ - `model_auto_download.py`
+   Optional script to prefetch all models at startup.
+
+ - `health.py`
+   Minimal FastAPI app exposing a `/health` endpoint backed by `health_check()`.
+
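The prefetch step can be sketched with `huggingface_hub` (illustrative only: the GGUF filenames below are hypothetical guesses, not taken from this commit; check each repo for the real ones):

```python
# Illustrative prefetch in the spirit of model_auto_download.py.
# Filenames are hypothetical examples; check each repo for the real ones.
MODEL_FILES = {
    "gemma": ("ggml-org/gemma-3-4b-it-GGUF", "gemma-3-4b-it-Q4_K_M.gguf"),
    "qwen": ("unsloth/Qwen2.5-VL-7B-Instruct-GGUF", "Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf"),
}

def prefetch(kind: str, target_dir: str = "/tmp") -> str:
    """Download the GGUF for `kind` into target_dir and return its local path."""
    # Imported lazily so the mapping can be inspected without the dependency.
    from huggingface_hub import hf_hub_download

    repo_id, filename = MODEL_FILES[kind]
    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=target_dir)
```

Downloading into `/tmp` matches the runtime's download location noted above and avoids filling the Space's persistent storage.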
+ ## Health check
+
+ - Internal: call `health_check()` from `llama_runtime.py` to verify that the model can be loaded.
+ - HTTP: run `health.py` (FastAPI) and query `GET /health`; it returns `{ "status": "ok" }` when the model is ready.
+
+ ## Notes
+
+ - All heavy compilation happens in the Docker build stage.
+ - No GPUs are required; everything runs on CPU.
+ - You can extend `llama_runtime.py` to add more models or switch quantization.
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference