| --- |
| title: "First-Principle AI" |
| emoji: "⚙️" |
| colorFrom: gray |
| colorTo: blue |
| sdk: gradio |
| sdk_version: "6.14.0" |
| python_version: "3.12" |
| app_file: app.py |
| fullWidth: true |
| header: mini |
| short_description: "Phase-3 Q8 GGUF lab console with llama.cpp." |
| suggested_hardware: zero-a10g |
| models: |
| - build-small-hackathon/phase-3-gguf |
| tags: |
| - gradio |
| - zerogpu |
| - llama-cpp |
| - gguf |
| - chatbot |
| - model-lab |
| - build-small-hackathon |
| license: mit |
| --- |
| |
| # First-Principle AI |
|
|
| First-Principle AI is a compact Gradio console for running and probing the |
| `build-small-hackathon/phase-3-gguf` Q8 GGUF model with the `llama-server` |
| binary from the official `llama.cpp` Ubuntu release. |
|
|
| The UI includes benchmark-style examples inspired by common LLM evaluation |
| areas: math reasoning, commonsense, science QA, truthfulness, instruction |
| following, coding, logic, summarization, extraction, robustness, and |
| goal-binding prompts where the model must identify which real-world object |
| needs to move. The questions are original prompts, not copied benchmark items. |
|
|
| ## Runtime Notes |
|
|
| - Model repo: `build-small-hackathon/phase-3-gguf` |
| - Model file: `model-Q8_0.gguf` |
| - Runtime: official `llama.cpp` `llama-server` |
| - Hardware target: ZeroGPU |
| - Fallback behavior: visible runtime diagnostics instead of silent mock output |
| - Model loading: runtime download/load through a persistent `llama-server` |
| - Default llama.cpp settings: `n_ctx=2048`, `n_batch=256`, `n_ubatch=64`, |
| memory-mapped weights, no warmup, and CPU fallback if CUDA offload is unavailable |
|
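
The defaults listed above map onto `llama-server` command-line flags roughly as follows. This is a minimal sketch, not the app's actual launcher: the helper name `build_server_command`, the port, and the `99`/`0` layer counts are assumptions made for illustration. Memory-mapped weights are llama.cpp's default behavior, so no flag is passed for them.

```python
from typing import List


def build_server_command(
    model_path: str,
    binary: str = "llama-server",
    port: int = 8080,       # assumed port, not taken from the app
    use_gpu: bool = True,
) -> List[str]:
    """Return an argv list for llama-server using the README defaults."""
    return [
        binary,
        "--model", model_path,
        "--host", "127.0.0.1",
        "--port", str(port),
        "--ctx-size", "2048",     # n_ctx
        "--batch-size", "256",    # n_batch
        "--ubatch-size", "64",    # n_ubatch
        "--no-warmup",
        # Assumption: request full offload when CUDA is usable, else stay on CPU.
        "--n-gpu-layers", "99" if use_gpu else "0",
    ]


if __name__ == "__main__":
    print(" ".join(build_server_command("model-Q8_0.gguf")))
```

Passing `use_gpu=False` yields the CPU fallback variant of the same command.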
|
| ZeroGPU is Hugging Face's dynamic GPU allocation runtime for Gradio Spaces, |
| and it is primarily documented around PyTorch workloads. This app targets |
| ZeroGPU but runs the GGUF through the official prebuilt `llama-server` binary, |
| so the Space build does not have to compile a Python extension. If the runtime |
| does not expose enough memory or a compatible llama.cpp binary, the app returns |
| a visible compatibility message. |
|
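
A compatibility check of the kind described above could look like the following sketch. The function name `compatibility_message`, the `headroom_gb` margin, and the message wording are hypothetical; only the 33.6 GB model size comes from this README. The caller is assumed to supply the available memory figure.

```python
import shutil
from typing import Optional

MODEL_SIZE_GB = 33.6  # Q8 GGUF size stated in this README


def compatibility_message(
    available_gb: float,
    binary: str = "llama-server",
    headroom_gb: float = 2.0,  # assumed safety margin
) -> Optional[str]:
    """Return a user-visible diagnostic, or None if the runtime looks usable."""
    if shutil.which(binary) is None:
        return f"Runtime check failed: `{binary}` was not found or is not executable."
    needed_gb = MODEL_SIZE_GB + headroom_gb
    if available_gb < needed_gb:
        return (
            f"Runtime check failed: {available_gb:.1f} GB of memory available, "
            f"about {needed_gb:.1f} GB needed for the Q8 model."
        )
    return None
```

Returning a message instead of raising lets the UI show the diagnostic directly in the chat panel rather than falling back to mock output.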
|
| The model is intentionally not preloaded during the Space build because the Q8 |
| GGUF is 33.6 GB and can make build startup unreliable. The app resolves the Hub |
| file at runtime after checking memory and runtime compatibility. The first |
| prompt may take several minutes while the model downloads and initializes; |
| subsequent prompts reuse the persistent `llama-server` process. |
|
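
The "slow first prompt, fast later prompts" behavior is a lazy-initialization pattern. The sketch below is one way to express it and is not taken from `app.py`: `LazyResource` and `start_server` are hypothetical names, and the port is an assumption. `hf_hub_download` is the standard `huggingface_hub` call for fetching a single file from a Hub repo.

```python
import subprocess
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Create a resource on first use and reuse it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._value: Optional[T] = None

    def get(self) -> T:
        # The lock keeps concurrent first requests from starting two servers.
        with self._lock:
            if self._value is None:
                self._value = self._factory()
            return self._value


def start_server() -> subprocess.Popen:
    """Hypothetical factory: download the GGUF, then launch llama-server."""
    from huggingface_hub import hf_hub_download  # imported lazily on first use

    path = hf_hub_download(
        repo_id="build-small-hackathon/phase-3-gguf",
        filename="model-Q8_0.gguf",
    )
    # Assumed port; the real app may use a different one.
    return subprocess.Popen(["llama-server", "--model", path, "--port", "8080"])


server = LazyResource(start_server)  # nothing is downloaded until server.get()
```

Only the first `server.get()` call pays the download and startup cost; later calls return the same process handle.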
|
| ## Local Smoke Test |
|
|
| ```bash |
| cd /Users/user/Documents/Automation-agents/hf-spaces/phase-3-gguf-lab |
| PHASE3_DISABLE_MODEL=1 python app.py |
| ``` |
|
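
For reference, an environment flag such as `PHASE3_DISABLE_MODEL` is typically read once at startup. The sketch below is an assumption about how such a flag could be parsed, not a copy of the logic in `app.py`; the helper name `model_disabled` is hypothetical.

```python
import os
from typing import Mapping, Optional


def model_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when PHASE3_DISABLE_MODEL is set to "1"."""
    env = os.environ if env is None else env
    # Assumption: only the exact value "1" disables model loading.
    return env.get("PHASE3_DISABLE_MODEL", "").strip() == "1"
```

With the flag set, the UI can be exercised locally without downloading the model.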
|