First-Principle-AI / README.md

owenisas

Restore easy benchmark examples

d01c43d verified 6 days ago

metadata

title: First-Principle AI
emoji: ⚙️
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 6.14.0
python_version: '3.12'
app_file: app.py
fullWidth: true
header: mini
short_description: Phase-3 Q8 GGUF lab console with llama.cpp.
suggested_hardware: zero-a10g
models:
  - build-small-hackathon/phase-3-gguf
tags:
  - gradio
  - zerogpu
  - llama-cpp
  - gguf
  - chatbot
  - model-lab
  - build-small-hackathon
license: mit

First-Principle AI

First-Principle AI is a compact Gradio console for running and probing the build-small-hackathon/phase-3-gguf Q8 GGUF model through the official llama.cpp Ubuntu llama-server release.

The UI includes benchmark-style examples inspired by common LLM evaluation areas: math reasoning, commonsense, science QA, truthfulness, instruction following, coding, logic, summarization, extraction, robustness, and goal-binding prompts where the model must identify which real-world object needs to move. The questions are original prompts, not copied benchmark items.

Runtime Notes

Model repo: build-small-hackathon/phase-3-gguf
Model file: model-Q8_0.gguf
Runtime: official llama.cpp llama-server
Hardware target: ZeroGPU
Fallback behavior: visible runtime diagnostics instead of silent mock output
Model loading: runtime download/load through a persistent llama-server
Default llama.cpp settings: n_ctx=2048, n_batch=256, n_ubatch=64, memory-mapped weights, no warmup, and CPU fallback if CUDA offload is unavailable
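As a rough illustration of how the defaults above could map onto a llama-server command line, here is a minimal sketch. The `ServerSettings` class, the `build_server_command` helper, the loopback host, and port 8080 are assumptions made for this example and are not taken from app.py; the flags themselves are standard llama-server options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    """Defaults from the runtime notes; the field names are illustrative."""
    n_ctx: int = 2048
    n_batch: int = 256
    n_ubatch: int = 64
    warmup: bool = False
    gpu_layers: int = 0  # 0 keeps every layer on the CPU (the fallback case)


def build_server_command(model_path: str, settings: ServerSettings,
                         binary: str = "llama-server",
                         port: int = 8080) -> list[str]:
    """Return an argv list for llama-server (hypothetical helper)."""
    cmd = [
        binary,
        "--model", model_path,
        "--ctx-size", str(settings.n_ctx),
        "--batch-size", str(settings.n_batch),
        "--ubatch-size", str(settings.n_ubatch),
        "--n-gpu-layers", str(settings.gpu_layers),
        "--host", "127.0.0.1",
        "--port", str(port),
    ]
    if not settings.warmup:
        cmd.append("--no-warmup")
    # Memory-mapped weights are llama.cpp's default, so no flag is added;
    # passing --no-mmap would turn them off.
    return cmd
```

The resulting list can be handed to `subprocess.Popen` once the model file is on disk. Raising `gpu_layers` would request CUDA offload when it is available.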

ZeroGPU is a Hugging Face Spaces runtime that allocates GPUs dynamically and is primarily documented around PyTorch workloads. This app targets ZeroGPU as requested, but it runs the GGUF through the official llama.cpp llama-server binary rather than Python bindings, so it does not depend on compiling a Python extension during the Space build. If the runtime does not expose enough memory or a compatible llama.cpp binary, the app returns a visible compatibility message.
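The compatibility check described above might look something like the following sketch. The function name, the message wording, and passing the memory figures in as arguments are all assumptions for illustration; the actual checks in app.py may differ.

```python
from typing import Optional


def compatibility_message(required_bytes: int, available_bytes: int,
                          binary_path: Optional[str]) -> Optional[str]:
    """Return a user-visible diagnostic, or None if the runtime looks usable.

    Hypothetical helper: the caller supplies the memory figures and the
    resolved binary path (for example from shutil.which("llama-server")).
    """
    if binary_path is None:
        return "Runtime diagnostic: no compatible llama-server binary was found."
    if available_bytes < required_bytes:
        return (
            "Runtime diagnostic: not enough memory for the model "
            f"({available_bytes / 1e9:.1f} GB available, "
            f"{required_bytes / 1e9:.1f} GB needed)."
        )
    return None
```

Returning the message as a string, rather than raising, lets the UI show it in place of a model reply, in line with the "visible runtime diagnostics" behavior listed in the runtime notes.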

The model is intentionally not preloaded during the Space build because the Q8 GGUF is 33.6 GB, and fetching it at build time can make startup unreliable. Instead, the app resolves the Hub file at runtime after checking memory and runtime compatibility. The first prompt may take several minutes while the model downloads and initializes; subsequent prompts reuse the model already loaded in the persistent llama-server.
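One way to get this "slow first prompt, fast afterwards" behavior is to start the server lazily and cache the result. The sketch below assumes a hypothetical `start` callable that downloads the GGUF, launches llama-server, and returns its base URL; neither `LazyServer` nor `start` comes from app.py.

```python
import threading
from typing import Callable, Optional


class LazyServer:
    """Start a server on first use, then reuse it (illustrative only)."""

    def __init__(self, start: Callable[[], str]) -> None:
        # `start` is a hypothetical callable. In a real app it might fetch
        # the model file from the Hub and spawn llama-server.
        self._start = start
        self._lock = threading.Lock()
        self._base_url: Optional[str] = None

    def base_url(self) -> str:
        # The lock keeps concurrent first requests from starting two servers.
        with self._lock:
            if self._base_url is None:
                self._base_url = self._start()
            return self._base_url
```

Only the first call to `base_url()` pays the download and initialization cost. Later calls return the cached URL immediately.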

Local Smoke Test

cd /Users/user/Documents/Automation-agents/hf-spaces/phase-3-gguf-lab
PHASE3_DISABLE_MODEL=1 python app.py
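
The README does not say how app.py interprets PHASE3_DISABLE_MODEL. A plausible reading is a simple truthy-flag check like the sketch below, where the helper name and the accepted values are assumptions.

```python
import os
from typing import Mapping, Optional


def model_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when PHASE3_DISABLE_MODEL is set to a truthy value.

    Hypothetical helper: the set of accepted values is an assumption.
    """
    env = os.environ if env is None else env
    value = env.get("PHASE3_DISABLE_MODEL", "")
    return value.strip().lower() in {"1", "true", "yes"}
```

With a check like this, the smoke test can exercise the Gradio UI without downloading the model file.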