Spaces:

build-small-hackathon
/

First-Principle-AI

Paused

App Files Files Community

First-Principle-AI / README.md

owenisas

Restore easy benchmark examples

d01c43d verified 6 days ago

preview code

raw

history blame contribute delete

2.42 kB

	---
	title: "First-Principle AI"
	emoji: "⚙️"
	colorFrom: gray
	colorTo: blue
	sdk: gradio
	sdk_version: "6.14.0"
	python_version: "3.12"
	app_file: app.py
	fullWidth: true
	header: mini
	short_description: "Phase-3 Q8 GGUF lab console with llama.cpp."
	suggested_hardware: zero-a10g
	models:
	- build-small-hackathon/phase-3-gguf
	tags:
	- gradio
	- zerogpu
	- llama-cpp
	- gguf
	- chatbot
	- model-lab
	- build-small-hackathon
	license: mit
	---

	# First-Principle AI

	First-Principle AI is a compact Gradio console for running and probing the
	`build-small-hackathon/phase-3-gguf` Q8 GGUF model through
	the official `llama.cpp` Ubuntu `llama-server` release.

	The UI includes benchmark-style examples inspired by common LLM evaluation
	areas: math reasoning, commonsense, science QA, truthfulness, instruction
	following, coding, logic, summarization, extraction, robustness, and
	goal-binding prompts where the model must identify which real-world object
	needs to move. The questions are original prompts, not copied benchmark items.

	## Runtime Notes

	- Model repo: `build-small-hackathon/phase-3-gguf`
	- Model file: `model-Q8_0.gguf`
	- Runtime: official `llama.cpp` `llama-server`
	- Hardware target: ZeroGPU
	- Fallback behavior: visible runtime diagnostics instead of silent mock output
	- Model loading: runtime download/load through a persistent `llama-server`
	- Default llama.cpp settings: `n_ctx=2048`, `n_batch=256`, `n_ubatch=64`,
	memory-mapped weights, no warmup, and CPU fallback if CUDA offload is unavailable

	ZeroGPU is a Gradio dynamic GPU runtime primarily documented around PyTorch
	workloads. This app targets ZeroGPU as requested, but it runs the GGUF through
	the official llama.cpp CLI path so it does not depend on a Python extension
	compile during the Space build. If the runtime does not expose enough memory or
	a compatible llama.cpp binary, the app returns a visible compatibility message.

	The model is intentionally not preloaded during the Space build because the Q8
	GGUF is 33.6 GB and can make build startup unreliable. The app resolves the Hub
	file at runtime after checking memory and runtime compatibility. The first
	prompt may take several minutes while the model downloads and initializes;
	subsequent prompts reuse the in-process llama.cpp model.

	## Local Smoke Test

	```bash
	cd /Users/user/Documents/Automation-agents/hf-spaces/phase-3-gguf-lab
	PHASE3_DISABLE_MODEL=1 python app.py
	```