---
title: ZeroEngine V0.2
emoji: 🚀
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: apache-2.0
python_version: 3.11
hf_oauth: true
hf_oauth_scopes:
  - read-repos
---
# 🛰️ ZeroEngine V0.2
**ZeroEngine** is a high-efficiency inference platform designed to push the limits of low-tier hardware. It demonstrates that with aggressive optimization, even a standard 2 vCPU instance can provide a responsive LLM experience.
## 🚀 Key Features
- **Zero-Config GGUF Loading:** Scan and boot any compatible repository directly from the Hub.
- **Ghost Cache System:** Background tokenization and KV-cache priming for near-instant execution.
- **Resource Stewardship:** Integrated "Inactivity Session Killer" and 3-pass GC to ensure high availability on shared hardware.
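The Ghost Cache idea can be sketched as follows. This is a minimal illustration, not the Space's actual code: `GhostCache` and its methods are hypothetical names, and the toy tokenizer stands in for the model's real one.

```python
from concurrent.futures import ThreadPoolExecutor

class GhostCache:
    """Hypothetical sketch of background prompt priming: tokenize the
    user's draft off the main thread on every keystroke, so the final
    request can reuse the result instead of tokenizing from scratch."""

    def __init__(self, tokenize):
        self._tokenize = tokenize              # e.g. the model's tokenizer
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None                   # (text, Future) of last keystroke

    def on_keystroke(self, text: str) -> None:
        # Fire-and-forget: start tokenizing the current draft immediately.
        self._pending = (text, self._pool.submit(self._tokenize, text))

    def tokens_for(self, text: str):
        # On submit, reuse the primed result if the draft didn't change.
        if self._pending and self._pending[0] == text:
            return self._pending[1].result()
        return self._tokenize(text)            # cache miss: tokenize inline

# Toy usage with a stand-in tokenizer.
cache = GhostCache(tokenize=lambda s: s.split())
cache.on_keystroke("hello world")
print(cache.tokens_for("hello world"))  # → ['hello', 'world']
```

With a single background worker, a fast typist simply overwrites the pending future; only the most recent draft is ever primed.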
## 🛠️ Usage
1. **Target Repo:** Enter a Hugging Face model repository (e.g., `unsloth/Llama-3.2-1B-GGUF`).
   - *Note: On current 2 vCPU hardware, models >4B are not recommended.*
2. **Scan:** Click **SCAN** to fetch available `.gguf` quants.
3. **Select Quant:** Choose your preferred file. (Recommendation: `Q4_K_M` for the best balance of speed and output quality.)
4. **Initialize:** Click **BOOT** to load the model into the kernel.
5. **Execute:** Start chatting. The engine pre-processes your input into tokens while you type.
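The scan-and-select steps amount to filtering a repo's file listing down to `.gguf` quants. A minimal sketch, assuming a `pick_quant` helper (hypothetical, not part of the Space) operating on a listing such as `huggingface_hub.list_repo_files` returns:

```python
def pick_quant(repo_files, preferred="Q4_K_M"):
    """Hypothetical helper: filter a repo file listing down to .gguf
    quants and pick the preferred tag, falling back to the first quant
    alphabetically if the tag is absent."""
    quants = sorted(f for f in repo_files if f.endswith(".gguf"))
    if not quants:
        raise ValueError("no .gguf files found in repo listing")
    for f in quants:
        if preferred in f:
            return f
    return quants[0]

files = [
    "README.md",
    "Llama-3.2-1B-Instruct-Q4_K_M.gguf",
    "Llama-3.2-1B-Instruct-Q8_0.gguf",
]
print(pick_quant(files))  # → Llama-3.2-1B-Instruct-Q4_K_M.gguf
```

The chosen file can then be loaded via `llama-cpp-python`, e.g. `Llama.from_pretrained(repo_id=..., filename=...)`, which downloads the quant from the Hub before initializing the model.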
## ⚙️ Current Limitations
- **Concurrency:** To maintain performance, vCPU slots are strictly managed. If the system is full, you will be placed in a queue.
- **Inactivity Timeout:** Users are automatically rotated out of the active slot after **20 seconds of inactivity** to free resources for the community.
- **Hardware Bottleneck:** On the base 2 vCPU tier, expect 1-5 TPS for BF16 models and 6-12 TPS for optimized quants.
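The slot rotation described above can be sketched as a single-slot manager with a timeout sweep. This is an illustrative sketch, not the Space's implementation; `SlotManager` is a hypothetical name, and the clock is injectable so the logic can be exercised without real waiting.

```python
import time

class SlotManager:
    """Hypothetical sketch of the inactivity rotation: one active slot,
    evicted after `timeout` seconds without activity, with a FIFO queue
    of waiting users promoted on eviction."""

    def __init__(self, timeout=20.0, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._active = None        # (user_id, last_activity_timestamp)
        self._queue = []

    def touch(self, user_id):
        """Record activity; admit the user if the slot is free or theirs."""
        if self._active is None or self._active[0] == user_id:
            self._active = (user_id, self._clock())
            return True
        if user_id not in self._queue:
            self._queue.append(user_id)
        return False

    def sweep(self):
        """Evict the active user on timeout and promote the next in queue."""
        if self._active and self._clock() - self._active[1] > self.timeout:
            self._active = None
            if self._queue:
                self.touch(self._queue.pop(0))
```

A periodic `sweep()` (e.g. from a background thread) is what enforces the 20-second rotation; every user interaction calls `touch()`.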
## 🏗️ Technical Stack
- **Inference:** `llama-cpp-python`
- **Frontend:** `Gradio 6.5.0`
- **Telemetry:** Custom JSON-based resource monitoring
- **License:** Apache 2.0
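For a flavor of what JSON-based resource telemetry can look like, here is a minimal stdlib-only sketch (the actual payload shape in the Space is not documented here, so the field names are assumptions):

```python
import json
import os
import time

def telemetry_snapshot() -> str:
    """Hypothetical telemetry payload: process CPU time from os.times()
    plus a wall-clock timestamp, serialized as JSON so it can be
    streamed to the UI or appended to a log."""
    t = os.times()
    return json.dumps({
        "ts": time.time(),          # wall-clock timestamp (epoch seconds)
        "cpu_user_s": t.user,       # user-mode CPU time consumed
        "cpu_system_s": t.system,   # kernel-mode CPU time consumed
    })

print(telemetry_snapshot())
```

Sampling this on an interval and diffing consecutive CPU-time values yields a utilization estimate without any third-party dependency.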
---
*ZeroEngine is a personal open-source project dedicated to making LLM inference accessible on minimal hardware.*