---
title: ZeroEngine V0.2
emoji: π
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: apache-2.0
python_version: 3.11
hf_oauth: true
hf_oauth_scopes:
  - read-repos
  - email
---
# 🛰️ ZeroEngine V0.2
ZeroEngine is a high-efficiency inference platform designed to push the limits of low-tier hardware. It demonstrates that with aggressive optimization, even a standard 2 vCPU instance can provide a responsive LLM experience.
## π Key Features
- Zero-Config GGUF Loading: Scan and boot any compatible repository directly from the Hub.
- Ghost Cache System: Background tokenization and KV-cache priming for near-instant execution.
- Resource Stewardship: Integrated "Inactivity Session Killer" and 3-pass GC to ensure high availability on shared hardware.
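The resource-stewardship pass described above can be sketched with Python's standard `gc` module. This is a minimal illustration, not the app's actual implementation; only the "3-pass" count comes from the feature list, and the function name is hypothetical:

```python
import gc

def three_pass_gc() -> int:
    """Run three full garbage-collection passes and return the total
    number of objects collected. Repeated passes help because objects
    in reference cycles may only become collectable after an earlier
    pass has freed whatever was keeping them alive."""
    collected = 0
    for _ in range(3):
        collected += gc.collect()
    return collected
```

A pass like this would typically run after a session is rotated out, before the vCPU slot is handed to the next user.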
## 🛠️ Usage
- Target Repo: Enter a Hugging Face model repository (e.g., `unsloth/Llama-3.2-1B-GGUF`). Note: on current 2 vCPU hardware, models larger than 4B parameters are not recommended.
- Scan: Click SCAN to fetch the available `.gguf` quants.
- Select Quant: Choose your preferred file. (Recommendation: `Q4_K_M` for the optimal balance of speed and logic.)
- Initialize: Click BOOT to load the model into the kernel.
- Execute: Start chatting. The engine pre-processes your input into tensors while you type.
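The SCAN step boils down to filtering a repository's file listing for GGUF quants. A minimal sketch of that filter (the file list here is hardcoded for illustration; in the app it would come from the Hub API, e.g. `huggingface_hub.list_repo_files`):

```python
def scan_gguf_quants(repo_files: list[str]) -> list[str]:
    """Return the .gguf files from a repo listing, sorted by name."""
    return sorted(f for f in repo_files if f.endswith(".gguf"))

# Illustrative listing; real filenames come from the target repo.
files = [
    "README.md",
    "Llama-3.2-1B.Q4_K_M.gguf",
    "Llama-3.2-1B.Q8_0.gguf",
    "config.json",
]
print(scan_gguf_quants(files))
# → ['Llama-3.2-1B.Q4_K_M.gguf', 'Llama-3.2-1B.Q8_0.gguf']
```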
## ⚙️ Current Limitations
- Concurrency: To maintain performance, vCPU slots are strictly managed. If the system is full, you will be placed in a queue.
- Inactivity Timeout: Users are automatically rotated out of the active slot after 20 seconds of inactivity to free resources for the community.
- Hardware Bottleneck: On the base 2 vCPU tier, expect 1-5 tokens per second (TPS) for BF16 models and 6-12 TPS for optimized quants.
## 🏗️ Technical Stack
- Inference: `llama-cpp-python`
- Frontend: Gradio 6.5.0
- Telemetry: Custom JSON-based resource monitoring
- License: Apache 2.0
ZeroEngine is a personal open-source project dedicated to making LLM inference accessible on minimal hardware.