---
title: ZeroEngine V0.2
emoji: 🛰️
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: apache-2.0
python_version: 3.11
hf_oauth: true
hf_oauth_scopes:
- read-repos
- email
---
# 🛰️ ZeroEngine V0.2
**ZeroEngine** is a high-efficiency inference platform designed to push the limits of low-tier hardware. It demonstrates that with aggressive optimization, even a standard 2 vCPU instance can provide a responsive LLM experience.
## 🚀 Key Features
- **Zero-Config GGUF Loading:** Scan and boot any compatible repository directly from the Hub.
- **Ghost Cache System:** Background tokenization and KV-cache priming for near-instant execution.
- **Resource Stewardship:** Integrated "Inactivity Session Killer" and 3-pass GC to ensure high availability on shared hardware.
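The "Zero-Config GGUF Loading" feature amounts to listing a repository's files and keeping only the `.gguf` ones. A minimal sketch of that filtering step (the helper name is hypothetical, and the `huggingface_hub` call shown in the comment is an assumption about how the Space feeds it, not its actual code):

```python
def scan_gguf(filenames):
    """Return only the .gguf files from a repo listing, sorted for display."""
    return sorted(f for f in filenames if f.lower().endswith(".gguf"))

# In the live app, the listing would come from the Hub API, e.g.:
#   from huggingface_hub import HfApi
#   files = HfApi().list_repo_files("unsloth/Llama-3.2-1B-GGUF")
quants = scan_gguf([
    "README.md",
    "Llama-3.2-1B-Q4_K_M.gguf",
    "Llama-3.2-1B-Q8_0.gguf",
    "config.json",
])
print(quants)  # only the two .gguf entries survive
```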
## 🛠️ Usage
1. **Target Repo:** Enter a Hugging Face model repository (e.g., `unsloth/Llama-3.2-1B-GGUF`).
- *Note: On current 2 vCPU hardware, models >4B are not recommended.*
2. **Scan:** Click **SCAN** to fetch available `.gguf` quants.
3. **Select Quant:** Choose your preferred file. (Recommendation: `Q4_K_M` offers the best balance of speed and output quality.)
4. **Initialize:** Click **BOOT** to load the model into the kernel.
5. **Execute:** Start chatting. The engine pre-processes your input into tensors while you type.
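The quant recommendation in step 3 can be encoded as a simple preference function. This is a hypothetical helper, not the Space's actual code; the `Llama.from_pretrained` call in the comment is the standard `llama-cpp-python` entry point that BOOT would plausibly use:

```python
def pick_quant(gguf_files, preferred=("Q4_K_M", "Q5_K_M", "Q8_0")):
    """Pick the first preferred quant tag found; fall back to the first file."""
    for tag in preferred:
        for f in gguf_files:
            if tag.lower() in f.lower():
                return f
    return gguf_files[0] if gguf_files else None

chosen = pick_quant(["Llama-3.2-1B-Q8_0.gguf", "Llama-3.2-1B-Q4_K_M.gguf"])
# BOOT would then hand the chosen file to llama-cpp-python, e.g.:
#   from llama_cpp import Llama
#   llm = Llama.from_pretrained(repo_id="unsloth/Llama-3.2-1B-GGUF",
#                               filename=chosen)
print(chosen)
```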
## ⚙️ Current Limitations
- **Concurrency:** To maintain performance, vCPU slots are strictly managed. If the system is full, you will be placed in a queue.
- **Inactivity Timeout:** Users are automatically rotated out of the active slot after **20 seconds of inactivity** to free resources for the community.
- **Hardware Bottleneck:** On the base 2 vCPU tier, expect 1-5 tokens per second (TPS) for BF16 models and 6-12 TPS for optimized quants.
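The 20-second inactivity rotation and the "3-pass GC" mentioned under Resource Stewardship can be sketched together as a small slot tracker. This is an illustrative assumption about the mechanism, not the Space's actual implementation:

```python
import gc
import time

INACTIVITY_TIMEOUT = 20.0  # seconds, per the limitation above


class SlotTracker:
    """Tracks the last activity time of the active vCPU session slot."""

    def __init__(self):
        self.last_seen = time.monotonic()

    def touch(self):
        """Record user activity, resetting the inactivity clock."""
        self.last_seen = time.monotonic()

    def expired(self, now=None):
        """True once the slot has been idle longer than the timeout."""
        now = time.monotonic() if now is None else now
        return (now - self.last_seen) > INACTIVITY_TIMEOUT

    def release(self):
        """Free the slot; the '3-pass GC' forces collection across generations."""
        for _ in range(3):
            gc.collect()
```

A background loop would periodically call `expired()` and, when it returns `True`, evict the session and call `release()` before admitting the next user from the queue.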
## 🏗️ Technical Stack
- **Inference:** `llama-cpp-python`
- **Frontend:** `Gradio 6.5.0`
- **Telemetry:** Custom JSON-based resource monitoring
- **License:** Apache 2.0
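"Custom JSON-based resource monitoring" could be as simple as serializing a periodic process snapshot. A stdlib-only sketch under that assumption (field names are invented for illustration):

```python
import json
import os
import time


def telemetry_snapshot():
    """Return a JSON string describing the current process and host load."""
    snap = {
        "ts": time.time(),
        "pid": os.getpid(),
        "cpu_count": os.cpu_count(),
        # 1-minute load average where available (Unix only)
        "load_avg_1m": os.getloadavg()[0] if hasattr(os, "getloadavg") else None,
    }
    return json.dumps(snap)


print(telemetry_snapshot())
```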
---
*ZeroEngine is a personal open-source project dedicated to making LLM inference accessible on minimal hardware.*