---
title: ZeroEngine V0.2
emoji: 🚀
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: apache-2.0
python_version: 3.11
hf_oauth: true
hf_oauth_scopes:
  - read-repos
  - email
---

# 🛰️ ZeroEngine V0.2

**ZeroEngine** is a high-efficiency inference platform designed to push the limits of low-tier hardware. It demonstrates that, with aggressive optimization, even a standard 2 vCPU instance can deliver a responsive LLM experience.

## 🚀 Key Features

- **Zero-Config GGUF Loading:** Scan and boot any compatible repository directly from the Hub.
- **Ghost Cache System:** Background tokenization and KV-cache priming for near-instant execution.
- **Resource Stewardship:** An integrated "Inactivity Session Killer" and 3-pass garbage collection keep the service available on shared hardware.

## 🛠️ Usage

1. **Target Repo:** Enter a Hugging Face model repository (e.g., `unsloth/Llama-3.2-1B-GGUF`).
   - *Note: On the current 2 vCPU hardware, models larger than 4B parameters are not recommended.*
2. **Scan:** Click **SCAN** to fetch the available `.gguf` quants.
3. **Select Quant:** Choose your preferred file. (Recommendation: `Q4_K_M` for the best balance of speed and quality.)
4. **Initialize:** Click **BOOT** to load the model into the kernel.
5. **Execute:** Start chatting. The engine tokenizes your input in the background while you type.

## ⚖️ Current Limitations

- **Concurrency:** To maintain performance, vCPU slots are strictly managed. If the system is full, you will be placed in a queue.
- **Inactivity Timeout:** Users are automatically rotated out of the active slot after **20 seconds of inactivity** to free resources for the community.
- **Hardware Bottleneck:** On the base 2 vCPU tier, expect 1-5 tokens per second (TPS) for BF16 models and 6-12 TPS for optimized quants.
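The scan-and-select flow from the Usage steps above can be sketched roughly as follows. This is a minimal illustration, not the actual app code: `scan_gguf` and `select_quant` are hypothetical helper names, and in the real app the file listing would presumably come from the Hub (e.g. via `huggingface_hub.list_repo_files`) rather than a plain list.

```python
def scan_gguf(repo_files):
    """Filter a repository file listing down to .gguf quant files (the SCAN step)."""
    return sorted(f for f in repo_files if f.endswith(".gguf"))


def select_quant(quants, preferred="Q4_K_M"):
    """Pick the recommended quant if present, else fall back to the first file."""
    for f in quants:
        if preferred in f:
            return f
    return quants[0] if quants else None


# Example: a mock file listing as it might come back from the Hub.
files = ["README.md", "model-Q8_0.gguf", "model-Q4_K_M.gguf"]
quants = scan_gguf(files)
print(select_quant(quants))  # prefers the Q4_K_M file
```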
## 🏗️ Technical Stack

- **Inference:** `llama-cpp-python`
- **Frontend:** `Gradio 6.5.0`
- **Telemetry:** Custom JSON-based resource monitoring
- **License:** Apache 2.0

---

*ZeroEngine is a personal open-source project dedicated to making LLM inference accessible on minimal hardware.*
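As an illustration of the 20-second inactivity rotation described under Current Limitations, the session killer could be sketched as below. This is a hedged sketch under assumed names (`SessionSlot`, `touch`, `expired` are hypothetical), not the engine's actual implementation; an injectable clock is used so the timeout logic is testable without real waiting.

```python
import time


class SessionSlot:
    """Hypothetical sketch of an inactivity session killer: the active vCPU
    slot is reclaimed once no activity is seen for `timeout_s` seconds."""

    def __init__(self, timeout_s=20.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_activity = clock()

    def touch(self):
        # Record user activity (e.g. a keystroke or a new prompt).
        self._last_activity = self._clock()

    def expired(self):
        # True once the slot should be rotated back to the queue.
        return self._clock() - self._last_activity > self.timeout_s
```

A background loop would periodically check `expired()` and hand the slot to the next queued user when it returns `True`.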