---
title: ZeroEngine V0.2
emoji: 🚀
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: apache-2.0
python_version: 3.11
hf_oauth: true
hf_oauth_scopes:
  - read-repos
---
# 🛰️ ZeroEngine V0.2
**ZeroEngine** is a high-efficiency inference platform designed to push the limits of low-tier hardware. It demonstrates that with aggressive optimization, even a standard 2 vCPU instance can provide a responsive LLM experience.
## 🚀 Key Features
- **Zero-Config GGUF Loading:** Scan and boot any compatible repository directly from the Hub.
- **Ghost Cache System:** Background tokenization and KV-cache priming for near-instant execution.
- **Resource Stewardship:** Integrated "Inactivity Session Killer" and 3-pass GC to ensure high availability on shared hardware.
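The Ghost Cache idea can be sketched as follows. This is a minimal illustration, not the Space's actual code: `GhostCache` and its methods are hypothetical names, and the toy tokenizer stands in for the model's real one.

```python
from concurrent.futures import ThreadPoolExecutor

class GhostCache:
    """Hypothetical sketch of background prompt priming: tokenize the
    user's draft off the main thread on every keystroke, so the final
    request can reuse the result instead of tokenizing from scratch."""

    def __init__(self, tokenize):
        self._tokenize = tokenize              # e.g. the model's tokenizer
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None                   # (text, Future) of last keystroke

    def on_keystroke(self, text: str) -> None:
        # Fire-and-forget: start tokenizing the current draft immediately.
        self._pending = (text, self._pool.submit(self._tokenize, text))

    def tokens_for(self, text: str):
        # On submit, reuse the primed result if the draft didn't change.
        if self._pending and self._pending[0] == text:
            return self._pending[1].result()
        return self._tokenize(text)            # cache miss: tokenize inline

# Toy usage with a stand-in tokenizer.
cache = GhostCache(tokenize=lambda s: s.split())
cache.on_keystroke("hello world")
print(cache.tokens_for("hello world"))  # → ['hello', 'world']
```

With a single background worker, a fast typist simply overwrites the pending future; only the most recent draft is ever primed.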
## 🛠️ Usage
1. **Target Repo:** Enter a Hugging Face model repository (e.g., `unsloth/Llama-3.2-1B-GGUF`).
   - *Note: On current 2 vCPU hardware, models >4B are not recommended.*
2. **Scan:** Click **SCAN** to fetch available `.gguf` quants.
3. **Select Quant:** Choose your preferred file. (Recommendation: `Q4_K_M` for the best balance of speed and output quality.)
4. **Initialize:** Click **BOOT** to load the model into the kernel.
5. **Execute:** Start chatting. The engine pre-processes your input into tokens while you type.
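The scan-and-select steps amount to filtering a repo's file listing down to `.gguf` quants. A minimal sketch, assuming a `pick_quant` helper (hypothetical, not part of the Space) operating on a listing such as `huggingface_hub.list_repo_files` returns:

```python
def pick_quant(repo_files, preferred="Q4_K_M"):
    """Hypothetical helper: filter a repo file listing down to .gguf
    quants and pick the preferred tag, falling back to the first quant
    alphabetically if the tag is absent."""
    quants = sorted(f for f in repo_files if f.endswith(".gguf"))
    if not quants:
        raise ValueError("no .gguf files found in repo listing")
    for f in quants:
        if preferred in f:
            return f
    return quants[0]

files = [
    "README.md",
    "Llama-3.2-1B-Instruct-Q4_K_M.gguf",
    "Llama-3.2-1B-Instruct-Q8_0.gguf",
]
print(pick_quant(files))  # → Llama-3.2-1B-Instruct-Q4_K_M.gguf
```

The chosen file can then be loaded via `llama-cpp-python`, e.g. `Llama.from_pretrained(repo_id=..., filename=...)`, which downloads the quant from the Hub before initializing the model.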
## ⚙️ Current Limitations
- **Concurrency:** To maintain performance, vCPU slots are strictly managed. If the system is full, you will be placed in a queue.
- **Inactivity Timeout:** Users are automatically rotated out of the active slot after **20 seconds of inactivity** to free resources for the community.
- **Hardware Bottleneck:** On the base 2 vCPU tier, expect 1-5 TPS for BF16 models and 6-12 TPS for optimized quants.
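The slot rotation described above can be sketched as a single-slot manager with a timeout sweep. This is an illustrative sketch, not the Space's implementation; `SlotManager` is a hypothetical name, and the clock is injectable so the logic can be exercised without real waiting.

```python
import time

class SlotManager:
    """Hypothetical sketch of the inactivity rotation: one active slot,
    evicted after `timeout` seconds without activity, with a FIFO queue
    of waiting users promoted on eviction."""

    def __init__(self, timeout=20.0, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._active = None        # (user_id, last_activity_timestamp)
        self._queue = []

    def touch(self, user_id):
        """Record activity; admit the user if the slot is free or theirs."""
        if self._active is None or self._active[0] == user_id:
            self._active = (user_id, self._clock())
            return True
        if user_id not in self._queue:
            self._queue.append(user_id)
        return False

    def sweep(self):
        """Evict the active user on timeout and promote the next in queue."""
        if self._active and self._clock() - self._active[1] > self.timeout:
            self._active = None
            if self._queue:
                self.touch(self._queue.pop(0))
```

A periodic `sweep()` (e.g. from a background thread) is what enforces the 20-second rotation; every user interaction calls `touch()`.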
## 🏗️ Technical Stack
- **Inference:** `llama-cpp-python`
- **Frontend:** `Gradio 6.5.0`
- **Telemetry:** Custom JSON-based resource monitoring
- **License:** Apache 2.0
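For a flavor of what JSON-based resource telemetry can look like, here is a minimal stdlib-only sketch (the actual payload shape in the Space is not documented here, so the field names are assumptions):

```python
import json
import os
import time

def telemetry_snapshot() -> str:
    """Hypothetical telemetry payload: process CPU time from os.times()
    plus a wall-clock timestamp, serialized as JSON so it can be
    streamed to the UI or appended to a log."""
    t = os.times()
    return json.dumps({
        "ts": time.time(),          # wall-clock timestamp (epoch seconds)
        "cpu_user_s": t.user,       # user-mode CPU time consumed
        "cpu_system_s": t.system,   # kernel-mode CPU time consumed
    })

print(telemetry_snapshot())
```

Sampling this on an interval and diffing consecutive CPU-time values yields a utilization estimate without any third-party dependency.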
---
*ZeroEngine is a personal open-source project dedicated to making LLM inference accessible on minimal hardware.*