---
title: ZeroEngine V0.2
emoji: 🚀
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
license: apache-2.0
python_version: 3.11
hf_oauth: true
hf_oauth_scopes:
  - read-repos
  - email
---

πŸ›°οΈ ZeroEngine V0.1

ZeroEngine is a high-efficiency inference platform designed to push the limits of low-tier hardware. It demonstrates that with aggressive optimization, even a standard 2 vCPU instance can provide a responsive LLM experience.

πŸš€ Key Features

- **Zero-Config GGUF Loading**: Scan and boot any compatible repository directly from the Hub.
- **Ghost Cache System**: Background tokenization and KV-cache priming for near-instant execution.
- **Resource Stewardship**: An integrated "Inactivity Session Killer" and 3-pass garbage collection (GC) to ensure high availability on shared hardware.
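The resource-stewardship ideas above can be sketched in a few lines. This is a minimal illustration, not ZeroEngine's actual implementation; the class and function names are hypothetical:

```python
import gc
import time

def three_pass_gc() -> list[int]:
    """Run the garbage collector three times: later passes can reclaim
    objects released by finalizers that ran in earlier passes."""
    return [gc.collect() for _ in range(3)]

class InactivityWatchdog:
    """Hypothetical inactivity session killer: flags a session for
    eviction once no activity is seen for `timeout_s` seconds."""

    def __init__(self, timeout_s: float = 20.0):
        self.timeout_s = timeout_s
        self._last_seen = time.monotonic()

    def touch(self) -> None:
        """Record user activity (e.g. a keystroke or a request)."""
        self._last_seen = time.monotonic()

    def expired(self) -> bool:
        """True once the inactivity window has elapsed."""
        return time.monotonic() - self._last_seen > self.timeout_s
```

`time.monotonic()` is used rather than `time.time()` so that system clock adjustments cannot spuriously expire (or revive) a session.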

πŸ› οΈ Usage

1. **Target Repo**: Enter a Hugging Face model repository (e.g., `unsloth/Llama-3.2-1B-GGUF`).
   - Note: On the current 2 vCPU hardware, models larger than 4B parameters are not recommended.
2. **Scan**: Click **SCAN** to fetch the available `.gguf` quants.
3. **Select Quant**: Choose your preferred file (recommended: `Q4_K_M`, the best balance of speed and output quality).
4. **Initialize**: Click **BOOT** to load the model into the kernel.
5. **Execute**: Start chatting. The engine pre-processes your input into tensors while you type.
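The **SCAN** step can be sketched as a filter over a repository file listing. In the real app the listing would come from the Hub (e.g. via `huggingface_hub.list_repo_files`); the function below is a hypothetical offline stand-in that keeps only `.gguf` files and surfaces the recommended `Q4_K_M` quant first:

```python
def scan_gguf_quants(repo_files: list[str]) -> list[str]:
    """Keep only .gguf quants from a repo listing, with any Q4_K_M
    file sorted to the front (the recommended default)."""
    quants = [f for f in repo_files if f.lower().endswith(".gguf")]
    return sorted(quants, key=lambda f: (0 if "Q4_K_M" in f else 1, f))

# Example listing, as a repository scan might return it:
files = [
    "README.md",
    "Llama-3.2-1B-Q8_0.gguf",
    "Llama-3.2-1B-Q4_K_M.gguf",
    "Llama-3.2-1B-Q2_K.gguf",
]
print(scan_gguf_quants(files)[0])  # the Q4_K_M file comes first
```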

βš–οΈ Current Limitations

- **Concurrency**: To maintain performance, vCPU slots are strictly managed. If the system is full, you will be placed in a queue.
- **Inactivity Timeout**: Users are automatically rotated out of the active slot after 20 seconds of inactivity to free resources for the community.
- **Hardware Bottleneck**: On the base 2 vCPU tier, expect 1-5 tokens per second (TPS) for BF16 models and 6-12 TPS for optimized quants.
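The slot-and-queue behavior described above can be sketched as follows. This is an illustrative model under assumed semantics (one user per slot, first-come-first-served overflow), not the Space's actual scheduler:

```python
import queue

class SlotManager:
    """Hypothetical strict slot manager: a fixed number of active
    slots, with overflow users waiting in a FIFO queue."""

    def __init__(self, slots: int = 1):
        self.free = slots
        self.waiting: "queue.Queue[str]" = queue.Queue()
        self.active: set[str] = set()

    def request(self, user: str) -> str:
        """Grant a slot if one is free; otherwise enqueue the user."""
        if self.free > 0:
            self.free -= 1
            self.active.add(user)
            return "active"
        self.waiting.put(user)
        return "queued"

    def release(self, user: str) -> None:
        """Free a slot (e.g. after an inactivity timeout) and promote
        the next waiting user, if any."""
        self.active.discard(user)
        if not self.waiting.empty():
            self.active.add(self.waiting.get())
        else:
            self.free += 1
```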

πŸ—οΈ Technical Stack

- **Inference**: `llama-cpp-python`
- **Frontend**: Gradio 6.5.0
- **Telemetry**: Custom JSON-based resource monitoring
- **License**: Apache 2.0
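A JSON-based telemetry snapshot of the kind listed above might look like the sketch below, using only the standard library. The field names are assumptions for illustration, not ZeroEngine's actual schema:

```python
import json
import os
import time

def telemetry_snapshot() -> str:
    """Serialize a minimal resource snapshot as JSON."""
    return json.dumps({
        "ts": time.time(),                    # sample timestamp
        "vcpus": os.cpu_count(),              # visible CPU count
        # 1-minute load average where available (POSIX only):
        "load_1m": os.getloadavg()[0] if hasattr(os, "getloadavg") else None,
    })

print(telemetry_snapshot())
```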

ZeroEngine is a personal open-source project dedicated to making LLM inference accessible on minimal hardware.