Apply for a GPU community grant: Personal project

#1
by turtle170 - opened

Hi @hysts ,

I am requesting a ZeroGPU grant for this space: turtle170/ZeroEngine.

Project Focus

ZeroEngine demonstrates high-efficiency LLM orchestration. I have already optimized the engine to run GGUF models on the base 2 vCPU tier, but hardware is now the primary bottleneck for the user experience.

Hardware Justification

While the 2 vCPU build works, the inference speed and queue times limit its utility as a community tool.

  • Current Limitation: Capped at small 1B-3B models with significant latency (currently 1-4 TPS on Unsloth BF16 Llama 3.2 1B).
  • ZeroGPU Goal: Upgrading will let ZeroEngine support 7B+ models at substantially higher throughput (an estimated 50-200 TPS on a 7B Q4_K_M model), making it a flagship GGUF runner for the Hub.
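
For context, the TPS figures above come from timing generation end-to-end. A minimal sketch of that measurement (the `measure_tps` helper and the `generate` callable are illustrative, not ZeroEngine's actual API):

```python
import time

def measure_tps(generate, prompt, **kwargs):
    """Hypothetical helper: time a generate() callable that returns
    a sequence of tokens, and report tokens per second."""
    start = time.perf_counter()
    tokens = generate(prompt, **kwargs)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed
```

Any backend that returns its generated tokens can be dropped in as `generate`, so CPU and GPU builds can be benchmarked with the same harness.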

Technical Responsibility

I have built this engine to be a "polite neighbor" on shared hardware:

  • Aggressive Cleanup: ZeroEngine uses a "20s Inactivity Session Killer" that combines Python garbage collection with a dedicated model-unloading routine, so VRAM is released immediately after a session ends. Inactive users are also returned to the queue to free the slot for other users.
  • Optimization: Background tokenization and KV-cache priming are already implemented in a separate background handler (turtle170/ZeroEngine-Backend) to minimize active GPU residency time, ensuring we only occupy a GPU slice during active generation.
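
To make the cleanup behavior concrete, here is a minimal sketch of what a 20-second inactivity reaper could look like (the `SessionReaper` class and `unload_fn` callback are hypothetical names, not ZeroEngine's actual code):

```python
import gc
import threading
import time

class SessionReaper:
    """Sketch of a '20s Inactivity Session Killer' (hypothetical).

    Watches a session's last-activity timestamp and, once the idle
    limit is exceeded, calls an unload callback and forces a GC pass
    so model memory is released promptly.
    """

    def __init__(self, unload_fn, limit=20.0):
        self.unload_fn = unload_fn           # assumed callback that frees the model
        self.limit = limit                   # idle seconds before cleanup
        self.last_activity = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()

    def touch(self):
        """Call on every user interaction to reset the idle clock."""
        self.last_activity = time.monotonic()

    def _watch(self):
        poll = max(self.limit / 10, 0.01)    # check well within the idle window
        while not self._stop.wait(poll):
            if time.monotonic() - self.last_activity > self.limit:
                self.unload_fn()             # drop model references / free VRAM
                gc.collect()                 # then collect whatever is left
                self._stop.set()             # reap at most once per session
```

Every Gradio event handler would call `touch()`, so only genuinely idle sessions get reaped.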

The Space is fully refactored for Gradio 6.5.0 and ready for immediate @spaces.GPU deployment. This grant will let us offer the community a high-performance, zero-config way to explore GGUF models at scale.

Thank you for your consideration!
