turtle170 committed · Commit 1a132e5 · verified · 1 Parent(s): 7ca413a

Update README.md

Files changed (1): README.md (+13 -6)
README.md CHANGED
@@ -11,10 +11,17 @@ license: apache-2.0
 python_version: 3.11
 ---
 
-# ZeroEngine V0.1 (Kernel)
-High-performance inference engine for 2-vCPU / 16GB RAM constraints.
 
-## Optimizations
-- **KV-Cache Stitching**: Asynchronous pre-evaluation of queue inputs.
-- **Hard Partitioning**: Dedicated core assignment per concurrent user.
-- **Memory Mapping**: weights mapped via `mmap` to preserve RAM for context.
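The `mmap` optimization in the removed list can be sketched with Python's standard library; this is an illustrative sketch, not ZeroEngine's actual loader, and the weights path is hypothetical:

```python
import mmap

def map_weights(path):
    """Map a weights file read-only. Pages are loaded lazily by the OS,
    so physical RAM stays free for KV-cache/context until bytes are touched."""
    with open(path, "rb") as f:
        # length=0 maps the entire file
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Because the mapping is read-only and demand-paged, the OS can also share and evict weight pages under memory pressure, which matters on a 16GB instance.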
+## Overview
+ZeroEngine demonstrates how low-tier hardware, such as the 2-vCPU instance provided by Hugging Face, can run a variety of models with ease.
 
+## Usage
+1. Enter your model repo (e.g. unsloth/gemma-3-1b-it-GGUF). [CAUTION: since ZeroEngine runs on low-tier hardware, it cannot run large models (>4B parameters).]
+2. Click 'SCAN' to list all the .gguf files in that repo.
+3. Click your preferred file (Q4_K_M gives the best performance, at about 6-12 tokens per second).
+4. Select 'BOOT' to load your model.
+5. Start chatting! The engine automatically pre-processes your query into tensors, speeding everything up.
+
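The SCAN step above (enumerating a repo's .gguf files) could be sketched with the `huggingface_hub` API. The `pick_gguf_files` helper and its quant-preference ordering are illustrative assumptions, not ZeroEngine's actual code:

```python
# Preference order for quantization tags (Q4_K_M first, per the usage notes).
# This ordering is an assumption for illustration only.
PREFERRED_QUANTS = ["Q4_K_M", "Q4_K_S", "Q5_K_M", "Q8_0"]

def pick_gguf_files(files):
    """Keep only .gguf files, ordered by quant preference (unknown quants last)."""
    def rank(name):
        for i, tag in enumerate(PREFERRED_QUANTS):
            if tag.lower() in name.lower():
                return i
        return len(PREFERRED_QUANTS)
    return sorted((f for f in files if f.endswith(".gguf")), key=rank)

def scan_repo(repo_id):
    """List the .gguf files in a Hugging Face model repo (network call)."""
    from huggingface_hub import HfApi  # lazy import; only needed for SCAN
    return pick_gguf_files(HfApi().list_repo_files(repo_id))
```

For example, `scan_repo("unsloth/gemma-3-1b-it-GGUF")` would surface the Q4_K_M file first, matching step 3 above.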
+## Limitations
+1. You may have to queue, as only 2 vCPUs are available. To prioritise performance, each vCPU is dedicated to one active user, so at most 2 users can be active at a time. Once a user has been idle for >=20 seconds, they are automatically moved back into the queue, freeing a slot.
+2. As the engine runs on low-tier hardware, expect 1-5 TPS on BF16 models.
+3. As the engine uses a shared chat template, some models, such as Gemma 3, will not work.
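The two-slot scheduling in limitation 1 could be sketched roughly as follows. The class and method names are hypothetical; only the 2-slot and 20-second-idle policy comes from the text above:

```python
import time
from collections import deque

IDLE_TIMEOUT = 20.0  # seconds of inactivity before a user is kicked to the queue
MAX_SLOTS = 2        # one slot per vCPU

class SlotManager:
    """Assigns up to MAX_SLOTS active users; idle users are kicked to the queue."""

    def __init__(self):
        self.active = {}      # user -> last-activity timestamp
        self.queue = deque()  # waiting users, FIFO

    def request(self, user, now=None):
        """User asks for (or refreshes) a slot; True if active, False if queued."""
        now = time.monotonic() if now is None else now
        self._evict_idle(now)
        if user in self.active:
            self.active[user] = now  # refresh activity timestamp
            return True
        if len(self.active) < MAX_SLOTS:
            self.active[user] = now
            return True
        if user not in self.queue:
            self.queue.append(user)
        return False

    def _evict_idle(self, now):
        """Kick users idle for >= IDLE_TIMEOUT, then promote waiters into free slots."""
        for user, last in list(self.active.items()):
            if now - last >= IDLE_TIMEOUT:
                del self.active[user]
                self.queue.append(user)
        while self.queue and len(self.active) < MAX_SLOTS:
            self.active[self.queue.popleft()] = now
```

With both slots busy, a third user is queued; once an active user idles past the timeout, the front of the queue is promoted into the freed slot.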