license: apache-2.0
python_version: 3.11
---
## Overview
ZeroEngine demonstrates how low-tier hardware, such as the 2 vCPU instance provided by Hugging Face, can run a variety of models with ease.

## Usage
1. Enter your model repo (e.g. unsloth/gemma-3-1b-it-GGUF). [CAUTION: since ZeroEngine runs on low-tier hardware, it cannot run large models (>4B parameters).]
2. Click 'SCAN' to list all the .gguf files in that repo.
3. Click your preferred file (Q4_K_M offers the best performance, at roughly 6-12 tokens per second).
4. Select 'BOOT' to load your model.
5. Start chatting! The engine automatically pre-processes your query into tensors, speeding up inference.
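The SCAN step above amounts to filtering a repo's file listing for GGUF files. As a minimal sketch (assuming the listing comes from something like `huggingface_hub.list_repo_files`; `scan_gguf_files` is a hypothetical helper, not ZeroEngine's actual code):

```python
def scan_gguf_files(repo_files):
    """Return the .gguf files from a repo file listing, sorted by name."""
    return sorted(f for f in repo_files if f.lower().endswith(".gguf"))

# Example listing, as a repo like unsloth/gemma-3-1b-it-GGUF might return:
files = [
    "README.md",
    "gemma-3-1b-it-Q4_K_M.gguf",
    "gemma-3-1b-it-BF16.gguf",
    ".gitattributes",
]
print(scan_gguf_files(files))
# ['gemma-3-1b-it-BF16.gguf', 'gemma-3-1b-it-Q4_K_M.gguf']
```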

## Limitations
1. You may have to queue, as only 2 vCPUs are available. Because we prioritise performance, each vCPU is assigned to a single active user, so at most 2 users can be active at once. Once an active user idles for 20 seconds or more, they are automatically moved back into the queue, freeing a slot.
2. Because the engine runs on low-tier hardware, expect 1-5 tokens per second on BF16 models.
3. Because the engine uses a shared chat template, some models, such as Gemma 3, will not work.
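The slot policy from limitation 1 can be sketched as follows. `SlotManager` and its method names are illustrative assumptions, not ZeroEngine's actual implementation; the only behaviour taken from this README is the 2-slot limit and the 20-second idle kick.

```python
import time

IDLE_TIMEOUT = 20  # seconds; users idle for >= 20 s lose their slot

class SlotManager:
    """Sketch of a 2-slot policy: at most `max_slots` active users;
    idle users are evicted back into a FIFO queue, freeing a slot."""

    def __init__(self, max_slots=2):
        self.max_slots = max_slots
        self.active = {}  # user id -> timestamp of last activity
        self.queue = []   # waiting user ids, first in first out

    def touch(self, user, now=None):
        """Record activity; admit the user if a slot is free, else queue them."""
        now = time.monotonic() if now is None else now
        self.evict_idle(now)
        if user in self.active or len(self.active) < self.max_slots:
            self.active[user] = now
        elif user not in self.queue:
            self.queue.append(user)

    def evict_idle(self, now):
        """Kick users idle for IDLE_TIMEOUT or more, then admit from the queue."""
        for user, last in list(self.active.items()):
            if now - last >= IDLE_TIMEOUT:
                del self.active[user]
                self.queue.append(user)
        while self.queue and len(self.active) < self.max_slots:
            self.active[self.queue.pop(0)] = now
```

For example, with users "a" and "b" active and "c" queued, letting "a" idle past 20 seconds moves "a" into the queue and admits "c".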