Update README.md
README.md
CHANGED
@@ -131,9 +131,19 @@ In interactive mode (`-i`), simply paste your question and press Enter. The chat
 
 ## Performance (CPU)
 
+```bash
+./build/bin/main -m ling-mini-2.0-q4.bin --seed 1
+```
 - Q4_0 on AMD Ryzen 5 5600G with Radeon Graphics (3.90 GHz): ~35 tokens/second (output), measured in a typical chat generation scenario.
+
+```bash
+./build/bin/main -m ling-mini-2.0-q8.bin --seed 1
+```
+- Q8_0 on AMD Ryzen 5 5600G with Radeon Graphics (3.90 GHz): ~20 tokens/second (output), measured in a typical chat generation scenario.
+
+Notes:
 - Actual throughput varies with prompt length, context size, threads, OS, and build flags.
-
+
 
 ## Which file should I choose?
 
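For anyone reproducing the throughput figures in this change: output throughput is the number of generated tokens divided by the wall-clock generation time. The following is a minimal sketch of that arithmetic, assuming you read the token count and elapsed seconds from your own run. `tokens_per_second` is a hypothetical helper, not part of the project, and the numbers passed to it are illustrative, not measurements.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): compute output throughput
# from a generated-token count and the elapsed generation time in seconds.
tokens_per_second() {
  local tokens="$1" seconds="$2"
  awk -v t="$tokens" -v s="$seconds" 'BEGIN {
    # Guard against division by zero or a negative duration.
    if (s <= 0) { print "elapsed time must be positive" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", t / s
  }'
}

# Illustrative numbers only: 512 tokens generated in 16 seconds.
tokens_per_second 512 16   # prints 32.0
```

Measuring generation time separately from prompt processing keeps the result comparable to an "output" tokens/second figure, since long prompts otherwise lower the apparent rate.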