#### Tested on an M3 Ultra with 512 GB RAM using [Inferencer app v1.10](https://inferencer.com)
- Single inference: ~36.5 tokens/s @ 1000 tokens
- Batched inference: ~44 total tokens/s across two inferences
- Memory usage: ~239 GiB

*q9bit quant typically achieves near lossless accuracy in our coding test*
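To read the batched figure: ~44 tokens/s is the aggregate across both concurrent streams, not per stream. A minimal back-of-envelope sketch of that arithmetic, using only the numbers from the bullets above:

```python
# Throughput figures from the benchmark bullets above.
single_tps = 36.5   # tokens/s, one inference
total_tps = 44.0    # tokens/s, aggregate across two batched inferences
streams = 2

# Each batched stream runs slower than a lone inference,
# but the machine's aggregate throughput goes up.
per_stream = total_tps / streams          # ~22 tokens/s per stream
batching_gain = total_tps / single_tps    # ~1.21x aggregate speedup

print(f"per-stream: {per_stream} tok/s, aggregate gain: {batching_gain:.2f}x")
```

So batching two requests trades per-request speed (~22 vs ~36.5 tokens/s) for roughly a 1.2x gain in total throughput.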