#### Tested on an M3 Ultra with 512 GB RAM using [Inferencer app v1.10](https://inferencer.com)
- Single inference: ~36.5 tokens/s @ 1000 tokens
- Batched inference: ~44 total tokens/s across two inferences
- Memory usage: ~239 GiB

*q9bit quant typically achieves near lossless accuracy in our coding test*
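To read the batched figure: ~44 tokens/s is the aggregate across both concurrent streams, not per stream. A minimal back-of-envelope sketch of that arithmetic, using only the numbers from the bullets above:

```python
# Throughput figures from the benchmark bullets above.
single_tps = 36.5   # tokens/s, one inference
total_tps = 44.0    # tokens/s, aggregate across two batched inferences
streams = 2

# Each batched stream runs slower than a lone inference,
# but the machine's aggregate throughput goes up.
per_stream = total_tps / streams          # ~22 tokens/s per stream
batching_gain = total_tps / single_tps    # ~1.21x aggregate speedup

print(f"per-stream: {per_stream} tok/s, aggregate gain: {batching_gain:.2f}x")
```

So batching two requests trades per-request speed (~22 vs ~36.5 tokens/s) for roughly a 1.2x gain in total throughput.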