machiabeli committed on
Commit a8adb95 · verified · 1 Parent(s): bd7067d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +5 -6
README.md CHANGED
@@ -36,13 +36,12 @@ mlx_lm.generate \
 - **Architecture:** Dense MLA (Multi-head Latent Attention)
 - **Framework:** MLX (Apple Silicon optimized)
 
-## Performance
+## Performance (M3 Ultra)
 
-| Metric | Value |
-|--------|-------|
-| Size | 4.4GB |
-| Speed | ~113 tokens/sec |
-| Peak Memory | ~4.6GB |
+| Quant | Prompt | Generation | Memory |
+|-------|--------|------------|--------|
+| bf16 | 118 tok/s | 112 tok/s | 4.7GB |
+| 4-bit | 202 tok/s | 205 tok/s | 1.3GB |
 
 ## Features
 
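
Reviewer note: the new table reports two quantizations side by side, so the relative gain is easy to derive. The sketch below is not part of the commit. It only copies the four table rows and computes the ratios, and the names `BENCH` and `ratios` are made up for illustration.

```python
# Figures copied from the new "Performance (M3 Ultra)" table in this commit.
# BENCH and ratios() are illustrative names, not part of the repository.
BENCH = {
    "bf16": {"prompt_tps": 118, "gen_tps": 112, "mem_gb": 4.7},
    "4-bit": {"prompt_tps": 202, "gen_tps": 205, "mem_gb": 1.3},
}


def ratios(base="bf16", other="4-bit"):
    """Return how `other` compares to `base` for speed and memory."""
    b, o = BENCH[base], BENCH[other]
    return {
        "prompt_speedup": o["prompt_tps"] / b["prompt_tps"],
        "gen_speedup": o["gen_tps"] / b["gen_tps"],
        "mem_reduction": b["mem_gb"] / o["mem_gb"],
    }


if __name__ == "__main__":
    for name, value in ratios().items():
        print(f"{name}: {value:.2f}x")
```

By these numbers the 4-bit build generates roughly 1.8x faster than bf16 while using a bit under a third of the memory.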