Upload README.md with huggingface_hub
README.md
CHANGED
@@ -38,13 +38,12 @@ mlx_lm.generate \
 - **Architecture:** Dense MLA (Multi-head Latent Attention)
 - **Framework:** MLX (Apple Silicon optimized)
 
-## Performance
+## Performance (M3 Ultra)
 
-|
-|--------|-------|
-|
-|
-| Peak Memory | ~1.4GB |
+| Quant | Prompt | Generation | Memory |
+|-------|--------|------------|--------|
+| bf16 | 118 tok/s | 112 tok/s | 4.7GB |
+| 4-bit | 202 tok/s | 205 tok/s | 1.3GB |
 
 ## Features
 
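Reviewer note: the new table invites a direct comparison between the two quantization levels. The sketch below is only arithmetic on the figures in the added rows; the names `ROWS` and `ratios` are hypothetical helpers for illustration and are not part of the README or of `mlx_lm`.

```python
# Figures copied from the added "Performance (M3 Ultra)" table rows.
ROWS = {
    "bf16": {"prompt_tps": 118, "gen_tps": 112, "mem_gb": 4.7},
    "4-bit": {"prompt_tps": 202, "gen_tps": 205, "mem_gb": 1.3},
}


def ratios(base, other):
    """Return speedup and memory-reduction factors of `other` relative to `base`."""
    return {
        "prompt_speedup": other["prompt_tps"] / base["prompt_tps"],
        "gen_speedup": other["gen_tps"] / base["gen_tps"],
        "mem_reduction": base["mem_gb"] / other["mem_gb"],
    }


if __name__ == "__main__":
    # Compare the 4-bit row against the bf16 row.
    for name, value in ratios(ROWS["bf16"], ROWS["4-bit"]).items():
        print(f"{name}: {value:.2f}x")
```

On these numbers the 4-bit build comes out at roughly 1.7x the prompt throughput, 1.8x the generation throughput, and about 3.6x less memory than bf16. The table does not state the prompt length or generation length used, so these ratios describe the reported figures only.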