machiabeli committed on
Commit de407cc · verified · 1 Parent(s): 5f88be3

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +5 -6
README.md CHANGED
@@ -38,13 +38,12 @@ mlx_lm.generate \
  - **Architecture:** Dense MLA (Multi-head Latent Attention)
  - **Framework:** MLX (Apple Silicon optimized)
 
- ## Performance
+ ## Performance (M3 Ultra)
 
- | Metric | Value |
- |--------|-------|
- | Size | 1.2GB |
- | Speed | ~209 tokens/sec |
- | Peak Memory | ~1.4GB |
+ | Quant | Prompt | Generation | Memory |
+ |-------|--------|------------|--------|
+ | bf16 | 118 tok/s | 112 tok/s | 4.7GB |
+ | 4-bit | 202 tok/s | 205 tok/s | 1.3GB |
 
  ## Features
 