mlx-community/Youtu-LLM-2B-4bit

machiabeli committed on Jan 1

Commit de407cc · verified · 1 parent: 5f88be3

Upload README.md with huggingface_hub

Files changed (1)

README.md +5 -6

@@ -38,13 +38,12 @@ mlx_lm.generate \
 - **Architecture:** Dense MLA (Multi-head Latent Attention)
 - **Framework:** MLX (Apple Silicon optimized)
-## Performance
-| Metric | Value |
-|--------|-------|
-| Size | 1.2GB |
-| Speed | ~209 tokens/sec |
-| Peak Memory | ~1.4GB |
+## Performance (M3 Ultra)
+| Quant | Prompt | Generation | Memory |
+|-------|--------|------------|--------|
+| bf16 | 118 tok/s | 112 tok/s | 4.7GB |
+| 4-bit | 202 tok/s | 205 tok/s | 1.3GB |
 ## Features
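
To make the new table easier to read, the relative gain of the 4-bit quantization over bf16 can be derived from its figures. The following is a minimal sketch that only does arithmetic on the numbers in the "Performance (M3 Ultra)" table; the dictionary keys and the `ratios` helper are hypothetical names chosen for illustration, and nothing here re-measures the model.

```python
# Figures copied from the "Performance (M3 Ultra)" table in this commit.
# Key names (prompt_tps, gen_tps, mem_gb) are illustrative, not from the README.
bench = {
    "bf16": {"prompt_tps": 118, "gen_tps": 112, "mem_gb": 4.7},
    "4-bit": {"prompt_tps": 202, "gen_tps": 205, "mem_gb": 1.3},
}


def ratios(base, quant):
    """Return how much faster and smaller `quant` is relative to `base`."""
    return {
        "prompt_speedup": quant["prompt_tps"] / base["prompt_tps"],
        "gen_speedup": quant["gen_tps"] / base["gen_tps"],
        "mem_reduction": base["mem_gb"] / quant["mem_gb"],
    }


r = ratios(bench["bf16"], bench["4-bit"])
for name, value in r.items():
    print(f"{name}: {value:.2f}x")
# prompt_speedup: 1.71x
# gen_speedup: 1.83x
# mem_reduction: 3.62x
```

By these figures, the 4-bit build generates roughly 1.8 times faster than bf16 while using a bit more than a quarter of the memory on the same machine.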