---
tags:
- mlx
- apple-silicon
- text-generation
- gemma3
- instruction-tuned
library_name: mlx-lm
pipeline_tag: text-generation
base_model: Daizee/Dirty-Calla-4B
license: apache-2.0
---
# 🖤 Dirty-Calla-4B — **MLX** builds for Apple Silicon
**Dirty-Calla-4B-mlx** provides Apple Silicon–optimized versions of **Daizee/Dirty-Calla-4B**, a fine-tuned **Gemma 3 (4B)** model developed by **Daizee** for expressive, humanlike, and emotionally textured responses.
This conversion uses Apple’s **MLX** framework for local inference on Apple Silicon Macs (M1 and later).
Each variant trades memory footprint and speed against precision, so you can choose the one that fits your hardware and workflow.
> 🧩 **Note on vocab padding:**
> The tokenizer and embedding matrix were padded to the next multiple of 64 tokens (262,208 total).
> Added tokens are labeled `<pad_ex_*>` — they will not appear in normal generations.
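
Rounding a vocabulary size up to the next multiple of 64 is a ceiling division followed by a multiplication. The shell function below is a minimal sketch of that arithmetic; `pad_to_multiple` is a hypothetical helper, not something shipped in this repo. Because 262,144 = 4,096 × 64, any original count from 262,145 to 262,208 pads to 262,208.

```bash
#!/usr/bin/env bash
# Hypothetical helper: round $1 up to the next multiple of $2 (integer arithmetic).
pad_to_multiple() {
  local n=$1 m=$2
  # Ceiling division by m, then scale back up to a multiple of m.
  echo $(( (n + m - 1) / m * m ))
}

pad_to_multiple 262145 64   # one past an aligned size, rounds up to 262208
pad_to_multiple 262208 64   # already a multiple of 64, stays 262208
echo $(( 262208 % 64 ))     # remainder is 0
```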
---
## ⚙️ Variants
| Folder | Bits | Group Size | Description |
|----------------|------|------------|--------------|
| `mlx/g128/` | int4 | 128 | Smallest and fastest; lowest memory use |
| `mlx/g64/` | int4 | 64 | Balanced: slightly larger and slower than g128, with lower quantization error |
| `mlx/int8/` | int8 | — | Largest; closest to fp16 precision and best coherence |
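
To make the size trade-off concrete, the sketch below estimates effective bits per weight and a rough weight-memory figure for each variant. It assumes MLX-style affine quantization, where each group of weights carries one scale and one bias, so a smaller group size adds a little memory in exchange for tracking each group's value range more closely. Several inputs are assumptions, not facts from this card: scales and biases are taken to be 16 bits each, the weight count is the nominal 4 billion from the model name (the real count differs, and not every tensor is necessarily quantized), and the int8 build is assumed to use mlx-lm's default group size of 64 because the table does not list one. `effective_bits` and `approx_gb` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a rough size estimate (assumptions noted inline).

# Effective bits per weight: quantized bits plus per-group overhead of one
# scale and one bias, each ASSUMED to be 16 bits (32 bits per group).
effective_bits() {
  awk -v b="$1" -v g="$2" 'BEGIN { printf "%.2f\n", b + 32 / g }'
}

# Approximate weight memory in GB (10^9 bytes) for $1 weights.
approx_gb() {
  awk -v n="$1" -v b="$2" -v g="$3" 'BEGIN { printf "%.3f\n", n * (b + 32 / g) / 8 / 1e9 }'
}

N=4000000000  # ASSUMED nominal weight count taken from "4B" in the model name

# Fields: folder, bits, group size (int8 group size ASSUMED to be 64).
for spec in "g128 4 128" "g64 4 64" "int8 8 64"; do
  read -r name bits group <<< "$spec"
  echo "$name: $(effective_bits "$bits" "$group") bits/weight, ~$(approx_gb "$N" "$bits" "$group") GB"
done
```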
---
## 🚀 Quickstart
### Run directly from Hugging Face
```bash
python -m mlx_lm.generate \
--model hf://Daizee/Dirty-Calla-4B-mlx/mlx/g64 \
--prompt "Describe a rainy city from the perspective of a poet." \
--max-tokens 150 --temp 0.4