---
tags:
- mlx
- apple-silicon
- text-generation
- gemma3
- instruction-tuned
library_name: mlx-lm
pipeline_tag: text-generation
base_model: Daizee/Dirty-Calla-4B
license: apache-2.0
---

# 🖤 Dirty-Calla-4B — **MLX** builds for Apple Silicon

**Dirty-Calla-4B-mlx** provides Apple Silicon–optimized versions of **Daizee/Dirty-Calla-4B**, a fine-tuned **Gemma 3 (4B)** model developed by **Daizee** for expressive, humanlike, and emotionally textured responses.

This conversion uses Apple’s **MLX** framework for local inference on **M1, M2, and M3 Macs**. Each variant trades size for speed or precision, so you can choose what fits your workflow.
> 🧩 **Note on vocab padding:**
> The tokenizer and embedding matrix were padded to the next multiple of 64 tokens (262,208 total).
> Added tokens are labeled `<pad_ex_*>` — they will not appear in normal generations.
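
If you want to confirm the padding locally, here is a minimal sketch, assuming the tokenizer files load with Hugging Face `transformers` (adjust the repo id or subfolder to wherever your tokenizer actually lives):

```python
# Count the <pad_ex_*> filler tokens (assumption: tokenizer files sit at the repo root).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Daizee/Dirty-Calla-4B-mlx")
pad_ex = [t for t in tok.get_added_vocab() if t.startswith("<pad_ex_")]
print(len(tok))     # expected vocab size: 262208
print(len(pad_ex))  # number of padding placeholders
```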
---

## ⚙️ Variants
| Folder | Bits | Group Size | Description |
|--------|------|------------|-------------|
| `mlx/g128/` | int4 | 128 | Smallest & fastest (lightest memory use) |
| `mlx/g64/` | int4 | 64 | Balanced: slightly slower, more stable |
| `mlx/int8/` | int8 | — | Closest to fp16 precision, best coherence |
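
All three variants live in one repo, so you can fetch just the folder you need. A sketch with `huggingface_hub` (the `allow_patterns` filter assumes the folder layout shown above and that each variant folder is self-contained):

```python
# Download a single quantization variant instead of the whole repo.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    "Daizee/Dirty-Calla-4B-mlx",
    allow_patterns=["mlx/g64/*"],  # swap in "mlx/g128/*" or "mlx/int8/*"
)
print(local_dir)  # the variant sits under <local_dir>/mlx/g64
```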
---

## 🚀 Quickstart

### Run directly from Hugging Face
```bash
python -m mlx_lm.generate \
  --model hf://Daizee/Dirty-Calla-4B-mlx/mlx/g64 \
  --prompt "Describe a rainy city from the perspective of a poet." \
  --max-tokens 150 --temp 0.4
```
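
### Run from Python

The same generation from the `mlx_lm` Python API, as a sketch: the local path is hypothetical and assumes you downloaded the `g64` variant as in the `snapshot_download` example above, and the chat-template step follows the usual pattern for instruction-tuned models.

```python
# Sketch: local generation with mlx-lm's Python API instead of the CLI.
from mlx_lm import load, generate

# Hypothetical local path; point this at your downloaded variant folder.
model, tokenizer = load("path/to/Dirty-Calla-4B-mlx/mlx/g64")

messages = [{"role": "user", "content": "Describe a rainy city from the perspective of a poet."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print(generate(model, tokenizer, prompt=prompt, max_tokens=150))
```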