---
tags:
- mlx
- apple-silicon
- text-generation
- gemma3
- instruction-tuned
library_name: mlx-lm
pipeline_tag: text-generation
base_model: Daizee/Dirty-Calla-4B
license: apache-2.0
---

# 🖤 Dirty-Calla-4B — **MLX** builds for Apple Silicon

**Dirty-Calla-4B-mlx** provides Apple Silicon–optimized versions of **Daizee/Dirty-Calla-4B**, a fine-tuned **Gemma 3 (4B)** model developed by **Daizee** for expressive, humanlike, and emotionally textured responses.

This conversion uses Apple’s **MLX** framework for local inference on **M1, M2, and M3 Macs**.  
Each variant trades memory footprint and speed against precision, so you can choose the one that fits your workflow.

> 🧩 **Note on vocab padding:**  
> The tokenizer and embedding matrix were padded to the next multiple of 64 tokens (262,208 total).  
> Added tokens are labeled `<pad_ex_*>` — they will not appear in normal generations.

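The padding arithmetic in the note above can be checked with a minimal sketch. The helper name `pad_to_multiple` is hypothetical and is not part of the conversion scripts; the only figures taken from this card are the multiple (64) and the padded total (262,208).

```python
def pad_to_multiple(n: int, multiple: int = 64) -> int:
    """Round n up to the nearest multiple of `multiple` (hypothetical helper)."""
    return -(-n // multiple) * multiple  # ceiling division, then scale back up


PADDED_VOCAB = 262_208  # padded total stated in this card

# The padded size is an exact multiple of 64.
assert PADDED_VOCAB % 64 == 0

# Any unpadded size from 262,145 through 262,208 rounds up to the same total.
assert pad_to_multiple(262_145) == PADDED_VOCAB
assert pad_to_multiple(PADDED_VOCAB) == PADDED_VOCAB
```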
---

## ⚙️ Variants

| Folder        | Bits | Group Size | Description |
|----------------|------|------------|--------------|
| `mlx/g128/`   | int4 | 128 | Smallest & fastest (lightest memory use) |
| `mlx/g64/`    | int4 | 64  | Balanced: slightly slower, more stable |
| `mlx/int8/`   | int8 | —   | Closest to fp16 precision, best coherence |

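As a rough guide to why a smaller group size costs memory, the sketch below estimates the effective bits per weight for each variant. It rests on assumptions not stated in this card: that each quantization group stores one 16-bit scale and one 16-bit bias (32 bits of overhead per group), that every weight is quantized, that the model has a nominal 4 billion parameters, and that the int8 variant uses a group size of 64. Treat the results as order-of-magnitude estimates, not measured file sizes.

```python
def effective_bits(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Bits per weight including per-group overhead (assumed: fp16 scale + fp16 bias)."""
    return bits + overhead_bits / group_size


def approx_weight_gb(n_params: float, bits: int, group_size: int) -> float:
    """Approximate weight storage in GB (10^9 bytes) under the assumptions above."""
    return n_params * effective_bits(bits, group_size) / 8 / 1e9


N_PARAMS = 4e9  # nominal parameter count (assumption, not a measured value)

# (bits, group size) per variant; the int8 group size of 64 is an assumption.
variants = {"mlx/g128/": (4, 128), "mlx/g64/": (4, 64), "mlx/int8/": (8, 64)}

for folder, (bits, group_size) in variants.items():
    bpw = effective_bits(bits, group_size)
    gb = approx_weight_gb(N_PARAMS, bits, group_size)
    print(f"{folder:<10} ~{bpw:.2f} bits/weight, ~{gb:.2f} GB of weights")
```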
---

## 🚀 Quickstart

### Run directly from Hugging Face
```bash
python -m mlx_lm.generate \
  --model hf://Daizee/Dirty-Calla-4B-mlx/mlx/g64 \
  --prompt "Describe a rainy city from the perspective of a poet." \
  --max-tokens 150 --temp 0.4