---
tags:
- mlx
- apple-silicon
- text-generation
- gemma3
- instruction-tuned
library_name: mlx-lm
pipeline_tag: text-generation
base_model: Daizee/Dirty-Calla-4B
license: apache-2.0
---

# 🖤 Dirty-Calla-4B — **MLX** builds for Apple Silicon

**Dirty-Calla-4B-mlx** provides Apple Silicon–optimized versions of **Daizee/Dirty-Calla-4B**, a fine-tuned **Gemma 3 (4B)** model developed by **Daizee** for expressive, humanlike, and emotionally textured responses.

This conversion uses Apple’s **MLX** framework for local inference on **M1, M2, and M3 Macs**. Each variant trades size for speed or precision, so you can choose what fits your workflow.
> 🧩 **Note on vocab padding:**
> The tokenizer and embedding matrix were padded to the next multiple of 64 tokens (262,208 total).
> Added tokens are labeled `<pad_ex_*>` — they will not appear in normal generations.
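
If you want to confirm the padding locally, here is a minimal sketch, assuming the tokenizer files load with Hugging Face `transformers` (adjust the repo id or subfolder to wherever your tokenizer actually lives):

```python
# Count the <pad_ex_*> filler tokens (assumption: tokenizer files sit at the repo root).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Daizee/Dirty-Calla-4B-mlx")
pad_ex = [t for t in tok.get_added_vocab() if t.startswith("<pad_ex_")]
print(len(tok))     # expected vocab size: 262208
print(len(pad_ex))  # number of padding placeholders
```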
---

## ⚙️ Variants
| Folder | Bits | Group Size | Description |
|--------|------|------------|-------------|
| `mlx/g128/` | int4 | 128 | Smallest & fastest (lightest memory use) |
| `mlx/g64/` | int4 | 64 | Balanced: slightly slower, more stable |
| `mlx/int8/` | int8 | — | Closest to fp16 precision, best coherence |
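
All three variants live in one repo, so you can fetch just the folder you need. A sketch with `huggingface_hub` (the `allow_patterns` filter assumes the folder layout shown above and that each variant folder is self-contained):

```python
# Download a single quantization variant instead of the whole repo.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    "Daizee/Dirty-Calla-4B-mlx",
    allow_patterns=["mlx/g64/*"],  # swap in "mlx/g128/*" or "mlx/int8/*"
)
print(local_dir)  # the variant sits under <local_dir>/mlx/g64
```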
---

## 🚀 Quickstart

### Run directly from Hugging Face
```bash
python -m mlx_lm.generate \
  --model hf://Daizee/Dirty-Calla-4B-mlx/mlx/g64 \
  --prompt "Describe a rainy city from the perspective of a poet." \
  --max-tokens 150 --temp 0.4
```
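
### Run from Python

The same generation from the `mlx_lm` Python API, as a sketch: the local path is hypothetical and assumes you downloaded the `g64` variant as in the `snapshot_download` example above, and the chat-template step follows the usual pattern for instruction-tuned models.

```python
# Sketch: local generation with mlx-lm's Python API instead of the CLI.
from mlx_lm import load, generate

# Hypothetical local path; point this at your downloaded variant folder.
model, tokenizer = load("path/to/Dirty-Calla-4B-mlx/mlx/g64")

messages = [{"role": "user", "content": "Describe a rainy city from the perspective of a poet."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print(generate(model, tokenizer, prompt=prompt, max_tokens=150))
```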