Instructions to use Daizee/Dirty-Calla-4B-mlx with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use Daizee/Dirty-Calla-4B-mlx with MLX:

    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm
    # if on a CUDA device, also pip install mlx[cuda]

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("Daizee/Dirty-Calla-4B-mlx")
    prompt = "Once upon a time in"
    text = generate(model, tokenizer, prompt=prompt, verbose=True)

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- MLX LM
How to use Daizee/Dirty-Calla-4B-mlx with MLX LM:
Generate or start a chat session

    # Install MLX LM
    uv tool install mlx-lm

    # Generate some text
    mlx_lm.generate --model "Daizee/Dirty-Calla-4B-mlx" --prompt "Once upon a time"

Dirty-Calla-4B: MLX builds for Apple Silicon
Dirty-Calla-4B-mlx provides Apple Silicon-optimized versions of Daizee/Dirty-Calla-4B, a fine-tuned Gemma 3 (4B) model developed by Daizee for expressive, humanlike, and emotionally textured responses.
This conversion uses Apple's MLX framework for local inference on M1, M2, and M3 Macs.
Each variant trades size for speed or precision, so you can choose what fits your workflow.
Note on vocab padding:
The tokenizer and embedding matrix were padded to the next multiple of 64 tokens (262,208 total).
Added tokens are labeled `<pad_ex_*>`; they will not appear in normal generations.
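
The padding rule in the note above can be sketched in a few lines of Python. This is only an illustration of "round up to the next multiple of 64": the helper name `pad_to_multiple` is hypothetical, and the unpadded size used below (262,145) is an assumed example value, since the card states only the padded total.

```python
def pad_to_multiple(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the nearest multiple (unchanged if already aligned)."""
    return ((vocab_size + multiple - 1) // multiple) * multiple


PADDED_TOTAL = 262_208  # padded size stated in the card

# The stated total is itself aligned to 64.
assert PADDED_TOTAL % 64 == 0

# Assumed example: any unpadded size from 262,145 to 262,208 rounds up to the same total.
example_unpadded = 262_145  # hypothetical value, not taken from the card
padded = pad_to_multiple(example_unpadded)
extra_tokens = padded - example_unpadded  # number of <pad_ex_*> entries this example would need
print(padded, extra_tokens)
```

Aligning the embedding rows to a multiple of the quantization group size is a common reason for this kind of padding, though the card does not state the motivation.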
Variants
| Folder | Bits | Group Size | Description |
|---|---|---|---|
| `mlx/g128/` | int4 | 128 | Smallest & fastest (lightest memory use) |
| `mlx/g64/` | int4 | 64 | Balanced: slightly slower, more stable |
| `mlx/int8/` | int8 | - | Closest to fp16 precision, best coherence |
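
To make the size trade-off between the two int4 variants concrete, here is a rough back-of-the-envelope sketch. It assumes group-wise affine quantization in which each group of weights shares one 16-bit scale and one 16-bit bias, which adds `32 / group_size` bits per weight. The function names and the round figure of 4 billion quantized weights are illustrative assumptions, not measurements of these builds; real checkpoints also contain tensors that may not be quantized.

```python
def effective_bits_per_weight(bits: int, group_size: int, meta_bits: int = 32) -> float:
    """Bits per weight including per-group metadata.

    Assumes each group stores a 16-bit scale and a 16-bit bias (meta_bits = 32).
    """
    return bits + meta_bits / group_size


def approx_size_gb(num_weights: float, bits: int, group_size: int) -> float:
    """Approximate storage in GB (1e9 bytes) for num_weights quantized weights."""
    return num_weights * effective_bits_per_weight(bits, group_size) / 8 / 1e9


ASSUMED_WEIGHTS = 4e9  # hypothetical round number for a "4B" model

for folder, bits, group_size in [("mlx/g128/", 4, 128), ("mlx/g64/", 4, 64)]:
    ebw = effective_bits_per_weight(bits, group_size)
    size = approx_size_gb(ASSUMED_WEIGHTS, bits, group_size)
    print(f"{folder}: ~{ebw} bits/weight, ~{size:.2f} GB")
```

Under these assumptions a smaller group size costs a little more storage per weight but gives each scale and bias fewer values to cover, which is consistent with the table describing `g64` as the more stable of the two int4 builds. The int8 row is left out because the card does not list its group size.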
Quickstart
Run directly from Hugging Face
python -m mlx_lm.generate \
--model hf://Daizee/Dirty-Calla-4B-mlx/mlx/g64 \
--prompt "Describe a rainy city from the perspective of a poet." \
--max-tokens 150 --temp 0.4
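
If you prefer the Python API, one possible approach is to download a single variant folder and load it from the local path. This is a sketch, not the author's documented method: it assumes each variant folder is a self-contained MLX model directory (config, tokenizer, and weights), and the helpers `variant_patterns`, `load_variant`, and `demo` are hypothetical names introduced here. It relies on `snapshot_download` from `huggingface_hub` and on `load` and `generate` from `mlx_lm`.

```python
from pathlib import Path

REPO_ID = "Daizee/Dirty-Calla-4B-mlx"

# Folder names taken from the Variants table.
VARIANTS = {"g128": "mlx/g128", "g64": "mlx/g64", "int8": "mlx/int8"}


def variant_patterns(variant: str) -> list[str]:
    """Glob patterns that restrict the download to one variant folder."""
    return [f"{VARIANTS[variant]}/*"]


def load_variant(variant: str = "g64"):
    """Download one variant and load it with mlx-lm.

    Assumption: the folder holds everything mlx-lm needs to load a model.
    """
    # Imported lazily so the helpers above work without these packages installed.
    from huggingface_hub import snapshot_download
    from mlx_lm import load

    root = snapshot_download(REPO_ID, allow_patterns=variant_patterns(variant))
    return load(str(Path(root) / VARIANTS[variant]))


def demo() -> str:
    """Generate a short completion with the g64 variant."""
    from mlx_lm import generate

    model, tokenizer = load_variant("g64")
    prompt = "Describe a rainy city from the perspective of a poet."
    return generate(model, tokenizer, prompt=prompt, max_tokens=150, verbose=True)
```

Calling `demo()` triggers the download and generation. Sampling temperature is left at the library default here because the way to set it from Python has changed between mlx-lm releases; check the version you have installed.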