Instructions to use Juhuu/slowestlooser-v5-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use Juhuu/slowestlooser-v5-4bit with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Juhuu/slowestlooser-v5-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use Juhuu/slowestlooser-v5-4bit with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "Juhuu/slowestlooser-v5-4bit"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Juhuu/slowestlooser-v5-4bit" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use Juhuu/slowestlooser-v5-4bit with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "Juhuu/slowestlooser-v5-4bit"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Juhuu/slowestlooser-v5-4bit
```
Run Hermes
hermes
- MLX LM
How to use Juhuu/slowestlooser-v5-4bit with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "Juhuu/slowestlooser-v5-4bit"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on port 8080 unless --port is given)
mlx_lm.server --model "Juhuu/slowestlooser-v5-4bit"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Juhuu/slowestlooser-v5-4bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
SlowestLooser FoodJSON Unified v5
QLoRA fine-tune of mlx-community/Qwen3-1.7B-4bit for the SlowestLooser
on-device calorie-tracking app. Handles food, drink, and activity queries
under one unified JSON schema.
What this model does
The model is a pure RAG → JSON transformer: given a user query plus a pre-resolved DB block, it copies values verbatim into the output schema. The app pre-computes grams / volume_ml / minutes / calories iOS-side and passes them in the prompt; the model never invents numbers.
Output schema
```
{
  "name": "<user input verbatim>",
  "items": [
    {
      "type": "food" | "drink" | "activity",
      "name": "...",
      "grams": int, "volume_ml": int, "minutes": int,
      "calories": int,
      "protein": float, "carbs": float, "sugar": float,
      "fat": float, "saturated_fat": float, "salt": float
    }
  ]
}
```
- food: `grams > 0`, `volume_ml = 0`, `minutes = 0`
- drink: `volume_ml > 0`, `grams = 0`, `minutes = 0`
- activity: `minutes > 0`, all macros = 0
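The per-type rules above can be checked mechanically after generation. The following is a minimal sketch, not the app's actual validator: the function names `check_item` and `check_output` are hypothetical, missing numeric fields are assumed to default to 0, and only the rules stated in this card are enforced.

```python
import json

# Macro fields from the output schema; activities must have all of them at 0.
MACROS = ("protein", "carbs", "sugar", "fat", "saturated_fat", "salt")


def check_item(item: dict) -> list[str]:
    """Return rule violations for one entry of "items" (hypothetical helper)."""
    kind = item.get("type")
    # Assumption: a missing quantity field is treated as 0.
    grams = item.get("grams", 0)
    volume_ml = item.get("volume_ml", 0)
    minutes = item.get("minutes", 0)

    if kind == "food":
        ok = grams > 0 and volume_ml == 0 and minutes == 0
    elif kind == "drink":
        ok = volume_ml > 0 and grams == 0 and minutes == 0
    elif kind == "activity":
        ok = minutes > 0 and all(item.get(m, 0) == 0 for m in MACROS)
    else:
        return [f"unknown type: {kind!r}"]

    return [] if ok else [f"{kind} item violates its quantity rule"]


def check_output(raw: str) -> list[str]:
    """Parse model output and collect all violations; an empty list means valid."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(doc, dict) or not isinstance(doc.get("items"), list):
        return ['top level must be an object with an "items" list']

    errors = []
    for i, item in enumerate(doc["items"]):
        if not isinstance(item, dict):
            errors.append(f"items[{i}]: not an object")
            continue
        errors.extend(f"items[{i}]: {e}" for e in check_item(item))
    return errors
```

A check of this kind could run on the raw generated string before the result is stored, so that an entry such as a drink carrying a non-zero `grams` value is rejected instead of silently logged.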
Training
| Setting | Value |
|---|---|
| Method | QLoRA via mlx-lm on M4 Pro |
| Iterations | 2000 |
| Batch size | 4 |
| LR schedule | linear warmup 0 → 1e-4 over 50 iters, cosine decay → 1e-5 by iter 2000 |
| Optimizer | AdamW |
| LoRA rank | 32 |
| LoRA scale | 20.0 |
| LoRA target layers | last 16 attention layers |
| Max seq length | 2048 |
| Trainable params | 19.92M (1.158% of base) |
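
The LR schedule row can be written out as a function of the iteration number. This is a sketch in plain Python rather than the actual training configuration; it assumes the cosine segment spans iterations 50 to 2000, which the table implies but does not state, and the constant and function names are illustrative.

```python
import math

WARMUP_ITERS = 50    # linear warmup length from the table
TOTAL_ITERS = 2000   # total training iterations
PEAK_LR = 1e-4       # learning rate reached at the end of warmup
FINAL_LR = 1e-5      # learning rate at the final iteration


def lr_at(step: int) -> float:
    """Learning rate at a given iteration: linear warmup, then cosine decay."""
    if step < WARMUP_ITERS:
        # Linear ramp from 0 up to PEAK_LR.
        return PEAK_LR * step / WARMUP_ITERS
    # Fraction of the decay phase completed, clamped to 1.0 past the end.
    progress = min(1.0, (step - WARMUP_ITERS) / (TOTAL_ITERS - WARMUP_ITERS))
    # Half-cosine from PEAK_LR down to FINAL_LR.
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate is 0 at iteration 0, peaks at 1e-4 at iteration 50, and settles at 1e-5 at iteration 2000.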
Loss
| Iter | Val loss |
|---|---|
| 1 | 2.518 |
| 500 | 0.041 |
| 1000 | 0.043 |
| 1500 | 0.047 |
| 2000 | 0.040 |
Eval (holdout, 475 records)
| Metric | Value |
|---|---|
| parse_rate | 100.0% |
| top1_acc | 100.0% |
| quantity_acc | 100.0% |
| multi_recall | 100.0% |
| avg latency | 1.63s |
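
The card does not define its metrics precisely. As an illustration only, `parse_rate` is assumed here to be the share of generated outputs that parse as JSON; the function below is a hypothetical sketch of that reading, not the evaluation script that produced the table.

```python
import json


def parse_rate(outputs: list[str]) -> float:
    """Fraction of raw model outputs that are syntactically valid JSON."""
    if not outputs:
        return 0.0
    parsed = 0
    for raw in outputs:
        try:
            json.loads(raw)
        except json.JSONDecodeError:
            continue  # count as a parse failure
        parsed += 1
    return parsed / len(outputs)
```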
Predecessors
- v1: food-only fine-tune (Qwen3-1.7B-4bit), 1000 iters, val 0.053. Superseded by v2.
- v2: combined food + activity with separate schemas (Juhuu/slowestlooser-v2-4bit), 1500 iters, val 0.046. Currently in iOS production.
- v3: abandoned (food-side fixes regressed activity quality at fixed LR).
- v4: same unified schema as v5 but with a flat LR of 2e-4. Gradients diverged at iter 1475 (train loss 0.5 → 2.1 in 25 iters), and the best saved snapshot at iter 1250 was unshippable (eval parse_rate 42.3%).
- v5: cosine LR + warmup fixed the divergence. Production candidate.
Caveats
- Holdout is drawn from the same distribution as training. 100% on holdout does NOT mean 100% in production. The real test is the iOS-side `QualitySpec.all` matrix sweep (60 hand-crafted prompts) on a real device.
- The system prompt and RAG-block format must be byte-equal between training and runtime. Source of truth: `prompt_v4.py::SYSTEM_V4` in the slowestlooser-finetune repo.
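
One way to guard the byte-equality requirement is to compare a hash of the prompt text on both sides. The sketch below is an assumption about how such a check could look, not something taken from the slowestlooser-finetune repo; `prompt_fingerprint` and `assert_same_prompt` are hypothetical names.

```python
import hashlib


def prompt_fingerprint(text: str) -> str:
    """SHA-256 hex digest of the UTF-8 bytes of a prompt string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def assert_same_prompt(training_prompt: str, runtime_prompt: str) -> None:
    """Raise if the two prompts differ by even a single byte."""
    expected = prompt_fingerprint(training_prompt)
    actual = prompt_fingerprint(runtime_prompt)
    if expected != actual:
        raise ValueError(
            f"prompt mismatch: training {expected[:12]} vs runtime {actual[:12]}"
        )
```

Because the digest covers raw bytes, a trailing newline or a changed whitespace character is enough to fail the check, which is the class of drift the caveat warns about.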
Usage
The model is used by the SlowestLooser iOS app via mlx-swift-lm. The iOS-side
pipeline:
user query → IngredientSplitter → per-part DB lookup (curated catalog ⊕
OFF VectorDB) → UnifiedRAGBlockBuilder → model.generate → JSON parse
Model size: 0.3B params
Tensor type: BF16 · U32