# Non-thinking Modelfile for the fine-tuned planner.
#
# The published GGUF (ricalanis/scrubdata-qwen3-4b-gguf) ships Qwen3's full
# thinking+tools template, which makes Ollama burn the token budget "thinking" and
# return empty/garbage for our task. We fine-tuned the NON-thinking Instruct model to
# emit the JSON plan directly, so override the template to match training.
#
# ALSO: the current Unsloth (2026.6.1) Q4_K_M GGUF export is CORRUPTED for this model
# (degenerates into loops). Use Q8_0; it works. Export with q8_0.
#
#   ollama pull hf.co/ricalanis/scrubdata-qwen3-4b-v6-q8:Q8_0
#   ollama create scrubdata-ft -f notebooks/Modelfile
#   uv run eval/run_finetuned.py --model scrubdata-ft --n 40

# CONSTRAINED DECODING REQUIRED ON LONG PROMPTS: on full planning prompts the Q8 GGUF's
# first token can degenerate into loops (Qwen3 tool-calling prior). Use
# format=json in the Ollama API call (grammar-constrained decoding), or under
# transformers suppress_tokens=[151657, 151658]. See eval/capture_plan_local.py and the
# model card's Integrity section (the GGUF itself was re-exported 2026-06-12; sha256s
# recorded there).
# v6 = mixA (more real paired data): hospital repair 0.475/0.185 (v4/v5 was 0/0.42)
FROM hf.co/ricalanis/scrubdata-qwen3-4b-v6-q8:Q8_0

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER temperature 0
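
# Sketch of the format=json call mentioned in the header. Assumptions: a local
# Ollama server on its default port (11434) and a model created from this file
# as scrubdata-ft. The message content is a placeholder, not a real planning
# prompt.
#
#   curl http://localhost:11434/api/chat -d '{
#     "model": "scrubdata-ft",
#     "format": "json",
#     "stream": false,
#     "messages": [{"role": "user", "content": "<planning prompt>"}]
#   }'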