# Non-thinking Modelfile for the fine-tuned planner.
#
# The published GGUF (ricalanis/scrubdata-qwen3-4b-gguf) ships Qwen3's full
# thinking+tools template, which makes Ollama spend the token budget on "thinking"
# and return empty or garbage output for our task. We fine-tuned the NON-thinking
# Instruct model to emit the JSON plan directly, so this file overrides the template
# to match the training format.
#
# ALSO: the current Unsloth (2026.6.1) Q4_K_M GGUF export is CORRUPTED for this model
# (it degenerates into <tool_call> loops). Q8_0 works, so export with q8_0.
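#
# Sketch of a q8_0 export, assuming the fine-tuned Unsloth `model` and `tokenizer`
# are still in memory after training ("scrubdata-gguf" is a hypothetical output
# directory, not the path used for the published repo):
#
#   model.save_pretrained_gguf("scrubdata-gguf", tokenizer, quantization_method="q8_0")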
#
#   ollama pull hf.co/ricalanis/scrubdata-qwen3-4b-v6-q8:Q8_0
#   ollama create scrubdata-ft -f notebooks/Modelfile
#   uv run eval/run_finetuned.py --model scrubdata-ft --n 40
#
# CONSTRAINED DECODING REQUIRED ON LONG PROMPTS: on full planning prompts the Q8 GGUF
# can emit <tool_call> as its first token and degenerate into <tool_call> loops (a
# tool-calling prior inherited from Qwen3). Pass format=json in the Ollama API call
# (grammar-constrained decoding) or, under transformers, pass
# suppress_tokens=[151657, 151658] (the <tool_call> and </tool_call> token ids) to
# generate(). See eval/capture_plan_local.py and the model card's Integrity section
# (the GGUF itself was re-exported 2026-06-12; its SHA-256 hashes are recorded there).
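#
# Sketch of both options (eval/capture_plan_local.py remains the reference). It
# assumes a local Ollama server on the default port 11434 and, for transformers, a
# `model`, `tokenizer` and chat `messages` list already prepared; "<planning prompt>"
# is a placeholder and max_new_tokens=1024 is an arbitrary choice:
#
#   curl http://localhost:11434/api/chat -d '{
#     "model": "scrubdata-ft",
#     "messages": [{"role": "user", "content": "<planning prompt>"}],
#     "format": "json",
#     "stream": false
#   }'
#
#   inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
#                                          return_tensors="pt", return_dict=True)
#   out = model.generate(**inputs, do_sample=False, max_new_tokens=1024,
#                        suppress_tokens=[151657, 151658])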
#
# v6 = mixA (more real paired data): hospital repair 0.475/0.185 (v4/v5 were 0/0.42)
FROM hf.co/ricalanis/scrubdata-qwen3-4b-v6-q8:Q8_0
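# Ollama passes system messages both in .System and in .Messages, so system-role
# entries are skipped inside the range; otherwise the system prompt is emitted twice.
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}{{- if ne .Role "system" }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}{{- end }}<|im_start|>assistant
"""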
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0
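
# Example: for a single user message ("<planning prompt>" is a placeholder), the
# template renders
#
#   <|im_start|>user
#   <planning prompt><|im_end|>
#   <|im_start|>assistant
#
# and the fine-tuned model is expected to continue directly with the JSON plan (no
# <think> block), with generation ending at the <|im_end|> stop sequence.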