Qwen3-g023-tiny-v2-FT-Q8_0 - GRPO Finetuned Q8_0 GGUF Export
https://huggingface.co/g023/qwen3-tiny-v2-finetuned/
Q8_0 GGUF export of a GRPO-finetuned Qwen3 model, tuned for improved reasoning and reduced repetition. Original source model: https://huggingface.co/g023/qwen3-tiny-v2
THIS IS A WIP (WORK IN PROGRESS)
Files
- Qwen3-g023-tiny-v2-FT-Q8_0.gguf: Q8_0 GGUF model (~1.81 GB)
- Modelfile: Ollama template + tested default sampling settings
- params_best.json: best sampled parameters from the automated sweep
- sweep_results.json: full sweep results and per-test outcomes
Tested Best Parameters (Default in Modelfile)
- temperature: 0.65
- top_p: 0.9
- top_k: 20
- min_p: 0.0
- repeat_penalty: 1.05
- presence_penalty: 0.1
- frequency_penalty: 0.1
- num_ctx: 40000
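For reference, these defaults map onto Ollama Modelfile `PARAMETER` directives. This is an illustrative sketch, not the shipped Modelfile; the `FROM` path is an assumption, and whether every parameter (notably presence_penalty and frequency_penalty) is accepted in a Modelfile depends on your Ollama version:

```
# Hypothetical Modelfile sketch -- compare against the Modelfile shipped with this repo.
FROM ./Qwen3-g023-tiny-v2-FT-Q8_0.gguf
PARAMETER temperature 0.65
PARAMETER top_p 0.9
PARAMETER top_k 20
PARAMETER min_p 0.0
PARAMETER repeat_penalty 1.05
PARAMETER presence_penalty 0.1
PARAMETER frequency_penalty 0.1
PARAMETER num_ctx 40000
```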
Usage (Ollama)
ollama create qwen3-g023-tiny-v2-FT-Q8_0 -f Modelfile
ollama run qwen3-g023-tiny-v2-FT-Q8_0
# thinking on
ollama run qwen3-g023-tiny-v2-FT-Q8_0 --think "Explain why the sky is blue"
# thinking off
ollama run qwen3-g023-tiny-v2-FT-Q8_0 --think=false "Explain why the sky is blue"
Or pull directly from Hugging Face into Ollama:
ollama run hf.co/g023/qwen3-tiny-v2-finetuned:Q8_0
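The same tested defaults can also be passed per-request through Ollama's REST API (`POST /api/generate`) via the `options` field. A minimal sketch that builds the request body; the endpoint URL and the sending code in the comment assume a default local Ollama install:

```python
import json

# Tested defaults from params_best.json (values copied from this card).
BEST_PARAMS = {
    "temperature": 0.65,
    "top_p": 0.9,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.05,
    "presence_penalty": 0.1,
    "frequency_penalty": 0.1,
    "num_ctx": 40000,
}

def build_generate_payload(prompt: str,
                           model: str = "qwen3-g023-tiny-v2-FT-Q8_0") -> str:
    """Build the JSON body for Ollama's /api/generate endpoint,
    overriding sampling options with the tested defaults."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": BEST_PARAMS,
    })

# To actually send it (assumes Ollama is running on the default port):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/generate",
#       data=build_generate_payload("Explain why the sky is blue").encode(),
#       headers={"Content-Type": "application/json"})
#   print(json.load(urllib.request.urlopen(req))["response"])
```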
Notes
- Template is the Qwen3-compatible template with think/no_think handling.
- If you want stricter non-thinking behavior, compare alternatives in
sweep_results.json.
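To compare alternatives, the sweep file can be ranked programmatically. This sketch assumes sweep_results.json is a JSON list where each record carries a "params" dict and a numeric "score"; those field names are hypothetical, so check the actual keys in your copy before relying on them:

```python
import json

def top_sweep_entries(path: str = "sweep_results.json", n: int = 3):
    """Return the n best sweep entries, assuming each record has a
    'params' dict and a numeric 'score' (hypothetical field names)."""
    with open(path) as f:
        results = json.load(f)
    return sorted(results, key=lambda r: r["score"], reverse=True)[:n]
```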