---
license: other
pipeline_tag: text-generation
library_name: gguf
language:
- en
base_model: Qwen/Qwen3-14B
base_model_relation: quantized
tags:
- gguf
- qwen3
- pentesting
- security
- lora
- sft
---

# Zero Stack - Qwen3-14B (GGUF, Q5_K_M)

Qwen3-14B fine-tuned on an offensive-security SFT dataset (1,226 rows). It answers casual queries in an elite-hacker persona and technical ones with structured markdown methodology. Thinking mode is enabled by default (Qwen3-14B base behavior).

## Files

- `qwen3-14b.Q5_K_M.gguf` - quantized weights (~9.8 GB)
- `Modelfile` - Ollama template with correct ChatML stop tokens + Zero Stack system prompt

## Run with Ollama

```bash
ollama create zerostack-14b -f Modelfile
ollama run zerostack-14b
```

## Run with llama.cpp

```bash
./llama-cli -m qwen3-14b.Q5_K_M.gguf -p "hello"
```

## Training

- Base: `Qwen3-14B`
- Method: LoRA (r=32), 3 epochs, Unsloth (see the training sketch at the end of this card)
- Max sequence length: 2560
- Dataset: SFT_GENERALIST (1,226 rows, ChatML)

## Intended Use

Authorized security testing, CTF practice, red-team research, and security education. Targeted at practitioners who already know what they're doing and want structured methodology and command recall.

## Limitations & Risks

- May hallucinate specific CVE IDs, tool flags, or payload syntax - verify against primary sources before running.
- No safety guardrails against misuse. Do not use against systems you don't own or have explicit written authorization to test.
- Thinking mode is on by default - responses may be slower and include reasoning traces. Disable it in the Modelfile if you want faster, terser output.
- Trained on English data only; non-English performance is not evaluated.
- 16 GB VRAM note: GGUF export uses CPU offloading to avoid LoRA merge corruption. If you retrain/re-export, verify `maximum_memory_usage=0.5` in `export_gguf.py` (see the export sketch at the end of this card).

## License / Use

For authorized security testing, research, and educational use only. Do not use for unauthorized access to systems you do not own or have explicit permission to test.
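
## Run with llama-cpp-python

For scripted use, the GGUF also loads with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). A minimal sketch, assuming the package is installed with GPU support; the system prompt below is a placeholder rather than the Zero Stack prompt shipped in the `Modelfile`, and the `n_ctx`/`n_gpu_layers` values are illustrative.

```python
# Minimal llama-cpp-python sketch. Assumptions: llama-cpp-python installed
# (pip install llama-cpp-python), model file in the working directory, and a
# placeholder system prompt standing in for the real Modelfile prompt.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-14b.Q5_K_M.gguf",
    n_ctx=2560,        # matches the training max sequence length
    n_gpu_layers=-1,   # offload everything if VRAM allows; reduce on 16 GB cards
)

resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Zero Stack."},  # placeholder prompt
        # Qwen3 supports the /no_think soft switch for terser, non-reasoning turns:
        {"role": "user", "content": "Outline a web-app recon methodology. /no_think"},
    ],
    max_tokens=512,
)
print(resp["choices"][0]["message"]["content"])
```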
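
## Training Sketch (Unsloth)

The settings listed under Training map onto Unsloth roughly as follows. This is a hedged reconstruction, not the actual training script: `lora_alpha`, `target_modules`, the dataset path, and all trainer hyperparameters other than r=32, 3 epochs, and the 2560 sequence length are assumptions.

```python
# Hedged Unsloth LoRA sketch matching the card's stated settings.
# Assumed (not from this card): lora_alpha, target_modules, dataset path,
# batch size, gradient accumulation, and learning rate.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-14B",
    max_seq_length=2560,            # from the card
    load_in_4bit=True,              # assumption: QLoRA-style 4-bit base
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,                           # from the card
    lora_alpha=32,                  # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # common default, assumed
)

# Hypothetical dataset path; rows are assumed to be pre-rendered ChatML strings.
dataset = load_dataset("json", data_files="sft_generalist.jsonl")["train"]

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",      # assumes a single pre-formatted text column
    max_seq_length=2560,
    args=TrainingArguments(
        num_train_epochs=3,         # from the card
        per_device_train_batch_size=2,   # assumption
        gradient_accumulation_steps=4,   # assumption
        learning_rate=2e-4,              # assumption
        output_dir="outputs",
    ),
)
trainer.train()
```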
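
## GGUF Export Sketch

The 16 GB VRAM note above refers to Unsloth's GGUF saver. A sketch of what `export_gguf.py` presumably does; the script itself is not published here, and the checkpoint path is a placeholder.

```python
# Hedged sketch of the GGUF export step. The checkpoint path is a placeholder;
# only quantization_method="q5_k_m" and maximum_memory_usage=0.5 come from
# this card.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/checkpoint-final",  # placeholder path to the fine-tuned checkpoint
    max_seq_length=2560,
)

# maximum_memory_usage=0.5 pushes more of the LoRA merge onto the CPU, the
# card's stated workaround for merge corruption on 16 GB GPUs.
model.save_pretrained_gguf(
    "gguf-out",
    tokenizer,
    quantization_method="q5_k_m",
    maximum_memory_usage=0.5,
)
```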