## How to use from llama.cpp

### Install from WinGet (Windows)

```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

# Run inference directly in the terminal:
llama-cli -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M
```

### Use a pre-built binary

```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

# Run inference directly in the terminal:
./llama-cli -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M
```

### Build from source

```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M
```

### Use Docker

```shell
docker model run hf.co/redstackio/qwen3-14b-redstack-v1:Q5_K_M
```
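Once `llama-server` is up, it exposes an OpenAI-compatible API that any OpenAI client can talk to. A minimal sketch with curl, assuming the default bind address of `http://localhost:8080` (check the server's startup log or pass `--port` if yours differs):

```shell
# Query the local OpenAI-compatible chat endpoint.
# The prompt and temperature below are illustrative.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Outline a methodology for an authorized internal network pentest."}
    ],
    "temperature": 0.6
  }'
```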
# Zero Stack - Qwen3-14B (GGUF, Q5_K_M)

Qwen3-14B fine-tuned on an offensive-security SFT dataset (1,226 rows). It adopts an elite-hacker persona on casual queries and switches to structured markdown methodology on technical ones. Thinking mode is enabled by default (inherited from the Qwen3-14B base).
## Files

- `qwen3-14b.Q5_K_M.gguf` - quantized weights (~9.8 GB)
- `Modelfile` - Ollama template with the correct ChatML stop tokens and the Zero Stack system prompt
## Run with Ollama

```shell
ollama create zerostack-14b -f Modelfile
ollama run zerostack-14b
```
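The shipped `Modelfile` already handles the template and stop tokens. If you need to rebuild one from scratch, a minimal sketch might look like the following — the system prompt text here is an illustrative placeholder, not the shipped Zero Stack prompt:

```
# Illustrative Modelfile sketch - the actual shipped file may differ.
FROM ./qwen3-14b.Q5_K_M.gguf

# ChatML stop tokens used by Qwen3
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"

# Placeholder for the Zero Stack persona prompt (illustrative)
SYSTEM "You are Zero Stack, an offensive-security assistant for authorized testing only."
```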
## Run with llama.cpp

```shell
./llama-cli -m qwen3-14b.Q5_K_M.gguf -p "hello"
```
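For interactive use you will usually want GPU offload and a larger context window. A sketch with common `llama-cli` flags — the values are illustrative, so tune them for your hardware:

```shell
# -ngl: layers to offload to GPU, -c: context size, -cnv: interactive conversation mode
./llama-cli -m qwen3-14b.Q5_K_M.gguf -ngl 99 -c 8192 -cnv
```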
## Training

- Base: Qwen3-14B
- Method: LoRA (r=32), 3 epochs, Unsloth
- Max sequence length: 2560
- Dataset: SFT_GENERALIST (1,226 rows, ChatML)
## Intended Use
Authorized security testing, CTF practice, red-team research, and security education. Targeted at practitioners who already know what they're doing and want structured methodology and command recall.
## Limitations & Risks
- May hallucinate specific CVE IDs, tool flags, or payload syntax - verify against primary sources before running.
- No safety guardrails against misuse. Do not use against systems you don't own or have explicit written authorization to test.
- Thinking mode is on by default - responses may be slower and include reasoning traces. Disable in Modelfile if you want faster, terser output.
- Trained on English data only; non-English performance is not evaluated.
- 16 GB VRAM note: the GGUF export uses CPU offloading to avoid LoRA merge corruption. If you retrain or re-export, verify `maximum_memory_usage=0.5` in `export_gguf.py`.
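Qwen3's chat template also supports a per-prompt soft switch for thinking mode: appending `/no_think` to a prompt should suppress the reasoning trace for that turn. This is base-model Qwen3 behavior, not something specific to this fine-tune, so verify it against your build:

```shell
# /no_think is Qwen3's soft switch to skip the reasoning trace for this turn
ollama run zerostack-14b "Enumerate common SMB recon steps. /no_think"
```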
## License / Use

For authorized security testing, research, and educational use only. Do not use it against systems you do not own or are not explicitly authorized to test.