How to use with llama.cpp

Install with Homebrew

brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

# Run inference directly in the terminal:
llama-cli -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

Install with WinGet (Windows)

winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

# Run inference directly in the terminal:
llama-cli -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

Use a pre-built binary

# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

# Run inference directly in the terminal:
./llama-cli -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

Build from source

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf redstackio/qwen3-14b-redstack-v1:Q5_K_M

Use Docker (Docker Model Runner)

docker model run hf.co/redstackio/qwen3-14b-redstack-v1:Q5_K_M
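
Once llama-server is up, it exposes an OpenAI-compatible HTTP API (default port 8080). A minimal stdlib-only Python sketch for querying it; the host, port, and `model` field value here are assumptions based on llama-server defaults, not something this model card specifies:

```python
import json
from urllib import request

def build_chat_request(prompt: str, model: str = "qwen3-14b-redstack-v1") -> bytes:
    """Build an OpenAI-style chat-completions payload as JSON bytes."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return json.dumps(payload).encode("utf-8")

def ask(prompt: str, host: str = "http://localhost:8080") -> str:
    """POST to llama-server's OpenAI-compatible endpoint and return the reply text."""
    req = request.Request(
        host + "/v1/chat/completions",
        data=build_chat_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running llama-server):
# print(ask("Outline a basic nmap service-version scan."))
```

Because the endpoint is OpenAI-compatible, any OpenAI client library pointed at `http://localhost:8080/v1` should also work.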
Zero Stack - Qwen3-14B (GGUF, Q5_K_M)

Qwen3-14B fine-tuned on an offensive-security SFT dataset (1,226 rows). It responds with an elite-hacker persona on casual queries and structured markdown methodology on technical ones. Thinking mode is enabled by default (inherited from the Qwen3-14B base).

Files

  • qwen3-14b.Q5_K_M.gguf - quantized weights (~9.8 GB)
  • Modelfile - Ollama Modelfile with the correct ChatML stop tokens and the Zero Stack system prompt
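
The bundled Modelfile roughly follows this shape. This is a hedged sketch only - the shipped Modelfile is authoritative, and its exact template and system prompt are not reproduced here:

```
# Sketch only; the Modelfile in this repo is authoritative.
FROM ./qwen3-14b.Q5_K_M.gguf

# ChatML prompt template (Qwen3 format)
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# ChatML stop tokens so generation ends cleanly
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>

SYSTEM """(Zero Stack system prompt ships here)"""
```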

Run with Ollama

ollama create zerostack-14b -f Modelfile
ollama run zerostack-14b
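
Once the model is created, Ollama also serves a local REST API (default http://localhost:11434). A minimal stdlib-only sketch; the model name matches the `ollama create` command above, and `"stream": False` asks for a single JSON reply instead of a token stream:

```python
import json
from urllib import request

def build_ollama_chat(prompt: str, model: str = "zerostack-14b") -> bytes:
    """Payload for Ollama's /api/chat endpoint as JSON bytes."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode("utf-8")

def chat(prompt: str, host: str = "http://localhost:11434") -> str:
    """POST to the local Ollama server and return the assistant reply."""
    req = request.Request(
        host + "/api/chat",
        data=build_ollama_chat(prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Example (requires `ollama serve` and the model created as above):
# print(chat("List common SMB enumeration steps for a lab host."))
```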

Run with llama.cpp

./llama-cli -m qwen3-14b.Q5_K_M.gguf -p "hello"

Training

  • Base: Qwen3-14B
  • Method: LoRA (r=32), 3 epochs, Unsloth
  • Max sequence length: 2560
  • Dataset: SFT_GENERALIST (1,226 rows, ChatML)
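
The dataset rows use ChatML, the conversation format Qwen3 expects: each turn is wrapped in `<|im_start|>{role}` and `<|im_end|>` tokens. A small formatter sketch illustrating the layout (the token names are standard Qwen3 ChatML; the sample messages are illustrative only):

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts in ChatML format."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    return "".join(parts)

sample = to_chatml([
    {"role": "system", "content": "You are Zero Stack."},
    {"role": "user", "content": "Enumerate SMB shares on a lab host."},
])
```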

Intended Use

Authorized security testing, CTF practice, red-team research, and security education. Targeted at practitioners who already know what they're doing and want structured methodology and command recall.

Limitations & Risks

  • May hallucinate specific CVE IDs, tool flags, or payload syntax - verify against primary sources before running.
  • No safety guardrails against misuse. Do not use against systems you don't own or have explicit written authorization to test.
  • Thinking mode is on by default - responses may be slower and include reasoning traces. Disable it in the Modelfile if you want faster, terser output.
  • Trained on English data only; non-English performance is not evaluated.
  • 16 GB VRAM note: GGUF export uses CPU offloading to avoid LoRA merge corruption. If you retrain/re-export, verify maximum_memory_usage=0.5 in export_gguf.py.
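
Upstream Qwen3 also supports a per-turn soft switch - appending /no_think to a user message suppresses the reasoning trace for that turn - and traces it does emit are wrapped in `<think>...</think>` tags. Two small helpers sketching this, assuming this fine-tune inherits stock Qwen3 chat-template behavior (not verified for this model):

```python
import re

def no_think(prompt: str) -> str:
    """Append Qwen3's /no_think soft switch to skip reasoning for this turn."""
    return prompt.rstrip() + " /no_think"

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning traces from model output."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)
```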

License / Use

For authorized security testing, research, and educational use only. Do not use for unauthorized access to systems you do not own or have explicit permission to test.
