Masafee CTF 7B

QLoRA fine-tune of Qwen 2.5 Coder 7B Instruct on CTFtime writeups — trained entirely on a single NVIDIA GeForce RTX 3060 12 GB in 12 h 17 m of wall-clock time, with no cloud compute.

📄 Paper: English (5 pp.) · Japanese (6 pp.) · evaluation report

This is part of the "Masafee" personal GPU research series — the second release after masafee-lora (a Stable Diffusion LoRA of the same name).

Model details


Base model	`Qwen/Qwen2.5-Coder-7B-Instruct`
Method	QLoRA (r=32, α=64, 4-bit) via unsloth
Training data	`justinwangx/CTFtime` — 18,013 writeup chunks → ~5,200 × 2048-token packed sequences (10.6M tokens)
Strategy	Continued pretraining on raw writeup text (no instruction-format conversion)
Learning rate	2e-4, cosine schedule, 10 warmup steps
Epochs	2
Hardware	NVIDIA GeForce RTX 3060 12 GB
Wall time	12 h 17 m
Final train loss	1.62
Final eval loss	1.644
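
The packing and learning-rate rows above can be illustrated with a minimal sketch. It assumes greedy concatenate-and-chunk packing with the incomplete tail dropped, and a linear-warmup cosine schedule that decays to zero; the actual unsloth trainer may pack and schedule differently. `TOTAL_STEPS` is a hypothetical value, since the card does not state the step count.

```python
import math

SEQ_LEN = 2048       # packed sequence length from the table
BASE_LR = 2e-4       # peak learning rate from the table
WARMUP_STEPS = 10    # warmup steps from the table
TOTAL_STEPS = 650    # hypothetical: the card does not state the step count


def pack(token_chunks, seq_len):
    """Concatenate tokenized chunks and split them into fixed-length sequences.

    The incomplete tail is dropped (an assumption of this sketch).
    """
    flat = [tok for chunk in token_chunks for tok in chunk]
    usable = len(flat) - len(flat) % seq_len
    return [flat[i:i + seq_len] for i in range(0, usable, seq_len)]


def lr_at(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to base_lr, then cosine decay to zero at `total`."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Token budget implied by the table: about 5,200 sequences of 2048 tokens,
# which is roughly 10.6M tokens.
approx_tokens = 5_200 * SEQ_LEN

if __name__ == "__main__":
    print(pack([[1, 2, 3], [4, 5], [6, 7, 8, 9]], 4))
    print(approx_tokens, lr_at(0), lr_at(WARMUP_STEPS), lr_at(TOTAL_STEPS))
```

Packing avoids spending compute on padding when writeup chunks vary widely in length, which matters on a 12 GB card.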

Files in this repository

- adapter/: LoRA adapter for use with PEFT
  - adapter_config.json, adapter_model.safetensors
  - tokenizer.json, tokenizer_config.json, chat_template.jinja
- masafee-ctf-7b.q4_k_m.gguf: single-file Q4_K_M GGUF (4.4 GB) for Ollama / llama.cpp

Usage

With Transformers + PEFT

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16 and move it to the GPU.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")
# Attach the LoRA adapter stored in the adapter/ subfolder of this repository.
model = PeftModel.from_pretrained(base, "masafy/masafee-ctf-7b", subfolder="adapter")
model.eval()

prompt = "How would you approach a CTF challenge that gives you an ELF binary with a gets() call?"
msgs = [{"role": "user", "content": prompt}]
# return_dict=True returns input_ids together with the attention mask.
inputs = tokenizer.apply_chat_template(
    msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to("cuda")
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

With Ollama (GGUF)

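# Download the GGUF into the current directory so the Modelfile's FROM path resolves:
huggingface-cli download masafy/masafee-ctf-7b masafee-ctf-7b.q4_k_m.gguf --local-dir .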

cat > Modelfile <<'MFILE'
FROM ./masafee-ctf-7b.q4_k_m.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
MFILE

ollama create masafee-ctf-7b -f Modelfile
ollama run masafee-ctf-7b
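
The TEMPLATE in the Modelfile above renders conversations in the ChatML layout. The following is a minimal sketch of that string layout in Python, useful for checking prompts by eye; `render_chatml` is a hypothetical helper written for this illustration, not part of Ollama, and it mirrors only the text layout, not Ollama's own template engine.

```python
def render_chatml(messages, system=None):
    """Render messages in the ChatML layout used by the Modelfile template."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # The trailing assistant header cues the model to start its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chatml([{"role": "user", "content": "What does checksec report?"}]))
```

The two `PARAMETER stop` lines in the Modelfile cut generation at the next `<|im_start|>` or `<|im_end|>` marker, so the reply ends where the next turn would begin.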

Evaluation summary

Full report: GitHub EVALUATION.md · EVALUATION_ja.md

Benchmark	Base Qwen	masafee-ctf-7b	Foundation-Sec-8B
CyberMetric-500 accuracy	86.20%	84.00%	82.60%
NYU CTF subset Pass@1 (30 Q.)	13.3%	0.0%	6.7%
Hedging phrases (sum / 30)	—	7	77

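The three CyberMetric scores have overlapping 95% confidence intervals (about ±3.1 pp each at n=500), so the differences between them are within sampling noise. NYU CTF Bench was evaluated under a single-shot, non-agentic protocol, which is strictly weaker than the official benchmark setup, so these Pass@1 figures should not be compared directly with official results. The stylistic divergence from Foundation-Sec-8B (an 11× ratio in hedging phrases, 77 vs 7) reflects the two models' training-data domains (CTF writeups vs SOC analysis), not a quality ranking.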
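
The arithmetic behind these statements can be checked with a short sketch. It assumes a normal-approximation (Wald) interval, which the card does not specify, and the NYU CTF solve counts are inferred from the reported percentages rather than taken from the report. Under these assumptions the half-widths come out at roughly 3.0 to 3.3 pp, in line with the ±3.1 pp figure.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def ci_half_width(p, n, z=Z_95):
    """Half-width of the normal-approximation (Wald) interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


# CyberMetric-500 accuracies from the table above.
cybermetric = {"Base Qwen": 0.862, "masafee-ctf-7b": 0.840, "Foundation-Sec-8B": 0.826}
intervals = {
    name: (p - ci_half_width(p, 500), p + ci_half_width(p, 500))
    for name, p in cybermetric.items()
}

# All intervals share a common region when the largest lower bound
# lies below the smallest upper bound.
all_overlap = max(lo for lo, _ in intervals.values()) < min(hi for _, hi in intervals.values())

# NYU CTF subset: solve counts out of 30, inferred from the reported percentages.
nyu_solved = {"Base Qwen": 4, "masafee-ctf-7b": 0, "Foundation-Sec-8B": 2}
nyu_pass_at_1 = {name: round(100 * k / 30, 1) for name, k in nyu_solved.items()}

# Hedging phrases: Foundation-Sec-8B (77) relative to masafee-ctf-7b (7).
hedging_ratio = 77 / 7

if __name__ == "__main__":
    for name, (lo, hi) in intervals.items():
        print(f"{name}: [{100 * lo:.1f}%, {100 * hi:.1f}%]")
    print(all_overlap, nyu_pass_at_1, hedging_ratio)
```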

Limitations

- Style overfitting: continued pretraining on raw writeup text causes the model to emit writeup-formatted narrative that can consume the output budget before producing a final answer.
- Hallucinated writeups: on out-of-distribution CTF prompts, the model occasionally generates plausible-but-wrong writeups for unrelated problems.
- No agentic capability gain over the base model: for solving real CTF challenges, use a larger model or an agent harness.

The model is intended as a CTF-style explainer and demonstrator of QLoRA on a consumer GPU, not as a CTF-solving agent.

License

LoRA adapter weights and GGUF in this repository: research and personal use only. These are derivative of CTFtime writeups whose copyright belongs to individual contributors; redistribution or commercial use is not permitted without explicit permission from those original authors.
Code / scripts / documentation / paper in the GitHub repository: MIT.
Base model (Qwen/Qwen2.5-Coder-7B-Instruct): Apache 2.0.

Citation

@software{suzuki_masafee_ctf_7b_2026,
  author       = {Suzuki, Masato},
  title        = {{Masafee CTF 7B: QLoRA Fine-Tuning of a 7B Code Model on
                   CTF Writeups for Stylistic and Knowledge Adaptation}},
  year         = {2026},
  version      = {v1.1.2},
  doi          = {10.5281/zenodo.20413080},
  url          = {https://doi.org/10.5281/zenodo.20413080},
  orcid        = {0009-0000-7977-2756}
}

Made by masafykun · masafy.org · ORCID · 🐾