Instructions to use athulkrishnan/BountyHound-Coder-14B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use athulkrishnan/BountyHound-Coder-14B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="athulkrishnan/BountyHound-Coder-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("athulkrishnan/BountyHound-Coder-14B")
model = AutoModelForCausalLM.from_pretrained("athulkrishnan/BountyHound-Coder-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use athulkrishnan/BountyHound-Coder-14B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="athulkrishnan/BountyHound-Coder-14B",
    filename="gguf/BountyHound-Coder-14B-Q4_K_M.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
# The response follows the OpenAI chat-completion shape.
print(response["choices"][0]["message"]["content"])

Local Apps

llama.cpp

How to use athulkrishnan/BountyHound-Coder-14B with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf athulkrishnan/BountyHound-Coder-14B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf athulkrishnan/BountyHound-Coder-14B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf athulkrishnan/BountyHound-Coder-14B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf athulkrishnan/BountyHound-Coder-14B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf athulkrishnan/BountyHound-Coder-14B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf athulkrishnan/BountyHound-Coder-14B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf athulkrishnan/BountyHound-Coder-14B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf athulkrishnan/BountyHound-Coder-14B:Q4_K_M

Use Docker

docker model run hf.co/athulkrishnan/BountyHound-Coder-14B:Q4_K_M

vLLM

How to use athulkrishnan/BountyHound-Coder-14B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "athulkrishnan/BountyHound-Coder-14B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "athulkrishnan/BountyHound-Coder-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/athulkrishnan/BountyHound-Coder-14B:Q4_K_M

SGLang

How to use athulkrishnan/BountyHound-Coder-14B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "athulkrishnan/BountyHound-Coder-14B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "athulkrishnan/BountyHound-Coder-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "athulkrishnan/BountyHound-Coder-14B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "athulkrishnan/BountyHound-Coder-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use athulkrishnan/BountyHound-Coder-14B with Ollama:
```
ollama run hf.co/athulkrishnan/BountyHound-Coder-14B:Q4_K_M
```

Unsloth Studio

How to use athulkrishnan/BountyHound-Coder-14B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for athulkrishnan/BountyHound-Coder-14B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for athulkrishnan/BountyHound-Coder-14B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for athulkrishnan/BountyHound-Coder-14B to start chatting

Pi

How to use athulkrishnan/BountyHound-Coder-14B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf athulkrishnan/BountyHound-Coder-14B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "athulkrishnan/BountyHound-Coder-14B:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use athulkrishnan/BountyHound-Coder-14B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf athulkrishnan/BountyHound-Coder-14B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default athulkrishnan/BountyHound-Coder-14B:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use athulkrishnan/BountyHound-Coder-14B with Docker Model Runner:
```
docker model run hf.co/athulkrishnan/BountyHound-Coder-14B:Q4_K_M
```

Lemonade

How to use athulkrishnan/BountyHound-Coder-14B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull athulkrishnan/BountyHound-Coder-14B:Q4_K_M

Run and chat with the model

lemonade run user.BountyHound-Coder-14B-Q4_K_M

List all available models

lemonade list

BountyHound-Coder-14B / README.md

athulkrishnan

Professional cleanup: remove emojis/symbols (keep title), ASCII-only body

22aa2ea verified about 13 hours ago

	---
	license: apache-2.0
	license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct/blob/main/LICENSE
	base_model: Qwen/Qwen2.5-Coder-14B-Instruct
	base_model_relation: finetune
	library_name: transformers
	pipeline_tag: text-generation
	language:
	- en
	tags:
	- security
	- cybersecurity
	- bug-bounty
	- vulnerability-triage
	- vulnerability-management
	- recon
	- offensive-security
	- blue-team
	- qwen2
	- code
	- conversational
	- unsloth
	- trl
	- peft
	- lora
	- qlora
	- text-generation-inference
	extra_gated_heading: "Request access to BountyHound-Coder-14B"
	extra_gated_button_content: "Request access"
	extra_gated_prompt: >-
	  BountyHound is released for AUTHORIZED security research and defensive triage only.
	  By requesting access you confirm that you will use this model ONLY against assets you
	  are explicitly authorized to test (in-scope bug-bounty programs, systems you own, or
	  written penetration-test/red-team engagements), that you will follow coordinated /
	  responsible disclosure, and that you accept the "as-is, no warranty" terms in the
	  Disclaimer section of this card. Access is granted at the maintainer's discretion.
	extra_gated_fields:
	  Full name: text
	  Affiliation or handle: text
	  Primary use case: text
	  Country: country
	  I will only use this model on systems I am explicitly authorized to test: checkbox
	  I will follow responsible / coordinated disclosure: checkbox
	  I accept the as-is, no-warranty terms in the model card: checkbox
	---

	# 🐕‍🦺 BountyHound-Coder-14B

	> A gated, security-specialised co-pilot for bug-bounty triage and recon prioritisation,
	> fine-tuned from Qwen2.5-Coder-14B-Instruct. Built to run locally on a single 16 GB GPU.

	BountyHound is not an autonomous hacking agent. It is a decision-support model that
	helps an authorized researcher answer two questions fast and well:

	1. "Is this finding real and worth submitting?" — submit/kill triage, impact reasoning,
	false-positive and out-of-scope filtering.
	2. "Where should I look first?" — recon attack-surface ranking: a tech stack in,
	a prioritised, rationale-backed vulnerability-class hit-list out.

	It is deliberately terse and impact-first: it kills weak findings, asks for proof of
	exploitation, and avoids "could potentially" filler.

	> Gated and authorized-use-only. Access requires approval. See Intended use,
	> Bias, Risks & Limitations, and the Disclaimer before requesting.

	---

	## Model information

	| | |
	|---|---|
	| Developer | `athulkrishnan` (independent) |
	| Model type | Auto-regressive transformer (decoder-only), instruction-tuned |
	| Base model | [`Qwen/Qwen2.5-Coder-14B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) (~14.7B params, 48 layers) |
	| Fine-tune method | QLoRA SFT (4-bit NF4 base, LoRA r=32) via [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) |
	| Specialisation | Bug-bounty finding triage/validation · recon attack-surface ranking |
	| Language | English |
	| Context length | 32,768 native (up to 131K with YaRN); trained at 2,048 |
	| Precision / formats | Merged BF16 safetensors · Q4_K_M GGUF in [`gguf/`](./gguf) |
	| License | Apache-2.0 (inherited from the Qwen base) |
	| Status | Static, offline fine-tune · v1 (see [Versions](#versions)) |
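
The "up to 131K with YaRN" figure in the context-length row corresponds to a 4x scaling of the 32,768-token native window. A minimal sketch of the `rope_scaling` entry that the upstream Qwen2.5 model cards describe adding to `config.json`, assuming this fine-tune keeps the base model's config layout; the helper names are illustrative and not part of any library:

```python
# Sketch only: mirrors the YaRN setting described in the upstream Qwen2.5 cards.
# Assumption: the fine-tuned checkpoint accepts the same config keys as its base.
NATIVE_CONTEXT = 32_768  # native context length from the table above


def yarn_rope_scaling(factor: float = 4.0) -> dict:
    """Return the `rope_scaling` entry to merge into config.json."""
    return {
        "factor": factor,
        "original_max_position_embeddings": NATIVE_CONTEXT,
        "type": "yarn",
    }


def extended_context(factor: float = 4.0) -> int:
    """Context window implied by the scaling factor."""
    return int(NATIVE_CONTEXT * factor)


print(yarn_rope_scaling())
print(extended_context())  # 131072
```

Since training used 2,048-token sequences, behaviour of the fine-tuned skills at very long contexts is best treated as unverified.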

	---

	## Intended use

	### Intended use cases
	- Finding triage & validation — decide submit vs. kill, sanity-check severity, reason
	about real-world impact, and cut duplicate / informational / out-of-scope noise before a
	human writes a report.
	- Recon prioritisation — turn a fingerprinted tech stack or attack surface into a ranked
	hit-list of vulnerability classes worth testing first, with one-line rationale.
	- Methodology assistant — explain bug classes, CWE mappings, and report framing to support
	authorized learning and assessment work.

	### Downstream use
	- A local triage/ranking step inside an authorized bug-bounty or pentest workflow
	(human-in-the-loop), e.g. pre-filtering scanner output or drafting impact statements.
	- A base for further domain fine-tuning or for pairing with retrieval (RAG) over fresh
	CVEs / current program scope.

	### Out-of-scope and prohibited use
	- Testing, scanning, or exploiting systems you are not explicitly authorized to assess.
	- Autonomous attack execution without human review — BountyHound is a co-pilot, not an agent.
	- Generating malware, phishing, or weaponised exploit payloads for unauthorized use.
	- Treating outputs as ground truth, or as legal/compliance advice. Always validate.
	- Any use that violates applicable law or platform/program rules.

	---

	## How to get started

	### Requirements
	`transformers >= 4.40` (developed on 4.56.2), `torch >= 2.3`, and `accelerate`.
	The merged model is BF16 (~29 GB); for a single 16 GB GPU use the Q4_K_M GGUF with
	llama.cpp / Ollama, or load in 4-bit with `bitsandbytes`.
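
The sizing advice above follows from simple arithmetic on weight storage. A rough sketch, assuming about 14.7B parameters (the figure quoted for the base model) and roughly 4.85 bits per weight for Q4_K_M, which is an approximation and not a number taken from this card; `weight_gb` is an illustrative name:

```python
# Back-of-the-envelope weight footprint; excludes KV cache and activations.
PARAMS = 14.7e9          # approximate parameter count of the base model
BITS_BF16 = 16           # BF16 stores each weight in 16 bits
BITS_Q4_K_M = 4.85       # ASSUMPTION: typical average for Q4_K_M, not from the card


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Size of the weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


print(f"BF16:   {weight_gb(PARAMS, BITS_BF16):.1f} GB")    # 29.4 GB
print(f"Q4_K_M: {weight_gb(PARAMS, BITS_Q4_K_M):.1f} GB")  # 8.9 GB
```

The BF16 weights alone exceed a 16 GB card, while the 4-bit variant leaves headroom for the KV cache, which is why the quantised routes are suggested.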

	### Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	repo = "athulkrishnan/BountyHound-Coder-14B"
	tok = AutoTokenizer.from_pretrained(repo)
	model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

	SYSTEM = (
	    "You are a bug-bounty co-pilot for an authorized security researcher. You assist ONLY "
	    "with testing that is in-scope and authorized on bug-bounty programs. You are sharp, "
	    "terse, and impact-first: you kill weak findings, prove real exploitation, and never pad "
	    "reports with 'could potentially'. Your specialties are finding triage/validation and "
	    "recon attack-surface ranking."
	)
	messages = [
	    {"role": "system", "content": SYSTEM},
	    {"role": "user", "content":
	        "Triage: reflected XSS on a marketing page, unauthenticated, no session context. "
	        "Submit or kill? One line + why."},
	]
	# Apply the chat template and move the token ids to the model's device.
	ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
	# do_sample=True ensures temperature/top_p take effect whatever the default generation config says.
	out = model.generate(ids, max_new_tokens=256, do_sample=True, temperature=0.3, top_p=0.9)
	# Decode only the newly generated tokens, not the prompt.
	print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
	```
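
For the 16 GB case, the 4-bit `bitsandbytes` route mentioned under Requirements might look like the sketch below. It assumes `bitsandbytes` is installed alongside `transformers`, and the NF4 settings (including double quantisation) are a common choice, not something this card prescribes. The helper names `nf4_settings` and `load_4bit` are hypothetical.

```python
def nf4_settings() -> dict:
    """Keyword arguments for transformers.BitsAndBytesConfig (assumed values)."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",          # same 4-bit format used for training
        "bnb_4bit_compute_dtype": "bfloat16",  # matmuls run in BF16
        "bnb_4bit_use_double_quant": True,     # ASSUMPTION: optional memory saving
    }


def load_4bit(repo: str = "athulkrishnan/BountyHound-Coder-14B"):
    """Load tokenizer and model with on-the-fly 4-bit quantisation."""
    # Imported lazily so the settings helper works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant = BitsAndBytesConfig(**nf4_settings())
    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo, quantization_config=quant, device_map="auto"
    )
    return tok, model


# Usage (downloads the weights, so it is left commented out here):
# tok, model = load_4bit()
```

Because the repository is gated, this presumes access has been granted and that the environment is authenticated with Hugging Face.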

	### Ollama / llama.cpp (GGUF)

	Download `gguf/BountyHound-Coder-14B-Q4_K_M.gguf`, then create a `Modelfile`:

	```dockerfile
	FROM ./BountyHound-Coder-14B-Q4_K_M.gguf
	TEMPLATE """{{- if .System }}<|im_start|>system
	{{ .System }}<|im_end|>
	{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
	{{ .Content }}<|im_end|>
	{{ end }}<|im_start|>assistant
	"""
	SYSTEM """You are a bug-bounty co-pilot for an authorized security researcher. You assist ONLY with testing that is in-scope and authorized on bug-bounty programs. You are sharp, terse, and impact-first: you kill weak findings, prove real exploitation, and never pad reports with 'could potentially'. Your specialties are finding triage/validation and recon attack-surface ranking."""
	PARAMETER temperature 0.3
	PARAMETER top_p 0.9
	PARAMETER stop "<|im_start|>"
	PARAMETER stop "<|im_end|>"
	```

	```bash
	ollama create bountyhound -f Modelfile
	ollama run bountyhound "Rank the attack surface for a Spring Boot + GraphQL + S3 stack."
	```
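
Once the `bountyhound` model exists in Ollama, it can also be queried from a script through Ollama's local REST API. A minimal standard-library sketch, assuming the server is running on Ollama's default port 11434; the function names are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint


def build_chat_request(prompt: str, model: str = "bountyhound") -> urllib.request.Request:
    """Build a non-streaming chat request for the model created above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.loads(resp.read())["message"]["content"]


# Usage (requires a running Ollama server):
# print(ask("Rank the attack surface for a Spring Boot + GraphQL + S3 stack."))
```

The system prompt and decoding parameters come from the Modelfile, so the request only needs to carry the user turn.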

	### Prompt format
	Qwen2.5 ChatML (`<|im_start|>role … <|im_end|>`) with the security system prompt above.
	Recommended decoding: `temperature 0.3`, `top_p 0.9`, `repeat_penalty 1.05`.
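
To make the ChatML layout concrete, the sketch below renders a message list by hand. It is a simplified illustration, not the tokenizer's actual template: `to_chatml` is a hypothetical helper, and unlike the upstream Qwen template it does not insert a default system message when none is supplied.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}] dicts as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a bug-bounty co-pilot."},
    {"role": "user", "content": "Submit or kill?"},
])
print(prompt)
```

In practice `tok.apply_chat_template(...)` should be preferred, since it stays in sync with the template shipped in the repository.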

	---

	## Training

	### Training data
	A weighted instruction mix biased toward the two target skills (≈6.2K curated conversations):

	| Source | Purpose |
	|---|---|
	| HackerOne disclosed reports (public) | finding disposition + severity-triage signal |
	| Curated bug-bounty methodology & triage heuristics | submit/kill discipline, validation gates, anti-patterns |
	| Recon playbook / attack-surface examples | tech-stack to ranked vulnerability classes |
	| Public detection-template patterns | low-false-positive authoring style |
	| General-security instruction data (~13%) | rehearsal to limit catastrophic forgetting |

	> **No customer data, private program scope, credentials, or other non-public material is
	> included in the training set.** Only public or self-authored content was used.

	### Training procedure
	QLoRA supervised fine-tuning, loss computed on assistant turns only.
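
"Loss computed on assistant turns only" usually means the labels for system and user tokens are replaced with an ignore value so they contribute nothing to the cross-entropy. A minimal sketch of that masking with plain lists; the token ids are made up and `build_labels` is a hypothetical helper, not the function Unsloth or TRL use internally:

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments):
    """segments: list of (role, token_ids). Returns (input_ids, labels)."""
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)                        # learn to produce these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # masked out of the loss
    return input_ids, labels


ids, labels = build_labels([
    ("system", [1, 2]),
    ("user", [3]),
    ("assistant", [4, 5]),
])
print(ids)     # [1, 2, 3, 4, 5]
print(labels)  # [-100, -100, -100, 4, 5]
```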

	| Hyperparameter | Value |
	|---|---|
	| Quantisation | 4-bit NF4 (base), BF16 compute |
	| LoRA | r=32, α=32, dropout=0, all linear projections |
	| Optimiser | paged AdamW 8-bit, weight decay 0.01 |
	| LR / schedule | 2e-4, cosine, 3% warmup |
	| Epochs / eff. batch | 2 / 8 (micro-batch 1 × grad-accum 8) |
	| Max sequence length | 2,048 |
	| Hardware | 1× NVIDIA RTX 4070 Ti SUPER (16 GB) |
	| Frameworks | Unsloth · TRL 0.22 · Transformers 4.56 · PyTorch 2.9 |
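
The hyperparameters above imply a fairly short run. The sketch below works through the step counts and the learning-rate curve, assuming exactly 6,200 conversations (the card says "≈6.2K") with no packing or dropped samples, and the common linear-warmup-then-cosine shape; the real scheduler may differ in detail.

```python
import math

CONVERSATIONS = 6_200    # ASSUMPTION: the card only gives an approximate count
EFFECTIVE_BATCH = 8      # micro-batch 1 x grad-accum 8
EPOCHS = 2
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03
LORA_SCALE = 32 / 32     # alpha / r = 1.0, so the adapter update is unscaled

steps_per_epoch = math.ceil(CONVERSATIONS / EFFECTIVE_BATCH)  # 775
total_steps = steps_per_epoch * EPOCHS                        # 1550
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)          # 47


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```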

	### Evaluation
	v1 is scored with a deterministic, rubric-based held-out harness (no LLM judge): each item is
	decision- or rubric-scorable across triage (submit/kill accuracy), recon ranking
	(expected-class recall), and rubric categories (report/nuclei/payload/coding), comparing
	the tune against the `Qwen2.5-Coder-14B` base. The ship gate requires improvement on the
	two priority skills (triage, ranking) with no material regression on general coding
	(guarding against catastrophic forgetting). A full quantitative scorecard will be published
	alongside v2; treat v1 as a capable assistant, not a benchmarked SOTA system.
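
The two priority metrics named above can be computed without an LLM judge. The following is a hypothetical sketch of such scoring, not the author's harness: the function names, the label vocabulary ("submit"/"kill"), and the top-k cutoff are all assumptions made for illustration.

```python
def triage_accuracy(predictions, gold):
    """Fraction of items where the predicted submit/kill decision matches gold."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal length")
    correct = sum(p.strip().lower() == g.strip().lower()
                  for p, g in zip(predictions, gold))
    return correct / len(gold)


def expected_class_recall(ranked, expected, k=5):
    """Share of expected vulnerability classes found in the top-k ranked list."""
    if not expected:
        raise ValueError("expected must be non-empty")
    top = {c.strip().lower() for c in ranked[:k]}
    wanted = {c.strip().lower() for c in expected}
    return len(top & wanted) / len(wanted)


print(triage_accuracy(["submit", "kill", "kill"], ["submit", "kill", "submit"]))
print(expected_class_recall(["SSRF", "IDOR", "XSS"], ["idor", "sqli"], k=3))  # 0.5
```

Comparing these scores for the fine-tune and the base model on the same held-out items gives the kind of deterministic before/after check the ship gate describes.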

	---

	## Bias, risks, and limitations

	- Not a vulnerability discoverer. A 14B local model assists triage and prioritisation;
	it does not autonomously find or weaponise novel bugs, and can miss context a human or a
	larger system would catch.
	- Can be confidently wrong. It may over- or under-rate severity, hallucinate a CWE/CVE,
	or mis-scope a finding. Every output must be validated before acting or reporting.
	- Frozen knowledge. Trained on a static snapshot — it will not know the newest CVEs,
	techniques, or your current program scope. Pair with retrieval for facts.
	- Domain bias. Trained heavily on web-app / HackerOne-style findings; it is weaker on
	niche stacks, hardware, embedded, and non-web targets.
	- Dual-use. Security knowledge can be misused. The model is gated and authorization-scoped
	for this reason, but gating cannot prevent all misuse — see the Disclaimer.
	- Inherited base behaviour. Limitations and biases of `Qwen2.5-Coder-14B-Instruct` carry over.

	### Recommendations
	- Keep a human in the loop; use BountyHound as an assistive triage/ranking layer, not an oracle.
	- Validate every finding through your own impact gate before submitting; never paste output into a report unchecked.
	- Supplement with retrieval (CVE feeds, current scope) for anything time-sensitive.
	- Operate only within written authorization and your program's rules; follow responsible disclosure.

	---

	## Disclaimer

	This model is provided "as is" and "as available", without warranty of any kind, express
	or implied, including merchantability, fitness for a particular purpose, and non-infringement.
	By accessing or using BountyHound you acknowledge that you are solely responsible for your
	use of the model and its outputs, and you agree to indemnify and hold harmless the author
	and any affiliated parties from any claims, liabilities, damages, or costs arising from that
	use. Use is at your own risk and discretion. You are responsible for ensuring your use complies
	with all applicable laws, regulations, and the rules of any program or system you test. The
	author does not endorse or condone any unauthorized or unlawful use.

	## License and attribution
	- Weights are derived from `Qwen/Qwen2.5-Coder-14B-Instruct` and released under
	Apache-2.0, the base model's license.
	- Built with [Unsloth](https://github.com/unslothai/unsloth) and
	[TRL](https://github.com/huggingface/trl).

	## Versions
	- v1 (this release) — core triage + recon co-pilot (≈6.2K-conversation mix).
	- v2 (in training) — adds a large, defanged CVE/CWE/vuln-class breadth layer derived
	from public exploit metadata; published with a head-to-head v1-vs-v2-vs-base scorecard.

	## Citation
	```bibtex
	@misc{bountyhound2026,
	title = {BountyHound-Coder-14B: a gated bug-bounty triage and recon co-pilot},
	author = {athulkrishnan},
	year = {2026},
	howpublished = {\url{https://huggingface.co/athulkrishnan/BountyHound-Coder-14B}},
	note = {QLoRA SFT of Qwen2.5-Coder-14B-Instruct}
	}
	```