---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct/blob/main/LICENSE
base_model: Qwen/Qwen2.5-Coder-14B-Instruct
base_model_relation: finetune
library_name: transformers
pipeline_tag: text-generation
language:
- en
tags:
- security
- cybersecurity
- bug-bounty
- vulnerability-triage
- vulnerability-management
- recon
- offensive-security
- blue-team
- qwen2
- code
- conversational
- unsloth
- trl
- peft
- lora
- qlora
- text-generation-inference
extra_gated_heading: "Request access to BountyHound-Coder-14B"
extra_gated_button_content: "Request access"
extra_gated_prompt: >-
  BountyHound is released for AUTHORIZED security research and defensive triage only.
  By requesting access you confirm that you will use this model ONLY against assets you
  are explicitly authorized to test (in-scope bug-bounty programs, systems you own, or
  written penetration-test/red-team engagements), that you will follow coordinated /
  responsible disclosure, and that you accept the "as-is, no warranty" terms in the
  Disclaimer section of this card. Access is granted at the maintainer's discretion.
extra_gated_fields:
  Full name: text
  Affiliation or handle: text
  Primary use case: text
  Country: country
  I will only use this model on systems I am explicitly authorized to test: checkbox
  I will follow responsible / coordinated disclosure: checkbox
  I accept the as-is, no-warranty terms in the model card: checkbox
---
# 🐕‍🦺 BountyHound-Coder-14B

> A gated, security-specialised **co-pilot** for bug-bounty triage and recon prioritisation,
> fine-tuned from **Qwen2.5-Coder-14B-Instruct**. Built to run locally on a single 16 GB GPU.

BountyHound is **not** an autonomous hacking agent. It is a decision-support model that
helps an authorized researcher answer two questions fast and well:

1. **"Is this finding real and worth submitting?"** Submit/kill triage, impact reasoning,
   and false-positive and out-of-scope filtering.
2. **"Where should I look first?"** Recon attack-surface ranking: a tech stack in,
   a prioritised, rationale-backed vulnerability-class hit-list out.

It is deliberately **terse and impact-first**: it kills weak findings, asks for proof of
exploitation, and avoids "could potentially" filler.

> **Gated and authorized-use-only.** Access requires approval. See **Intended use**,
> **Bias, risks, and limitations**, and the **Disclaimer** before requesting.

---
## Model information

| | |
|---|---|
| **Developer** | `athulkrishnan` (independent) |
| **Model type** | Auto-regressive transformer (decoder-only), instruction-tuned |
| **Base model** | [`Qwen/Qwen2.5-Coder-14B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) (~14.7B params, 48 layers) |
| **Fine-tune method** | QLoRA SFT (4-bit NF4 base, LoRA r=32) via [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) |
| **Specialisation** | Bug-bounty finding triage/validation · recon attack-surface ranking |
| **Language** | English |
| **Context length** | 32,768 native (up to 131K with YaRN); trained at 2,048 |
| **Precision / formats** | Merged **BF16** safetensors · **Q4_K_M GGUF** in [`gguf/`](./gguf) |
| **License** | Apache-2.0 (inherited from the Qwen base) |
| **Status** | Static, offline fine-tune · **v1** (see [Versions](#versions)) |

---
## Intended use

### Intended use cases

- **Finding triage & validation**: decide *submit vs. kill*, sanity-check severity, reason
  about real-world impact, and cut duplicate / informational / out-of-scope noise before a
  human writes a report.
- **Recon prioritisation**: turn a fingerprinted tech stack or attack surface into a ranked
  hit-list of vulnerability classes worth testing first, with one-line rationale.
- **Methodology assistant**: explain bug classes, CWE mappings, and report framing to support
  authorized learning and assessment work.

### Downstream use

- A local triage/ranking step inside an **authorized** bug-bounty or pentest workflow
  (human-in-the-loop), e.g. pre-filtering scanner output or drafting impact statements.
- A base for further domain fine-tuning or for pairing with retrieval (RAG) over fresh
  CVEs / current program scope.
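
A human-in-the-loop pre-filter of the kind described above could look roughly like the sketch below. Everything in it is illustrative rather than part of the release: the finding fields (`title`, `asset`, `evidence`), the helper names, and the convention that the reply opens with SUBMIT or KILL are all assumptions. `generate` stands for whatever local backend you call (Transformers, llama.cpp, Ollama).

```python
from typing import Callable, Optional


def build_triage_prompt(finding: dict) -> str:
    """Format one finding (hypothetical keys: title, asset, evidence) as a triage request."""
    return (
        f"Triage: {finding['title']} on {finding['asset']}. "
        f"Evidence: {finding['evidence']}. "
        "Answer SUBMIT or KILL first, then one line of reasoning."
    )


def parse_verdict(reply: str) -> Optional[str]:
    """Return 'submit' or 'kill' if the reply starts with one of them, else None."""
    words = reply.strip().split()
    first = words[0].strip(".,:;!*").lower() if words else ""
    return first if first in ("submit", "kill") else None


def prefilter(findings, generate: Callable[[str], str]):
    """Attach the model's suggested verdict to each finding; a human makes the final call."""
    queue = []
    for finding in findings:
        reply = generate(build_triage_prompt(finding))
        # A verdict of None means the reply did not follow the assumed format
        # and the finding should go straight to manual review.
        queue.append((finding, parse_verdict(reply), reply))
    return queue
```

The verdict is only a suggestion attached to the finding. Nothing is dropped automatically, which keeps the reviewer in control of what gets submitted.
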
### Out-of-scope and prohibited use

- Testing, scanning, or exploiting systems you are **not explicitly authorized** to assess.
- Autonomous attack execution without human review: BountyHound is a co-pilot, not an agent.
- Generating malware, phishing, or weaponised exploit payloads for unauthorized use.
- Treating outputs as ground truth, or as legal/compliance advice. **Always validate.**
- Any use that violates applicable law or platform/program rules.

---
## How to get started

### Requirements

`transformers >= 4.40` (developed on 4.56.2), `torch >= 2.3`, and `accelerate`.
The merged model is BF16 (~29 GB); for a single 16 GB GPU use the **Q4_K_M GGUF** with
llama.cpp / Ollama, or load in 4-bit with `bitsandbytes`.
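
The size figures above follow from simple arithmetic on the parameter count. The sketch below reproduces it for the weights only; the KV cache and activations need extra memory on top. The 4.8 bits-per-weight average for Q4_K_M is a rough assumption, not a number measured on this model's GGUF file.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 14.7e9  # parameter count quoted for the base model in this card

bf16_gb = weight_footprint_gb(N_PARAMS, 16)   # BF16 stores 16 bits per weight
q4_gb = weight_footprint_gb(N_PARAMS, 4.8)    # assumed average for Q4_K_M, not measured

print(f"BF16 weights:   ~{bf16_gb:.1f} GB")
print(f"Q4_K_M weights: ~{q4_gb:.1f} GB")
```

Under these assumptions the BF16 weights come to about 29.4 GB, which is why they do not fit a 16 GB card, while the 4-bit estimate leaves headroom for the context.
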
### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "athulkrishnan/BountyHound-Coder-14B"
tok = AutoTokenizer.from_pretrained(repo)
# torch_dtype="auto" keeps the checkpoint's BF16 weights; device_map="auto" requires accelerate.
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

SYSTEM = (
    "You are a bug-bounty co-pilot for an authorized security researcher. You assist ONLY "
    "with testing that is in-scope and authorized on bug-bounty programs. You are sharp, "
    "terse, and impact-first: you kill weak findings, prove real exploitation, and never pad "
    "reports with 'could potentially'. Your specialties are finding triage/validation and "
    "recon attack-surface ranking."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content":
        "Triage: reflected XSS on a marketing page, unauthenticated, no session context. "
        "Submit or kill? One line + why."},
]

# return_dict=True yields input_ids plus attention_mask, so generate() receives both.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
# do_sample=True makes it explicit that temperature and top_p are in effect.
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.3, top_p=0.9)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
### Ollama / llama.cpp (GGUF)

Download `gguf/BountyHound-Coder-14B-Q4_K_M.gguf`, then create a `Modelfile`:

```dockerfile
FROM ./BountyHound-Coder-14B-Q4_K_M.gguf
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
SYSTEM """You are a bug-bounty co-pilot for an authorized security researcher. You assist ONLY with testing that is in-scope and authorized on bug-bounty programs. You are sharp, terse, and impact-first: you kill weak findings, prove real exploitation, and never pad reports with 'could potentially'. Your specialties are finding triage/validation and recon attack-surface ranking."""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
```

```bash
ollama create bountyhound -f Modelfile
ollama run bountyhound "Rank the attack surface for a Spring Boot + GraphQL + S3 stack."
```
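
Once the `bountyhound` model exists, it can also be called from a script through Ollama's local HTTP API. The sketch below is a minimal example, assuming Ollama is listening on its default address (`localhost:11434`) and using the `/api/chat` endpoint; `build_chat_payload` and `chat` are hypothetical helper names introduced for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def build_chat_payload(prompt: str, model: str = "bountyhound") -> dict:
    """Build a non-streaming chat request; the system prompt comes from the Modelfile."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Decoding settings recommended in this card.
        "options": {"temperature": 0.3, "top_p": 0.9, "repeat_penalty": 1.05},
    }


def chat(prompt: str, url: str = OLLAMA_URL) -> str:
    """POST the request to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["message"]["content"]


# Example call (needs a running Ollama server with the model created):
# print(chat("Rank the attack surface for a Spring Boot + GraphQL + S3 stack."))
```
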
### Prompt format

Qwen2.5 **ChatML** (`<|im_start|>role … <|im_end|>`) with the security system prompt above.
Recommended decoding: `temperature 0.3`, `top_p 0.9`, `repeat_penalty 1.05`.
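
To make the ChatML layout concrete, the sketch below renders a message list by hand. It is for illustration only: the tokenizer's own chat template remains the authoritative formatter, and this version skips tool calls and any default system prompt that template may insert. `render_chatml` is a hypothetical helper name.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}] messages as a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are a bug-bounty co-pilot."},
    {"role": "user", "content": "Submit or kill?"},
])
print(prompt)
```
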
---

## Training

### Training data

A weighted instruction mix biased toward the two target skills (≈6.2K curated conversations):

| Source | Purpose |
|---|---|
| HackerOne **disclosed** reports (public) | finding disposition + severity-triage signal |
| Curated bug-bounty methodology & triage heuristics | submit/kill discipline, validation gates, anti-patterns |
| Recon playbook / attack-surface examples | tech-stack to ranked vulnerability classes |
| Public detection-template patterns | low-false-positive authoring style |
| General-security instruction data (~13%) | rehearsal to limit catastrophic forgetting |

> **No customer data, private program scope, credentials, or other non-public material is
> included in the training set.** Only public or self-authored content was used.
### Training procedure

QLoRA supervised fine-tuning, loss computed on **assistant turns only**.
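
Computing the loss on assistant turns only usually means masking every other token's label so that it contributes nothing to the cross-entropy. The sketch below shows the idea on plain lists. It is not the Unsloth/TRL code used for this model, and the token ids and mask are made up; `-100` is the label value PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_non_assistant(input_ids, assistant_mask):
    """Copy input_ids to labels, replacing every non-assistant position with IGNORE_INDEX."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, assistant_mask)]


# Toy sequence: three prompt tokens followed by two assistant tokens (ids are made up).
input_ids = [11, 12, 13, 21, 22]
assistant_mask = [0, 0, 0, 1, 1]
labels = mask_non_assistant(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, 21, 22]
```
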

| Hyperparameter | Value |
|---|---|
| Quantisation | 4-bit NF4 (base), BF16 compute |
| LoRA | r=32, α=32, dropout=0, all linear projections |
| Optimiser | paged AdamW 8-bit, weight decay 0.01 |
| LR / schedule | 2e-4, cosine, 3% warmup |
| Epochs / eff. batch | 2 / 8 (micro-batch 1 × grad-accum 8) |
| Max sequence length | 2,048 |
| Hardware | 1× NVIDIA RTX 4070 Ti SUPER (16 GB) |
| Frameworks | Unsloth · TRL 0.22 · Transformers 4.56 · PyTorch 2.9 |
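
The table implies a rough optimiser-step budget and learning-rate curve, sketched below. Several things are assumptions rather than facts from this card: the dataset size is taken as exactly 6,200 conversations, each conversation is one sample (no packing, no held-out split removed), warmup is linear, and the cosine decays to zero.

```python
import math

N_CONVERSATIONS = 6200   # "≈6.2K" in this card, so the step counts are approximate
EPOCHS = 2
EFFECTIVE_BATCH = 8      # micro-batch 1 x grad-accum 8
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03

steps_per_epoch = math.ceil(N_CONVERSATIONS / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

With these inputs the run is on the order of 1,550 optimiser steps, of which roughly the first 47 are warmup.
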
### Evaluation

v1 is scored with a deterministic, rubric-based held-out harness (no LLM judge). Each item is
decision- or rubric-scorable and falls into one of three groups: **triage** (submit/kill
accuracy), **recon ranking** (expected-class recall), and **rubric** categories
(report/nuclei/payload/coding). The tune is compared against the
`Qwen2.5-Coder-14B-Instruct` base. The **ship gate** requires improvement on the two priority
skills (triage, ranking) with no material regression on general coding (guarding against
catastrophic forgetting). A full quantitative scorecard will be published alongside **v2**;
treat v1 as a capable assistant, not a benchmarked SOTA system.
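
The two priority metrics and the ship gate can be expressed in a few lines. The sketch below is one plausible reading of the description above, not the actual harness: the function names, the score dictionary keys, and the 0.02 tolerance for "no material regression" are all assumptions.

```python
def triage_accuracy(predictions, gold):
    """Fraction of items where the predicted submit/kill decision matches the label."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def class_recall(ranked, expected):
    """Share of expected vulnerability classes that appear in the ranked hit-list."""
    expected_set = {c.lower() for c in expected}
    if not expected_set:
        raise ValueError("expected must not be empty")
    found = {c.lower() for c in ranked}
    return len(expected_set & found) / len(expected_set)


def ship_gate(tuned, base, max_coding_drop=0.02):
    """Pass only if triage and ranking improve and coding stays within the tolerance."""
    improved = tuned["triage"] > base["triage"] and tuned["ranking"] > base["ranking"]
    no_regression = tuned["coding"] >= base["coding"] - max_coding_drop
    return improved and no_regression


# Toy inputs for illustration only; these are not scores from BountyHound.
print(triage_accuracy(["submit", "kill", "kill"], ["submit", "kill", "submit"]))
print(class_recall(["SSRF", "IDOR", "XSS"], ["ssrf", "sqli"]))
```

Because every function is a pure comparison against labels, the same inputs always give the same score, which is the property a no-LLM-judge harness relies on.
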
---

## Bias, risks, and limitations

- **Not a vulnerability discoverer.** A 14B local model assists *triage and prioritisation*;
  it does not autonomously find or weaponise novel bugs, and can miss context a human or a
  larger system would catch.
- **Can be confidently wrong.** It may over- or under-rate severity, hallucinate a CWE/CVE,
  or mis-scope a finding. Every output must be validated before acting or reporting.
- **Frozen knowledge.** Trained on a static snapshot, so it will not know the newest CVEs,
  techniques, or your current program scope. Pair with retrieval for facts.
- **Domain bias.** Trained heavily on web-app / HackerOne-style findings; it is weaker on
  niche stacks, hardware, embedded, and non-web targets.
- **Dual-use.** Security knowledge can be misused. The model is gated and authorization-scoped
  for this reason, but gating cannot prevent all misuse; see the Disclaimer.
- **Inherited base behaviour.** Limitations and biases of `Qwen2.5-Coder-14B-Instruct` carry over.

### Recommendations

- Keep a **human in the loop**; use BountyHound as an assistive triage/ranking layer, not an oracle.
- **Validate** every finding through your own impact gate before submitting; never paste output into a report unchecked.
- **Supplement with retrieval** (CVE feeds, current scope) for anything time-sensitive.
- Operate only within written authorization and your program's rules; follow **responsible disclosure**.

---
## Disclaimer

This model is provided **"as is" and "as available", without warranty of any kind**, express
or implied, including merchantability, fitness for a particular purpose, and non-infringement.
By accessing or using BountyHound you acknowledge that **you are solely responsible** for your
use of the model and its outputs, and you agree to **indemnify and hold harmless** the author
and any affiliated parties from any claims, liabilities, damages, or costs arising from that
use. Use is at your own risk and discretion. You are responsible for ensuring your use complies
with all applicable laws, regulations, and the rules of any program or system you test. The
author does not endorse or condone any unauthorized or unlawful use.

## License and attribution

- Weights are derived from **`Qwen/Qwen2.5-Coder-14B-Instruct`** and released under
  **Apache-2.0**, the base model's license.
- Built with **[Unsloth](https://github.com/unslothai/unsloth)** and
  **[TRL](https://github.com/huggingface/trl)**.

## Versions

- **v1** *(this release)*: core triage + recon co-pilot (≈6.2K-conversation mix).
- **v2** *(in training)*: adds a large, defanged **CVE/CWE/vuln-class** breadth layer derived
  from public exploit metadata; to be published with a head-to-head v1-vs-v2-vs-base scorecard.

## Citation

```bibtex
@misc{bountyhound2026,
  title        = {BountyHound-Coder-14B: a gated bug-bounty triage and recon co-pilot},
  author       = {athulkrishnan},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/athulkrishnan/BountyHound-Coder-14B}},
  note         = {QLoRA SFT of Qwen2.5-Coder-14B-Instruct}
}
```