Initial upload: fine-tune weights, config, tokenizer, model card

544dcf2 verified about 1 month ago

5.54 kB

	---
	license: mit
	base_model: Jackrong/Qwopus3.5-27B-v3
	tags:
	- security
	- reasoning
	- qwen3_5
	- distillation
	- fine-tuned
	language:
	- en
	pipeline_tag: text-generation
	---

	# Condor-27B

	A security-reasoning fine-tune of [`Jackrong/Qwopus3.5-27B-v3`](https://huggingface.co/Jackrong/Qwopus3.5-27B-v3), distilled from Claude Opus reasoning traces on exploit development, vulnerability analysis, and defensive security topics.

	## Model Summary

	- Base model: `Jackrong/Qwopus3.5-27B-v3` (27B, Qwen3.5 hybrid linear/full attention architecture)
	- Training type: Full fine-tune (bf16, DeepSpeed ZeRO-3)
	- Focus: Security reasoning — binary exploitation, web/app vulnerabilities, kernel/OS internals, cryptography, network attacks, defensive analysis
	- Intended use: CTF assistance, security research, reading along with security books, pentesting thought-partner

	## Training

	\| \| \|
	\|---\|---\|
	\| Dataset size \| 7,735 reasoning traces \|
	\| Source prompts \| 35+ security books (seed prompts per chapter) \|
	\| Trace generator \| Claude Opus (Anthropic API) \|
	\| Steps \| 1,395 \|
	\| Wall time \| 43h 43m \|
	\| Hardware \| 8× H100 (RunPod) \|
	\| Precision \| bf16 \|
	\| Parallelism \| DeepSpeed ZeRO-3 \|
	\| Final eval loss \| 0.99 \|

	The training data was generated by prompting Claude Opus with questions derived from security literature (books, papers, writeups) and capturing its full reasoning chain. No multi-turn dialogue — single-prompt reasoning traces only.

	## Serving

	The model uses the same Qwen3.5-27B hybrid mamba architecture as the base, so any serving framework that supports that base works here. Tested with sglang on 2× A100 40GB:

	```
	python -m sglang.launch_server \
	--model-path dangell7/Condor-27B \
	--trust-remote-code \
	--tp-size 2 \
	--dtype bfloat16 \
	--context-length 8192 \
	--mem-fraction-static 0.85 \
	--kv-cache-dtype fp8_e5m2 \
	--port 30000
	```

	Requires `transformers>=5.3.0` and sglang with PR [#21404](https://github.com/sgl-project/sglang/pull/21404) (merged 2026-03-30) — earlier versions leak mamba slots under concurrent load and deadlock the scheduler.

	Observed decode throughput: ~38 tok/s on 2× A100 40GB, tp=2, single client.

	### Known caveats

	1. Chat template quirk (inherited from base): Responses may emit a stray `</think>` closing tag without a matching opening tag. This is a pre-existing quirk of `Qwopus3.5-27B-v3` and not introduced by this fine-tune. Strip it in post-processing if it breaks your parser.
	2. Longer outputs: This fine-tune learned to produce denser, longer reasoning than the base (structured sections, code snippets, citations). Set `max_tokens` ≥ 4096 for complex prompts or expect truncation.
	3. Tokenizer: Native tokenizer is included (identical vocab to the Qwen3.5-27B base model; no new tokens were added during fine-tuning). Requires `transformers>=5.3.0` to load.
	4. Concurrent serving: sglang's hybrid mamba scheduler leaks mamba slots under 2+ concurrent requests in versions before PR [#21404](https://github.com/sgl-project/sglang/pull/21404) (merged 2026-03-30). Use sglang main post that commit, or serialize requests at a gateway for older versions.

	## Evaluation

	Qualitative side-by-side vs base (`Jackrong/Qwopus3.5-27B-v3`) on 5 fixed prompts covering math, code debugging, systems reasoning, logic, and networking:

	\| Prompt \| Base \| Condor-27B \|
	\|---\|---\|---\|
	\| Multi-step math \| Correct \| Correct, headered sections + verification \|
	\| Code bug hunt \| Correct \| Correct, more senior-voice (`itertools.accumulate` alternative) \|
	\| GC vs manual vs ownership tradeoffs \| Correct, textbook-shallow \| Correct, dramatically deeper (G1/ZGC internals, code, fairness analysis) \|
	\| Three-box logic puzzle \| Correct \| Correct, tighter deduction chain \|
	\| TCP congestion control \| Correct, Reno-focused \| Correct, deeper (RFC citations, ASCII sawtooth, what-this-didn't-solve table) \|

	Summary: Correctness preserved across all 5 prompts with no regressions. Responses are markedly denser and more specific — more Opus-like in voice and structure. No repetition, mode collapse, or drift observed.

	Full eval traces: see `eval/` (if published) or reproduce with the `vibe_client.py` harness.

	## Intended Use & Limitations

	Intended use:
	- Security research, CTF assistance, reading/learning alongside security literature
	- Thought-partner for pentesting workflows with human oversight
	- Reasoning-chain generation for further distillation

	Out of scope / don't use for:
	- Autonomous offensive security operations
	- Targeting systems you don't own or have explicit authorization to test
	- Factual lookup on specific CVEs, RFCs, or fast-moving details — verify independently (the model has been observed to confidently mis-cite RFC numbers)
	- Non-English prompts (trained on English reasoning traces only)

	## Provenance

	Distilled from Claude Opus outputs via the Anthropic API. Anthropic's terms of service allow using model outputs for your own purposes including training; downstream users of this model should read Anthropic's [usage policy](https://www.anthropic.com/legal/aup) and determine their own compliance obligations.

	## License

	MIT (see LICENSE). The base model's license applies to its weights; this fine-tune's delta is released under MIT.

	## Citation

	```bibtex
	@misc{condor-27b,
	author = {Angell, Denis},
	title = {Condor-27B: A security-reasoning fine-tune of Qwopus3.5-27B-v3},
	year = {2026},
	url = {https://huggingface.co/dangell7/Condor-27B},
	}
	```