Release btl-2-coder-7B api4k-template1k LoRA adapter

3f0df2e verified 11 days ago

2.73 kB

	---
	license: apache-2.0
	base_model: unsloth/Qwen2.5-Coder-7B-Instruct
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- code
	- code-review
	- security
	- qwen2.5-coder
	- lora
	- bad-theory-labs
	model_name: btl-2-coder-7B
	---

	# BTL-2 Coder 7B

	BTL-2 Coder 7B is a LoRA adapter for `unsloth/Qwen2.5-Coder-7B-Instruct`, trained for structured code-review findings.

	## Intended Use

	Use this model for local-first code review:

	- SQL injection
	- path traversal
	- authorization bypass
	- missing error handling
	- boundary/off-by-one logic
	- related security and correctness bugs

	It is not yet a general autonomous coding agent and should not be marketed as a SWE-Bench repair model.

	## Training

	- Base: `unsloth/Qwen2.5-Coder-7B-Instruct`
	- Trainer: Unsloth LoRA SFT
	- Data: `4,000` API teacher traces + `1,000` template traces
	- Split: `4,500` train / `500` eval
	- Epochs: `2`
	- Max length: `4096`

	Only redacted, opt-in traces should be used for future training.

	## Prompt

	Use strict schema prompting:

	```text
	Return only a JSON array. No markdown and no wrapper object.
	Each finding must include: severity, file, line, title, evidence, recommendation, confidence.
	severity must be exactly one of: critical, high, medium, low.
	Never put a category in severity.
	confidence must be a number from 0 to 1, never a string label.
	Every finding must include concrete evidence and a non-empty recommendation.
	```

	Example output:

	```json
	[
	{
	"severity": "critical",
	"file": "src/users.ts",
	"line": 42,
	"title": "SQL injection through string-built query",
	"evidence": "The user id is concatenated directly into the SQL string.",
	"recommendation": "Use a parameterized query.",
	"confidence": 0.96
	}
	]
	```

	## Evaluation

	\| Eval \| JSON parse \| Schema valid \| Numeric confidence \| Category hit \| File hit \| Precision \| Recall \| Weighted severity recall \|
	\| --- \| ---: \| ---: \| ---: \| ---: \| ---: \| ---: \| ---: \| ---: \|
	\| Heldout 100 strict \| 1.000 \| 0.952 \| 1.000 \| 0.783 \| 0.840 \| - \| - \| - \|
	\| Heldout 30 strict v2 \| 1.000 \| 0.975 \| 1.000 \| 0.867 \| 0.867 \| - \| - \| - \|
	\| Seeded 15 strict \| 1.000 \| 1.000 \| 1.000 \| 0.933 \| 1.000 \| 0.933 \| 0.933 \| 0.956 \|

	## Limitations

	- Strict schema prompting is required for best results.
	- The model may miss subtle multi-file issues.
	- The model can produce plausible but incorrect findings; keep human review in the loop.
	- Do not use on private repositories unless you control the inference environment and data policy.

	## Release Artifacts

	This Hugging Face repo should include:

	- `adapter_model.safetensors`
	- `adapter_config.json`
	- `tokenizer.json`
	- `tokenizer_config.json`
	- `chat_template.jinja`
	- `training_args.bin`
	- this `README.md`