Upload folder using huggingface_hub

a9868ad verified 4 days ago

5.79 kB

	---
	license: apache-2.0
	base_model: Qwen/Qwen2.5-Coder-7B-Instruct
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- code
	- security
	- cybersecurity
	- vulnerability-detection
	- application-security
	- ai-generated-code
	- qlora
	- peft
	- qwen2.5-coder
	---

	# Nullsec-S1

	Nullsec-S1 is an open-source security model purpose-built to audit AI-generated apps, agents, and vibecoded software before they reach production.

	This repository contains the RC2/v1.1 PEFT / QLoRA adapter for `Qwen/Qwen2.5-Coder-7B-Instruct`. It is an adapter release, not merged full model weights. Users need the base model plus this adapter.

	## Release

	- Model name: Nullsec-S1
	- Release: RC2/v1.1
	- GitHub release tag: [`v1.0.0-rc25`](https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25)
	- Release artifact commit: `c29c7f1`
	- Base model: `Qwen/Qwen2.5-Coder-7B-Instruct`
	- Adapter type: PEFT / QLoRA
	- Adapter weights: `adapter_model.safetensors`
	- Tokenizer/chat template: included with this adapter repository

	## What it is

	Nullsec-S1 returns final structured JSON security audit verdicts for application code, AI-generated apps, autonomous agents, MCP tools, Web3/wallet flows, and common application-security failures.

	`S1` means `Security-1`. Nullsec-S1 does not expose a hidden reasoning-token loop, `<thought>` format, or chain-of-thought parser. It emits a final structured security audit.

	## Intended use

	- Auditing AI-generated applications before deployment
	- Reviewing autonomous-agent and MCP tool risk
	- Reviewing Web3/wallet approval and transaction flows
	- Generating structured security verdicts for CI, API, or CLI integrations
	- Producing secure patch guidance for detected findings

	## Out of scope

	- Not a general chatbot
	- Not trained from scratch
	- Not a replacement for human security review
	- Not a guarantee of zero vulnerabilities
	- Not a universal production-safety guarantee
	- No "first", "only", or "best" claim is made

	## How to load with Transformers + PEFT

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
	from peft import PeftModel

	base_model = "Qwen/Qwen2.5-Coder-7B-Instruct"
	adapter_id = "trynullsec/nullsec-s1"

	quant = BitsAndBytesConfig(
	load_in_4bit=True,
	bnb_4bit_quant_type="nf4",
	bnb_4bit_compute_dtype=torch.bfloat16,
	bnb_4bit_use_double_quant=True,
	)

	tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
	base = AutoModelForCausalLM.from_pretrained(
	base_model,
	quantization_config=quant,
	device_map="auto",
	torch_dtype=torch.bfloat16,
	trust_remote_code=True,
	)
	model = PeftModel.from_pretrained(base, adapter_id)
	model.eval()
	```

	## Prompt format

	Use the tokenizer chat template. The recommended user message is:

	````text
	Audit the following code for security vulnerabilities. Emit only the JSON verdict.

	FILE: app/api/admin/route.ts
	```typescript
	<code here>
	```
	````

	Use a system instruction equivalent to:

	```text
	You are Nullsec-1, a strict security review model. You are NOT a chatbot and you do not write features. Your only job is to audit code for security risk and emit a single JSON verdict.
	```

	## Output schema

	Nullsec-S1 is trained to emit a single JSON object with:

	- `risk_score`
	- `production_ready`
	- `severity`
	- `confidence`
	- `reasoning_summary`
	- `exploit_scenario`
	- `affected_files`
	- `checks_performed`
	- `findings`

	Safe code should return an empty `findings` array:

	```json
	{
	"risk_score": 0,
	"production_ready": true,
	"severity": "INFO",
	"findings": []
	}
	```

	Unsafe code should include one finding per independent issue. Downstream systems should still run deterministic schema alignment and safety enforcement over the raw model output.

	## Evaluation results

	On the Nullsec RC2/v1.1 111-case security benchmark:

	\| Metric \| Result \|
	\|---\|---:\|
	\| raw outputs \| 111/111 \|
	\| detection F1 \| 0.9245 \|
	\| precision \| 0.9423 \|
	\| recall \| 0.9074 \|
	\| false_safe_rate \| 0.0 \|
	\| safety probes \| passed \|

	These results are benchmark-scoped and tied to the [`v1.0.0-rc25`](https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25) release artifacts.

	## Baseline comparison

	On the same Nullsec RC2/v1.1 benchmark:

	\| System / tool \| F1 \|
	\|---\|---:\|
	\| Nullsec-S1 RC2/v1.1 \| 0.9245 \|
	\| OpenAI/Codex `gpt-5.3-codex` \| 0.7252 \|
	\| Claude Opus 4.8 \| 0.6550 \|
	\| Semgrep local rules \| 0.5535 \|
	\| Qwen2.5-Coder-7B-Instruct base \| 0.0180 \|

	Baseline results are produced by project scripts and should be reproduced from the repository for comparison. They are not universal claims about any provider or tool.

	## Limitations

	- The benchmark is repo-authored and security-specific.
	- Benchmark performance does not guarantee every vulnerability will be detected in arbitrary real-world code.
	- Independent security review is recommended for critical systems.
	- Patch correctness is structurally measured; compile/run/test verification is future work.
	- Hosted-provider baselines can change over time as provider models change.
	- This adapter is not merged full weights; users must load the base model.

	## Safety and non-claims

	Nullsec-S1's `production_ready` field is advisory until deterministic safety enforcement is applied. In the Nullsec repository, the Security Alignment Layer and Safety Layer recompute and enforce production readiness.

	This release does not claim:

	- first, only, or best model status
	- guaranteed secure code
	- zero vulnerabilities
	- replacement for human security review
	- universal production safety

	## Provenance

	- GitHub repo: https://github.com/trynullsec/nullsec-s1
	- GitHub release: https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25
	- Base model: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct