nullsec-s1 / README.md
Trynullsec's picture
Upload folder using huggingface_hub
a9868ad verified
---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- code
- security
- cybersecurity
- vulnerability-detection
- application-security
- ai-generated-code
- qlora
- peft
- qwen2.5-coder
---
# Nullsec-S1
Nullsec-S1 is an open-source security model purpose-built to audit AI-generated apps, agents, and vibecoded software before they reach production.
This repository contains the **RC2/v1.1 PEFT / QLoRA adapter** for `Qwen/Qwen2.5-Coder-7B-Instruct`. It is an adapter release, not merged full model weights. Users need the base model plus this adapter.
## Release
- Model name: Nullsec-S1
- Release: RC2/v1.1
- GitHub release tag: [`v1.0.0-rc25`](https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25)
- Release artifact commit: `c29c7f1`
- Base model: `Qwen/Qwen2.5-Coder-7B-Instruct`
- Adapter type: PEFT / QLoRA
- Adapter weights: `adapter_model.safetensors`
- Tokenizer/chat template: included with this adapter repository
## What it is
Nullsec-S1 returns final structured JSON security audit verdicts for application code, AI-generated apps, autonomous agents, MCP tools, Web3/wallet flows, and common application-security failures.
`S1` means `Security-1`. Nullsec-S1 does **not** expose a hidden reasoning-token loop, `<thought>` format, or chain-of-thought parser. It emits a final structured security audit.
## Intended use
- Auditing AI-generated applications before deployment
- Reviewing autonomous-agent and MCP tool risk
- Reviewing Web3/wallet approval and transaction flows
- Generating structured security verdicts for CI, API, or CLI integrations
- Producing secure patch guidance for detected findings
## Out of scope
- Not a general chatbot
- Not trained from scratch
- Not a replacement for human security review
- Not a guarantee of zero vulnerabilities
- Not a universal production-safety guarantee
- No "first", "only", or "best" claim is made
## How to load with Transformers + PEFT
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
base_model = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_id = "trynullsec/nullsec-s1"
quant = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
base_model,
quantization_config=quant,
device_map="auto",
torch_dtype=torch.bfloat16,
trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```
## Prompt format
Use the tokenizer chat template. The recommended user message is:
````text
Audit the following code for security vulnerabilities. Emit only the JSON verdict.
FILE: app/api/admin/route.ts
```typescript
<code here>
```
````
Use a system instruction equivalent to:
```text
You are Nullsec-1, a strict security review model. You are NOT a chatbot and you do not write features. Your only job is to audit code for security risk and emit a single JSON verdict.
```
## Output schema
Nullsec-S1 is trained to emit a single JSON object with:
- `risk_score`
- `production_ready`
- `severity`
- `confidence`
- `reasoning_summary`
- `exploit_scenario`
- `affected_files`
- `checks_performed`
- `findings`
Safe code should return an empty `findings` array:
```json
{
"risk_score": 0,
"production_ready": true,
"severity": "INFO",
"findings": []
}
```
Unsafe code should include one finding per independent issue. Downstream systems should still run deterministic schema alignment and safety enforcement over the raw model output.
## Evaluation results
On the Nullsec RC2/v1.1 111-case security benchmark:
| Metric | Result |
|---|---:|
| raw outputs | 111/111 |
| detection F1 | 0.9245 |
| precision | 0.9423 |
| recall | 0.9074 |
| false_safe_rate | 0.0 |
| safety probes | passed |
These results are benchmark-scoped and tied to the [`v1.0.0-rc25`](https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25) release artifacts.
## Baseline comparison
On the same Nullsec RC2/v1.1 benchmark:
| System / tool | F1 |
|---|---:|
| Nullsec-S1 RC2/v1.1 | 0.9245 |
| OpenAI/Codex `gpt-5.3-codex` | 0.7252 |
| Claude Opus 4.8 | 0.6550 |
| Semgrep local rules | 0.5535 |
| Qwen2.5-Coder-7B-Instruct base | 0.0180 |
Baseline results are produced by project scripts and should be reproduced from the repository for comparison. They are not universal claims about any provider or tool.
## Limitations
- The benchmark is repo-authored and security-specific.
- Benchmark performance does not guarantee every vulnerability will be detected in arbitrary real-world code.
- Independent security review is recommended for critical systems.
- Patch correctness is structurally measured; compile/run/test verification is future work.
- Hosted-provider baselines can change over time as provider models change.
- This adapter is not merged full weights; users must load the base model.
## Safety and non-claims
Nullsec-S1's `production_ready` field is advisory until deterministic safety enforcement is applied. In the Nullsec repository, the Security Alignment Layer and Safety Layer recompute and enforce production readiness.
This release does **not** claim:
- first, only, or best model status
- guaranteed secure code
- zero vulnerabilities
- replacement for human security review
- universal production safety
## Provenance
- GitHub repo: https://github.com/trynullsec/nullsec-s1
- GitHub release: https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25
- Base model: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct