File size: 5,792 Bytes

a9868ad

---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
  - code
  - security
  - cybersecurity
  - vulnerability-detection
  - application-security
  - ai-generated-code
  - qlora
  - peft
  - qwen2.5-coder
---

# Nullsec-S1

Nullsec-S1 is an open-source security model purpose-built to audit AI-generated apps, agents, and vibecoded software before they reach production.

This repository contains the **RC2/v1.1 PEFT / QLoRA adapter** for `Qwen/Qwen2.5-Coder-7B-Instruct`. It is an adapter release, not merged full model weights. Users need the base model plus this adapter.

## Release

- Model name: Nullsec-S1
- Release: RC2/v1.1
- GitHub release tag: [`v1.0.0-rc25`](https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25)
- Release artifact commit: `c29c7f1`
- Base model: `Qwen/Qwen2.5-Coder-7B-Instruct`
- Adapter type: PEFT / QLoRA
- Adapter weights: `adapter_model.safetensors`
- Tokenizer/chat template: included with this adapter repository

## What it is

Nullsec-S1 returns final structured JSON security audit verdicts for application code, AI-generated apps, autonomous agents, MCP tools, Web3/wallet flows, and common application-security failures.

`S1` means `Security-1`. Nullsec-S1 does **not** expose a hidden reasoning-token loop, `<thought>` format, or chain-of-thought parser. It emits a final structured security audit.

## Intended use

- Auditing AI-generated applications before deployment
- Reviewing autonomous-agent and MCP tool risk
- Reviewing Web3/wallet approval and transaction flows
- Generating structured security verdicts for CI, API, or CLI integrations
- Producing secure patch guidance for detected findings

## Out of scope

- Not a general chatbot
- Not trained from scratch
- Not a replacement for human security review
- Not a guarantee of zero vulnerabilities
- Not a universal production-safety guarantee
- No "first", "only", or "best" claim is made

## How to load with Transformers + PEFT

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_id = "trynullsec/nullsec-s1"

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```

## Prompt format

Use the tokenizer chat template. The recommended user message is:

````text
Audit the following code for security vulnerabilities. Emit only the JSON verdict.

FILE: app/api/admin/route.ts
```typescript
<code here>
```
````

Use a system instruction equivalent to:

```text
You are Nullsec-1, a strict security review model. You are NOT a chatbot and you do not write features. Your only job is to audit code for security risk and emit a single JSON verdict.
```

## Output schema

Nullsec-S1 is trained to emit a single JSON object with:

- `risk_score`
- `production_ready`
- `severity`
- `confidence`
- `reasoning_summary`
- `exploit_scenario`
- `affected_files`
- `checks_performed`
- `findings`

Safe code should return an empty `findings` array:

```json
{
  "risk_score": 0,
  "production_ready": true,
  "severity": "INFO",
  "findings": []
}
```

Unsafe code should include one finding per independent issue. Downstream systems should still run deterministic schema alignment and safety enforcement over the raw model output.

## Evaluation results

On the Nullsec RC2/v1.1 111-case security benchmark:

| Metric | Result |
|---|---:|
| raw outputs | 111/111 |
| detection F1 | 0.9245 |
| precision | 0.9423 |
| recall | 0.9074 |
| false_safe_rate | 0.0 |
| safety probes | passed |

These results are benchmark-scoped and tied to the [`v1.0.0-rc25`](https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25) release artifacts.

## Baseline comparison

On the same Nullsec RC2/v1.1 benchmark:

| System / tool | F1 |
|---|---:|
| Nullsec-S1 RC2/v1.1 | 0.9245 |
| OpenAI/Codex `gpt-5.3-codex` | 0.7252 |
| Claude Opus 4.8 | 0.6550 |
| Semgrep local rules | 0.5535 |
| Qwen2.5-Coder-7B-Instruct base | 0.0180 |

Baseline results are produced by project scripts and should be reproduced from the repository for comparison. They are not universal claims about any provider or tool.

## Limitations

- The benchmark is repo-authored and security-specific.
- Benchmark performance does not guarantee every vulnerability will be detected in arbitrary real-world code.
- Independent security review is recommended for critical systems.
- Patch correctness is structurally measured; compile/run/test verification is future work.
- Hosted-provider baselines can change over time as provider models change.
- This adapter is not merged full weights; users must load the base model.

## Safety and non-claims

Nullsec-S1's `production_ready` field is advisory until deterministic safety enforcement is applied. In the Nullsec repository, the Security Alignment Layer and Safety Layer recompute and enforce production readiness.

This release does **not** claim:

- first, only, or best model status
- guaranteed secure code
- zero vulnerabilities
- replacement for human security review
- universal production safety

## Provenance

- GitHub repo: https://github.com/trynullsec/nullsec-s1
- GitHub release: https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25
- Base model: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct