--- license: apache-2.0 base_model: Qwen/Qwen2.5-Coder-7B-Instruct library_name: peft pipeline_tag: text-generation tags: - code - security - cybersecurity - vulnerability-detection - application-security - ai-generated-code - qlora - peft - qwen2.5-coder --- # Nullsec-S1 Nullsec-S1 is an open-source security model purpose-built to audit AI-generated apps, agents, and vibecoded software before they reach production. This repository contains the **RC2/v1.1 PEFT / QLoRA adapter** for `Qwen/Qwen2.5-Coder-7B-Instruct`. It is an adapter release, not merged full model weights. Users need the base model plus this adapter. ## Release - Model name: Nullsec-S1 - Release: RC2/v1.1 - GitHub release tag: [`v1.0.0-rc25`](https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25) - Release artifact commit: `c29c7f1` - Base model: `Qwen/Qwen2.5-Coder-7B-Instruct` - Adapter type: PEFT / QLoRA - Adapter weights: `adapter_model.safetensors` - Tokenizer/chat template: included with this adapter repository ## What it is Nullsec-S1 returns final structured JSON security audit verdicts for application code, AI-generated apps, autonomous agents, MCP tools, Web3/wallet flows, and common application-security failures. `S1` means `Security-1`. Nullsec-S1 does **not** expose a hidden reasoning-token loop, `` format, or chain-of-thought parser. It emits a final structured security audit. ## Intended use - Auditing AI-generated applications before deployment - Reviewing autonomous-agent and MCP tool risk - Reviewing Web3/wallet approval and transaction flows - Generating structured security verdicts for CI, API, or CLI integrations - Producing secure patch guidance for detected findings ## Out of scope - Not a general chatbot - Not trained from scratch - Not a replacement for human security review - Not a guarantee of zero vulnerabilities - Not a universal production-safety guarantee - No "first", "only", or "best" claim is made ## How to load with Transformers + PEFT ```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig from peft import PeftModel base_model = "Qwen/Qwen2.5-Coder-7B-Instruct" adapter_id = "trynullsec/nullsec-s1" quant = BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True, ) tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True) base = AutoModelForCausalLM.from_pretrained( base_model, quantization_config=quant, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True, ) model = PeftModel.from_pretrained(base, adapter_id) model.eval() ``` ## Prompt format Use the tokenizer chat template. The recommended user message is: ````text Audit the following code for security vulnerabilities. Emit only the JSON verdict. FILE: app/api/admin/route.ts ```typescript ``` ```` Use a system instruction equivalent to: ```text You are Nullsec-1, a strict security review model. You are NOT a chatbot and you do not write features. Your only job is to audit code for security risk and emit a single JSON verdict. ``` ## Output schema Nullsec-S1 is trained to emit a single JSON object with: - `risk_score` - `production_ready` - `severity` - `confidence` - `reasoning_summary` - `exploit_scenario` - `affected_files` - `checks_performed` - `findings` Safe code should return an empty `findings` array: ```json { "risk_score": 0, "production_ready": true, "severity": "INFO", "findings": [] } ``` Unsafe code should include one finding per independent issue. Downstream systems should still run deterministic schema alignment and safety enforcement over the raw model output. ## Evaluation results On the Nullsec RC2/v1.1 111-case security benchmark: | Metric | Result | |---|---:| | raw outputs | 111/111 | | detection F1 | 0.9245 | | precision | 0.9423 | | recall | 0.9074 | | false_safe_rate | 0.0 | | safety probes | passed | These results are benchmark-scoped and tied to the [`v1.0.0-rc25`](https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25) release artifacts. ## Baseline comparison On the same Nullsec RC2/v1.1 benchmark: | System / tool | F1 | |---|---:| | Nullsec-S1 RC2/v1.1 | 0.9245 | | OpenAI/Codex `gpt-5.3-codex` | 0.7252 | | Claude Opus 4.8 | 0.6550 | | Semgrep local rules | 0.5535 | | Qwen2.5-Coder-7B-Instruct base | 0.0180 | Baseline results are produced by project scripts and should be reproduced from the repository for comparison. They are not universal claims about any provider or tool. ## Limitations - The benchmark is repo-authored and security-specific. - Benchmark performance does not guarantee every vulnerability will be detected in arbitrary real-world code. - Independent security review is recommended for critical systems. - Patch correctness is structurally measured; compile/run/test verification is future work. - Hosted-provider baselines can change over time as provider models change. - This adapter is not merged full weights; users must load the base model. ## Safety and non-claims Nullsec-S1's `production_ready` field is advisory until deterministic safety enforcement is applied. In the Nullsec repository, the Security Alignment Layer and Safety Layer recompute and enforce production readiness. This release does **not** claim: - first, only, or best model status - guaranteed secure code - zero vulnerabilities - replacement for human security review - universal production safety ## Provenance - GitHub repo: https://github.com/trynullsec/nullsec-s1 - GitHub release: https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25 - Base model: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct