Trynullsec committed on
Commit a9868ad · verified · 1 Parent(s): 954612c

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,187 @@
+ ---
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-Coder-7B-Instruct
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - code
+ - security
+ - cybersecurity
+ - vulnerability-detection
+ - application-security
+ - ai-generated-code
+ - qlora
+ - peft
+ - qwen2.5-coder
+ ---
+
+ # Nullsec-S1
+
+ Nullsec-S1 is an open-source security model purpose-built to audit AI-generated apps, agents, and vibecoded software before they reach production.
+
+ This repository contains the **RC2/v1.1 PEFT / QLoRA adapter** for `Qwen/Qwen2.5-Coder-7B-Instruct`. It is an adapter release, not merged full model weights. Users need the base model plus this adapter.
+
+ ## Release
+
+ - Model name: Nullsec-S1
+ - Release: RC2/v1.1
+ - GitHub release tag: [`v1.0.0-rc25`](https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25)
+ - Release artifact commit: `c29c7f1`
+ - Base model: `Qwen/Qwen2.5-Coder-7B-Instruct`
+ - Adapter type: PEFT / QLoRA
+ - Adapter weights: `adapter_model.safetensors`
+ - Tokenizer/chat template: included with this adapter repository
+
+ ## What it is
+
+ Nullsec-S1 returns final structured JSON security audit verdicts for application code, AI-generated apps, autonomous agents, MCP tools, Web3/wallet flows, and common application-security failures.
+
+ `S1` means `Security-1`. Nullsec-S1 does **not** expose a hidden reasoning-token loop, `<thought>` format, or chain-of-thought parser. It emits a final structured security audit.
+
+ ## Intended use
+
+ - Auditing AI-generated applications before deployment
+ - Reviewing autonomous-agent and MCP tool risk
+ - Reviewing Web3/wallet approval and transaction flows
+ - Generating structured security verdicts for CI, API, or CLI integrations
+ - Producing secure patch guidance for detected findings
+
+ ## Out of scope
+
+ - Not a general chatbot
+ - Not trained from scratch
+ - Not a replacement for human security review
+ - Not a guarantee of zero vulnerabilities
+ - Not a universal production-safety guarantee
+ - No "first", "only", or "best" claim is made
+
+ ## How to load with Transformers + PEFT
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ from peft import PeftModel
+
+ base_model = "Qwen/Qwen2.5-Coder-7B-Instruct"
+ adapter_id = "trynullsec/nullsec-s1"
+
+ quant = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+     bnb_4bit_use_double_quant=True,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
+ base = AutoModelForCausalLM.from_pretrained(
+     base_model,
+     quantization_config=quant,
+     device_map="auto",
+     torch_dtype=torch.bfloat16,
+     trust_remote_code=True,
+ )
+ model = PeftModel.from_pretrained(base, adapter_id)
+ model.eval()
+ ```
+
+ ## Prompt format
+
+ Use the tokenizer chat template. The recommended user message is:
+
+ ````text
+ Audit the following code for security vulnerabilities. Emit only the JSON verdict.
+
+ FILE: app/api/admin/route.ts
+ ```typescript
+ <code here>
+ ```
+ ````
+
+ Use a system instruction equivalent to:
+
+ ```text
+ You are Nullsec-1, a strict security review model. You are NOT a chatbot and you do not write features. Your only job is to audit code for security risk and emit a single JSON verdict.
+ ```
+
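A minimal sketch, not part of the committed README, of how the prompt format above might be assembled into chat messages. The function name `build_audit_messages` and its parameters are hypothetical; the system and user strings are copied from the README text.

```python
SYSTEM_PROMPT = (
    "You are Nullsec-1, a strict security review model. You are NOT a chatbot "
    "and you do not write features. Your only job is to audit code for security "
    "risk and emit a single JSON verdict."
)


def build_audit_messages(path: str, language: str, code: str) -> list[dict[str, str]]:
    """Return system + user messages following the README's recommended layout."""
    fence = "`" * 3  # built at runtime so this example nests cleanly in Markdown
    user = (
        "Audit the following code for security vulnerabilities. "
        "Emit only the JSON verdict.\n\n"
        f"FILE: {path}\n"
        f"{fence}{language}\n"
        f"{code.rstrip()}\n"
        f"{fence}"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]
```

The resulting list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` so that the bundled chat template produces the final prompt string.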
+ ## Output schema
+
+ Nullsec-S1 is trained to emit a single JSON object with:
+
+ - `risk_score`
+ - `production_ready`
+ - `severity`
+ - `confidence`
+ - `reasoning_summary`
+ - `exploit_scenario`
+ - `affected_files`
+ - `checks_performed`
+ - `findings`
+
+ Safe code should return an empty `findings` array:
+
+ ```json
+ {
+   "risk_score": 0,
+   "production_ready": true,
+   "severity": "INFO",
+   "findings": []
+ }
+ ```
+
+ Unsafe code should include one finding per independent issue. Downstream systems should still run deterministic schema alignment and safety enforcement over the raw model output.
+
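A minimal sketch, not part of the committed README, of the kind of schema check a downstream system might run on the raw output. The helper `parse_verdict` is hypothetical. Treating all nine listed keys as required is an assumption, since the abbreviated safe-code example above shows only four of them; the only type checks applied are the two the README makes visible (`findings` is an array, `production_ready` is a boolean).

```python
import json

# Keys listed in the README's output schema section.
REQUIRED_KEYS = {
    "risk_score", "production_ready", "severity", "confidence",
    "reasoning_summary", "exploit_scenario", "affected_files",
    "checks_performed", "findings",
}


def parse_verdict(raw: str) -> tuple[dict, list[str]]:
    """Parse raw model output and return (verdict, list of schema problems)."""
    verdict = json.loads(raw)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(verdict))]
    if not isinstance(verdict.get("findings", []), list):
        problems.append("findings must be a list")
    if not isinstance(verdict.get("production_ready", False), bool):
        problems.append("production_ready must be a boolean")
    return verdict, problems
```

Returning the problems as a list, rather than raising on the first one, lets a CI integration report every schema deviation in a single pass.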
+ ## Evaluation results
+
+ On the Nullsec RC2/v1.1 111-case security benchmark:
+
+ | Metric | Result |
+ |---|---:|
+ | raw outputs | 111/111 |
+ | detection F1 | 0.9245 |
+ | precision | 0.9423 |
+ | recall | 0.9074 |
+ | false_safe_rate | 0.0 |
+ | safety probes | passed |
+
+ These results are benchmark-scoped and tied to the [`v1.0.0-rc25`](https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25) release artifacts.
+
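As a quick consistency check, not part of the committed README, the reported F1 can be recomputed from the reported precision and recall. This assumes the benchmark uses the standard harmonic-mean definition of F1, which the README does not state explicitly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
reported_f1 = f1_score(0.9423, 0.9074)
print(round(reported_f1, 4))  # 0.9245
```

The recomputed value agrees with the table's detection F1 to four decimal places.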
+ ## Baseline comparison
+
+ On the same Nullsec RC2/v1.1 benchmark:
+
+ | System / tool | F1 |
+ |---|---:|
+ | Nullsec-S1 RC2/v1.1 | 0.9245 |
+ | OpenAI/Codex `gpt-5.3-codex` | 0.7252 |
+ | Claude Opus 4.8 | 0.6550 |
+ | Semgrep local rules | 0.5535 |
+ | Qwen2.5-Coder-7B-Instruct base | 0.0180 |
+
+ Baseline results are produced by project scripts and should be reproduced from the repository for comparison. They are not universal claims about any provider or tool.
+
+ ## Limitations
+
+ - The benchmark is repo-authored and security-specific.
+ - Benchmark performance does not guarantee every vulnerability will be detected in arbitrary real-world code.
+ - Independent security review is recommended for critical systems.
+ - Patch correctness is structurally measured; compile/run/test verification is future work.
+ - Hosted-provider baselines can change over time as provider models change.
+ - This adapter is not merged full weights; users must load the base model.
+
+ ## Safety and non-claims
+
+ Nullsec-S1's `production_ready` field is advisory until deterministic safety enforcement is applied. In the Nullsec repository, the Security Alignment Layer and Safety Layer recompute and enforce production readiness.
+
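To illustrate what "recompute and enforce" could mean in practice, here is a hypothetical sketch that is not part of the committed README and is not the repository's Security Alignment Layer or Safety Layer. The rule used (ready only when there are no findings and the risk score is at or below a threshold) and the name `enforce_production_ready` are assumptions chosen for illustration.

```python
def enforce_production_ready(verdict: dict, max_risk_score: float = 0) -> dict:
    """Return a copy of the verdict with production_ready recomputed.

    Hypothetical rule: ready only when findings is empty and risk_score
    does not exceed max_risk_score. A missing risk_score counts as unsafe.
    """
    findings = verdict.get("findings") or []
    risk = verdict.get("risk_score", float("inf"))
    enforced = dict(verdict)  # shallow copy; the model's raw verdict is left untouched
    enforced["production_ready"] = (not findings) and risk <= max_risk_score
    return enforced
```

The point of such a layer is that the model's own `production_ready` value is never trusted directly: it is always overwritten by a rule that can be audited and tested.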
+ This release does **not** claim:
+
+ - first, only, or best model status
+ - guaranteed secure code
+ - zero vulnerabilities
+ - replacement for human security review
+ - universal production safety
+
+ ## Provenance
+
+ - GitHub repo: https://github.com/trynullsec/nullsec-s1
+ - GitHub release: https://github.com/trynullsec/nullsec-s1/releases/tag/v1.0.0-rc25
+ - Base model: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
adapter_config.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen2.5-Coder-7B-Instruct",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "lora_ga_config": null,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.19.1",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "v_proj",
+     "down_proj",
+     "gate_proj",
+     "up_proj",
+     "k_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_bdlora": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
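A back-of-the-envelope sketch, not part of the commit, of what this configuration implies for adapter size. The rank, alpha, and target modules come from the config above; the base-model dimensions (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128) are assumptions taken from the published Qwen2.5-7B architecture and do not appear in this diff.

```python
# Values from adapter_config.json in this commit.
RANK = 16
LORA_ALPHA = 32

# Assumed base-model dimensions (not stated anywhere in this diff).
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128  # assumed 4 key/value heads x head dimension 128
NUM_LAYERS = 28

# (in_features, out_features) for each targeted projection in one layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count() -> int:
    """Count LoRA A (r x in) and B (out x r) weights summed over all layers."""
    per_layer = sum(RANK * (fan_in + fan_out) for fan_in, fan_out in SHAPES.values())
    return per_layer * NUM_LAYERS


scaling = LORA_ALPHA / RANK    # alpha / r, the LoRA scale when use_rslora is false
params = lora_param_count()    # trainable adapter weights under the assumptions above
bytes_at_16_bit = params * 2   # size if every weight is stored in 2 bytes
```

Under these assumptions the adapter holds 40,370,176 weights, or 80,740,352 bytes at 16-bit precision. That is within roughly 52 KB of the 80,792,880-byte `adapter_model.safetensors` recorded in this commit, a gap that would be consistent with file metadata, though the storage dtype is not confirmed here.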
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5d6da478fcca49932565ed3a2a47b84a4d01efee257f871099e19842a8c0e9f5
+ size 80792880
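The three lines above are a Git LFS pointer, not the weights themselves. A minimal sketch, not part of the commit, of parsing such a pointer and checking a downloaded file against it; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True when the byte length and hash digest both match the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]
```

For a file of this size, hashing in chunks from disk would be preferable to loading all bytes into memory; the in-memory version is kept here for brevity.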
chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+     {{- '<|im_start|>system\n' }}
+     {%- if messages[0]['role'] == 'system' %}
+         {{- messages[0]['content'] }}
+     {%- else %}
+         {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+     {%- endif %}
+     {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+     {%- for tool in tools %}
+         {{- "\n" }}
+         {{- tool | tojson }}
+     {%- endfor %}
+     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+     {%- if messages[0]['role'] == 'system' %}
+         {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+     {%- else %}
+         {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+     {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+     {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+     {%- elif message.role == "assistant" %}
+         {{- '<|im_start|>' + message.role }}
+         {%- if message.content %}
+             {{- '\n' + message.content }}
+         {%- endif %}
+         {%- for tool_call in message.tool_calls %}
+             {%- if tool_call.function is defined %}
+                 {%- set tool_call = tool_call.function %}
+             {%- endif %}
+             {{- '\n<tool_call>\n{"name": "' }}
+             {{- tool_call.name }}
+             {{- '", "arguments": ' }}
+             {{- tool_call.arguments | tojson }}
+             {{- '}\n</tool_call>' }}
+         {%- endfor %}
+         {{- '<|im_end|>\n' }}
+     {%- elif message.role == "tool" %}
+         {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+             {{- '<|im_start|>user' }}
+         {%- endif %}
+         {{- '\n<tool_response>\n' }}
+         {{- message.content }}
+         {{- '\n</tool_response>' }}
+         {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+             {{- '<|im_end|>\n' }}
+         {%- endif %}
+     {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+     {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
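For readers who want to see what the template produces without running Jinja, here is a plain-Python sketch, not part of the commit, that mirrors only the no-tools branch for system, user, and plain assistant messages. The function `render_chatml` is hypothetical, and tool definitions, tool calls, and tool responses are deliberately left out.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages the way the template's no-tools branch does."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        rest = messages[1:]
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        rest = messages
    for message in rest:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so generation starts with the model's reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

One practical consequence is visible in the fallback branch: if no system message is supplied, the prompt carries the default Qwen system text rather than the Nullsec system instruction, so callers should pass the system message explicitly.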
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
+ size 11421892
tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "add_prefix_space": false,
+   "backend": "tokenizers",
+   "bos_token": null,
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "extra_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "is_local": false,
+   "local_files_only": false,
+   "model_max_length": 32768,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }