---
base_model:
- huihui-ai/Qwen3-8B-abliterated
language:
- en
- zh
license: apache-2.0
tags:
- unsloth
- Transformers
- Safetensors
- StrikeGPT
- cybersecurity
- llama-cpp
- gguf-my-repo
---
**14/05/2025**: Updated English dataset

# 🤖 StrikeGPT-R1-Zero: Cybersecurity Penetration Testing Reasoning Model  


![image/png](https://cdn-uploads.huggingface.co/production/uploads/67c1bfdf3e9af7d134c4189d/T2JpQznw0yoUDZrf2GqX0.png)

## 🚀 Model Introduction  
**StrikeGPT-R1-Zero** is an expert model distilled from **Qwen3** via black-box methods, with DeepSeek-R1 as its teacher model. Coverage includes:  
🔒 AI Security | 🛡️ API Security | 📱 App Security | 🕵️ APT | 🚩 CTF  
🏭 ICS Security | 💻 Full Penetration Testing | ☁️ Cloud Security | 📜 Code Auditing  
🦠 Antivirus Evasion | 🌐 Internal Network Security | 💾 Digital Forensics | ₿ Blockchain Security | 🕳️ Traceback & Countermeasures | 🌍 IoT Security  
🚨 Emergency Response | 🚗 Vehicle Security | 👥 Social Engineering | 💼 Penetration Testing Interviews  

### 👉 [Click to Access the Interactive Detailed Data Distribution](https://bouquets-ai.github.io/StrikeGPT-R1-Zero/WEB)  
### 🌟 Key Features  
- 🧩 Optimized with **Chain-of-Thought (CoT) reasoning data** to strengthen logical capabilities, significantly improving performance on complex tasks such as vulnerability analysis  
- 💪 Built on Qwen3, making it better suited to Chinese users than Distill-Llama  
- ⚠️ **No ethical restrictions**: demonstrates unique performance in specific academic research areas (use only in compliance with local laws)  
- ✨ Outperforms local RAG solutions in scenarios such as offline cybersecurity competitions, with stronger logical reasoning and complex-task handling  

## 📊 Data Distribution  
![data](https://github.com/user-attachments/assets/4d19d48d-67bb-4b05-8ce9-2000b6afa12e)  

## πŸ› οΈ Model Deployment  
### Deploy via Ollama  
`ollama run hf.co/Bouquets/StrikeGPT-R1-Zero-8B-Q4_K_M-GGUF:Q4_K_M`  
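Once the model is running under Ollama, it can also be queried through Ollama's local REST API (served on port 11434 by default). The sketch below only builds the request payload, using a made-up example prompt; the commented-out lines show how it would be sent to a live server:

```python
import json

# Hypothetical example prompt; the model tag matches the `ollama run` command above.
payload = {
    "model": "hf.co/Bouquets/StrikeGPT-R1-Zero-8B-Q4_K_M-GGUF:Q4_K_M",
    "prompt": "Explain how parameterized queries mitigate SQL injection.",
    "stream": False,  # return one complete response instead of streamed chunks
}

# With the Ollama server running locally, uncomment to call it:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(payload).encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])

print(json.dumps(payload, indent=2))
```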

**Or directly call the original model**  
```python
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Any length works; RoPE scaling is applied automatically
dtype = None # None for auto-detection; float16 for Tesla T4/V100, bfloat16 for Ampere+
load_in_4bit = True # 4-bit quantization to reduce memory usage; can be False

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Bouquets/StrikeGPT-R1-Zero-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...",
)
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "", # instruction
        "Hello, are you developed by OpenAI?", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs.input_ids, attention_mask = inputs.attention_mask,
                   streamer = text_streamer, max_new_tokens = 4096, pad_token_id = tokenizer.eos_token_id)
```
![image](https://github.com/user-attachments/assets/d8cef659-3c83-4bc9-af1a-78ed6345faf2)  
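The Alpaca-style template used above is plain Python string formatting, so the filled prompt can be inspected standalone as a quick sanity check (no model required); the response slot is deliberately left blank for generation:

```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Fill instruction and input; leave the response slot empty so the model completes it.
prompt = alpaca_prompt.format("", "Hello, are you developed by OpenAI?", "")
assert prompt.endswith("### Response:\n")  # generation starts right after this marker
print(prompt)
```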

*Self-identification errors may occur after quantization; please disregard them.*  
![image](https://github.com/user-attachments/assets/3989ea09-d581-49fb-9938-01b93e0beb91)  

## 💻 Open Source 💻  
🌟 **Open-Source Model** 🌟  
🤗 **HuggingFace**:  
🔗 [https://huggingface.co/Bouquets/StrikeGPT-R1-Zero-8B](https://huggingface.co/Bouquets/StrikeGPT-R1-Zero-8B)  

📊 **Datasets** (Partial Non-Reasoning Data) 📊  
🤗 **HuggingFace**:  
🔹 Cybersecurity LLM-CVE Dataset:  
🔗 [https://huggingface.co/datasets/Bouquets/Cybersecurity-LLM-CVE](https://huggingface.co/datasets/Bouquets/Cybersecurity-LLM-CVE)  

🔹 Red Team LLM English Dataset:  
🔗 [https://huggingface.co/datasets/Bouquets/Cybersecurity-Red_team-LLM-en](https://huggingface.co/datasets/Bouquets/Cybersecurity-Red_team-LLM-en)  

## 🎯 Core Capabilities Showcase & Comparison (the original model has ethical restrictions; a simple comparison against the SecGPT-7B model)  
Given the absence of standardized evaluation metrics for cybersecurity penetration testing in large language models, we propose a controlled comparative framework leveraging GPT-4 as an impartial evaluator. The methodology consists of three phases:  
**Controlled Questioning**  
Identical cybersecurity penetration testing questions (e.g., "Explain how to exploit a SQL injection vulnerability in a REST API") are posed to both the distilled StrikeGPT model and SecGPT (Figure 12). 
![image/png](https://cdn-uploads.huggingface.co/production/uploads/67c1bfdf3e9af7d134c4189d/gYY1KKLLNGeQmUi4BgZJ4.png)
Questions span:  
- Technical Depth (e.g., payload construction)  
- Attack Methodology (e.g., step-by-step exploitation)  
- Mitigation Strategies (e.g., parameterized queries)  
**GPT-4 Evaluation Protocol**  
- Responses from both models are anonymized and evaluated by GPT-4 against the following criteria:  
  - Technical Accuracy (0-5): alignment with known penetration testing principles (e.g., OWASP guidelines)  
  - Logical Coherence (0-5): consistency of reasoning (e.g., cause-effect relationships in attack chains)  
  - Practical Feasibility (0-5): real-world applicability (e.g., compatibility with tools like Burp Suite)  
- GPT-4 provides a detailed justification for each score.  

The evaluation results under these criteria are presented in Figure 13.  
![image/png](https://cdn-uploads.huggingface.co/production/uploads/67c1bfdf3e9af7d134c4189d/2ThExwlCX4iU_n-Adh6Fp.png)
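The scoring step of this protocol can be sketched as a small aggregation routine. The scores and response labels below are hypothetical placeholders; in the actual evaluation they come from GPT-4's judgments of the anonymized answers:

```python
from statistics import mean

# The three 0-5 criteria described above.
CRITERIA = ("technical_accuracy", "logical_coherence", "practical_feasibility")

def overall(scores: dict) -> float:
    """Unweighted mean over the three criteria for one anonymized response."""
    for c in CRITERIA:
        assert 0 <= scores[c] <= 5, f"{c} must be on the 0-5 scale"
    return mean(scores[c] for c in CRITERIA)

# One question, two anonymized responses (labels hide which model produced which).
judged = {
    "response_A": {"technical_accuracy": 4, "logical_coherence": 5, "practical_feasibility": 4},
    "response_B": {"technical_accuracy": 3, "logical_coherence": 3, "practical_feasibility": 2},
}
ranked = sorted(judged, key=lambda r: overall(judged[r]), reverse=True)
print(ranked)  # → ['response_A', 'response_B']
```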

## 📈 Experimental Data Trends  
Minor gradient explosions were observed, but training remained stable overall.  
![image](https://github.com/user-attachments/assets/a3fa3676-9f07-47ea-9029-ec0d56fdc989)  

## 💰 Training Costs  
- **DeepSeek-R1 API Calls**: ¥450 (purchased at a discount; normal price ~¥1800)  
- **Server Costs**: ¥4?0  
- **Digital Resources**: ¥??  
![image](https://github.com/user-attachments/assets/8e23b5b6-24d9-47c3-b54f-ffa22ec68a83)  

## βš–οΈ Usage Notice  
> This model is strictly for **legal security research** and **educational purposes**. Users must comply with local laws and regulations. Developers are not responsible for misuse.  
> **Note**: By using this model, you agree to this disclaimer.  

πŸ’‘ **Tip**: The model may exhibit hallucinations or knowledge gaps. Always cross-verify critical scenarios!