---
license: apache-2.0
base_model: deepseek-ai/deepseek-coder-1.3b-instruct
tags:
- security
- vulnerability-detection
- penetration-testing
- code-analysis
- cybersecurity
- lora
- deepseek
library_name: peft
pipeline_tag: text-generation
---

# Pentest Vulnerability Detector

## Model Description

This is a fine-tuned version of DeepSeek-Coder-1.3B-Instruct, specialized for detecting security vulnerabilities in code.

**Base Model:** deepseek-ai/deepseek-coder-1.3b-instruct  
**Training Data:** 440 synthetic vulnerability examples  
**Training Method:** LoRA (Low-Rank Adaptation) with 4-bit quantization  
**Training Platform:** Google Colab (Free T4 GPU)

## Capabilities

The model can detect and analyze:
- SQL Injection
- Cross-Site Scripting (XSS)
- Command Injection / RCE
- Insecure Direct Object Reference (IDOR)
- Server-Side Request Forgery (SSRF)
- Authentication Bypass
- Cross-Site Request Forgery (CSRF)
- Path Traversal

## Training Details

- **Examples:** 440 vulnerability patterns
- **Epochs:** 3
- **Batch Size:** 2 (with gradient accumulation)
- **Learning Rate:** 2e-4
- **LoRA Rank:** 8
- **Quantization:** 4-bit (NF4)
- **Training Time:** ~45-60 minutes on T4 GPU
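The card gives a per-device batch size of 2 with gradient accumulation but does not state the accumulation factor, so the number of optimizer updates can only be estimated. The sketch below assumes a hypothetical factor of 4 (substitute the real value if known) and counts one update per full accumulation window. Individual training loops may handle a partial final window differently.

```python
import math


def estimate_optimizer_steps(num_examples: int, batch_size: int,
                             grad_accum_steps: int, epochs: int) -> dict:
    """Rough count of optimizer updates for a given training setup."""
    # Micro-batches yielded per epoch (the last one may be partial)
    micro_batches = math.ceil(num_examples / batch_size)
    # One optimizer update every `grad_accum_steps` micro-batches
    steps_per_epoch = max(micro_batches // grad_accum_steps, 1)
    return {
        "effective_batch_size": batch_size * grad_accum_steps,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }


# 440 examples, batch size 2 and 3 epochs come from the list above;
# grad_accum_steps=4 is an ASSUMPTION, not a documented value.
print(estimate_optimizer_steps(num_examples=440, batch_size=2,
                               grad_accum_steps=4, epochs=3))
# → {'effective_batch_size': 8, 'steps_per_epoch': 55, 'total_steps': 165}
```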

## Usage

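```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and its tokenizer
base_model = "deepseek-ai/deepseek-coder-1.3b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,  # halves GPU memory; use torch.float32 on CPU
    device_map="auto",
)

# Attach the LoRA adapter (replace YOUR_USERNAME with the repository owner)
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/pentest-vulnerability-detector")
model.eval()

# Code to analyze: user input interpolated directly into a SQL query
code = 'cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")'
prompt = f"System: You are a security expert.\n\nUser: Analyze this code:\n{code}\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
```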
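When calling the model from your own code, it helps to keep the prompt template in one place. The sketch below wraps the template from the usage example in a helper and pulls the answer out of a fully decoded output (prompt included). The names `build_prompt` and `extract_answer` are hypothetical, not part of this repository, and the sketch assumes the adapter expects exactly this `System:`/`User:`/`Assistant:` layout.

```python
# Prompt template copied from the usage example; assumed to match the
# format the adapter was trained on.
PROMPT_TEMPLATE = (
    "System: You are a security expert.\n\n"
    "User: Analyze this code:\n{code}\n\n"
    "Assistant:"
)
ANSWER_MARKER = "\n\nAssistant:"


def build_prompt(code: str) -> str:
    """Insert a code snippet into the template (hypothetical helper)."""
    # Only {code} in the template is a placeholder; braces inside the
    # snippet itself are copied through unchanged.
    return PROMPT_TEMPLATE.format(code=code)


def extract_answer(decoded: str) -> str:
    """Return the text after the first answer marker, stripped."""
    _, sep, answer = decoded.partition(ANSWER_MARKER)
    # If the marker is missing, assume the input is already just the answer
    return answer.strip() if sep else decoded.strip()


prompt = build_prompt('cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")')
print(prompt)
print(extract_answer(prompt + " SQL Injection (CRITICAL)"))
```

Splitting on the first marker assumes the analyzed snippet does not itself contain the text `\n\nAssistant:`; if it might, compare against the exact prompt string instead.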

## Inference Script

For command-line use, run the provided inference script:

```bash
python inference_deepseek.py --model ./model --code "YOUR_CODE_HERE"
```

## Model Output

Each analysis is intended to include:
- Vulnerability type identification
- Severity assessment (CRITICAL/HIGH/MEDIUM/LOW)
- Detailed attack vector analysis
- Specific remediation recommendations
- Code-specific security guidance
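The exact wording of the model's responses is not specified in this card, so any post-processing rests on assumptions. As a sketch, assuming the severity appears as one of the uppercase labels `CRITICAL`, `HIGH`, `MEDIUM` or `LOW` somewhere in the response, a hypothetical `extract_severity` helper can pick out the most severe label mentioned and use it to order findings for manual review:

```python
import re
from typing import Optional

# Severity labels listed in this card, ordered from most to least severe
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]
# Case-sensitive whole-word match, so prose like "low" or "HIGHLY" is ignored
_SEVERITY_RE = re.compile(r"\b(" + "|".join(SEVERITY_ORDER) + r")\b")


def extract_severity(response: str) -> Optional[str]:
    """Return the most severe label found in the response, or None."""
    found = set(_SEVERITY_RE.findall(response))
    for label in SEVERITY_ORDER:
        if label in found:
            return label
    return None


def severity_rank(response: str) -> int:
    """Sort key: 0 for CRITICAL ... 3 for LOW, 4 when no label is found."""
    label = extract_severity(response)
    return SEVERITY_ORDER.index(label) if label else len(SEVERITY_ORDER)


# Made-up example responses, for illustration only
responses = [
    "Path Traversal. Severity: MEDIUM",
    "No obvious issue found.",
    "SQL Injection. Severity: CRITICAL",
]
for r in sorted(responses, key=severity_rank):
    print(severity_rank(r), r)
```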

## Limitations

- Not 100% accurate; always verify findings manually
- May produce both false positives and false negatives
- Best used as a pre-screening tool
- Should complement, not replace, manual security testing
- Trained only on synthetic data; may need further fine-tuning for specific use cases

## Ethical Use

This model is intended for:
- Security research
- Penetration testing (authorized only)
- Code review and security auditing
- Educational purposes

**Do not use for:**
- Unauthorized system access
- Malicious activities
- Illegal purposes

## Training Data

The model was trained on 440 synthetic vulnerability examples covering:
- 100 SQL Injection patterns
- 80 XSS patterns
- 60 Command Injection patterns
- 50 IDOR patterns
- 40 SSRF patterns
- 40 Authentication Bypass patterns
- 40 CSRF patterns
- 30 Path Traversal patterns
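The per-category counts above add up to the stated 440 examples, but they are not evenly balanced. A quick sketch computing each category's share of the training set, with counts copied from the list:

```python
# Example counts per category, copied from the list above
counts = {
    "SQL Injection": 100,
    "XSS": 80,
    "Command Injection": 60,
    "IDOR": 50,
    "SSRF": 40,
    "Authentication Bypass": 40,
    "CSRF": 40,
    "Path Traversal": 30,
}

total = sum(counts.values())
print(f"Total: {total}")  # 440
for name, n in counts.items():
    # Category name, raw count, and percentage of all examples
    print(f"{name:<22} {n:>4}  {100 * n / total:5.1f}%")
```

SQL Injection makes up about 22.7% of the data while Path Traversal makes up about 6.8%, so results for the less-represented categories may be less reliable.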

## Citation

If you use this model, please cite:

```bibtex
@misc{pentest-vulnerability-detector,
  author = {YOUR_NAME},
  title = {Pentest Vulnerability Detector},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/pentest-vulnerability-detector}}
}
```

## License

This model adapter is released under the **Apache 2.0 License**.

The base model (DeepSeek-Coder-1.3B-Instruct) has its own license terms.

### Apache 2.0 License Summary:
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Patent use allowed
- ⚠️ Must include license and copyright notice
- ⚠️ Must state changes made

See the LICENSE file for the full terms.

## Contact

For questions or issues, please open an issue on the model repository.

## Acknowledgments

- Base model: DeepSeek-Coder by DeepSeek AI
- Training framework: Hugging Face Transformers, PEFT
- Training platform: Google Colab