---
license: apache-2.0
base_model: deepseek-ai/deepseek-coder-1.3b-instruct
tags:
- security
- vulnerability-detection
- penetration-testing
- code-analysis
- cybersecurity
- lora
- deepseek
library_name: peft
pipeline_tag: text-generation
---
# Pentest Vulnerability Detector
## Model Description
This is a fine-tuned version of DeepSeek-Coder-1.3B-Instruct, specialized for detecting security vulnerabilities in code.
**Base Model:** deepseek-ai/deepseek-coder-1.3b-instruct
**Training Data:** 440 synthetic vulnerability examples
**Training Method:** LoRA (Low-Rank Adaptation) with 4-bit quantization
**Training Platform:** Google Colab (Free T4 GPU)
## Capabilities
The model can detect and analyze:
- SQL Injection
- Cross-Site Scripting (XSS)
- Command Injection / RCE
- Insecure Direct Object Reference (IDOR)
- Server-Side Request Forgery (SSRF)
- Authentication Bypass
- Cross-Site Request Forgery (CSRF)
- Path Traversal
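To make the first category concrete, here is a small, self-contained illustration (not taken from the training data) of the kind of flaw the model is meant to flag:

```python
import sqlite3

# In-memory table with two rows for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

user_input = "1 OR 1=1"  # attacker-controlled value

# VULNERABLE: string interpolation lets the input rewrite the query
leaked = conn.execute(
    f"SELECT * FROM users WHERE id = {user_input}"
).fetchall()
print(len(leaked))  # 2 -- every row is returned

# SAFE: a parameterized query treats the input as data, not SQL
safe = conn.execute(
    "SELECT * FROM users WHERE id = ?", (user_input,)
).fetchall()
print(len(safe))  # 0 -- the literal string matches no integer id
```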
## Training Details
- **Examples:** 440 vulnerability patterns
- **Epochs:** 3
- **Batch Size:** 2 (with gradient accumulation)
- **Learning Rate:** 2e-4
- **LoRA Rank:** 8
- **Quantization:** 4-bit (NF4)
- **Training Time:** ~45-60 minutes on T4 GPU
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
base_model = "deepseek-ai/deepseek-coder-1.3b-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/pentest-vulnerability-detector")
# Analyze code
code = "SELECT * FROM users WHERE id = 'user_input'"
prompt = f"System: You are a security expert.\n\nUser: Analyze this code:\n{code}\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Inference Script
For quick one-off checks, use the provided inference script:
```bash
python inference_deepseek.py --model ./model --code "YOUR_CODE_HERE"
```
## Model Performance
The model provides:
- Vulnerability type identification
- Severity assessment (CRITICAL/HIGH/MEDIUM/LOW)
- Detailed attack vector analysis
- Specific remediation recommendations
- Code-specific security guidance
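Because the model emits one of the four severity labels in its analysis, downstream tooling can pull that label out of the free-text response. A hypothetical helper (`extract_severity` is illustrative, not part of this repository):

```python
import re
from typing import Optional

SEVERITY_LEVELS = ("CRITICAL", "HIGH", "MEDIUM", "LOW")

def extract_severity(analysis: str) -> Optional[str]:
    """Return the first severity label found in a model response, if any."""
    pattern = r"\b(" + "|".join(SEVERITY_LEVELS) + r")\b"
    match = re.search(pattern, analysis)
    return match.group(1) if match else None

print(extract_severity("Severity: HIGH - SQL injection in login handler"))  # HIGH
```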
## Limitations
- Not 100% accurate - always verify findings manually
- May have false positives/negatives
- Best used as a pre-screening tool
- Should complement, not replace, manual security testing
- Trained on synthetic data - may need fine-tuning for specific use cases
## Ethical Use
This model is intended for:
- Security research
- Penetration testing (authorized only)
- Code review and security auditing
- Educational purposes
**Do not use for:**
- Unauthorized system access
- Malicious activities
- Illegal purposes
## Training Data
The model was trained on 440 synthetic vulnerability examples covering:
- 100 SQL Injection patterns
- 80 XSS patterns
- 60 Command Injection patterns
- 50 IDOR patterns
- 40 SSRF patterns
- 40 Authentication Bypass patterns
- 40 CSRF patterns
- 30 Path Traversal patterns
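Each synthetic example pairs a vulnerable snippet with the expected analysis. A record might look like the following (an illustrative sketch of an instruction-tuning format, not an actual sample from the dataset):

```python
import json

# Hypothetical structure of one synthetic SQL injection training example
example = {
    "instruction": "Analyze this code for security vulnerabilities.",
    "input": 'query = "SELECT * FROM users WHERE id = " + request.args["id"]',
    "output": (
        "Vulnerability: SQL Injection\n"
        "Severity: CRITICAL\n"
        "Fix: use a parameterized query instead of string concatenation."
    ),
}

print(json.dumps(example, indent=2))
```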
## Citation
If you use this model, please cite:
```
@misc{pentest-vulnerability-detector,
  author       = {YOUR_NAME},
  title        = {Pentest Vulnerability Detector},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/pentest-vulnerability-detector}}
}
```
## License
This model adapter is released under the **Apache 2.0 License**.
The base model (DeepSeek-Coder-1.3B-Instruct) has its own license terms.
### Apache 2.0 License Summary:
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Patent use allowed
- ⚠️ Must include license and copyright notice
- ⚠️ Must state changes made
See LICENSE file for full terms.
## Contact
For questions or issues, please open an issue on the model repository.
## Acknowledgments
- Base model: DeepSeek-Coder by DeepSeek AI
- Training framework: Hugging Face Transformers, PEFT
- Training platform: Google Colab