---
license: apache-2.0
base_model: deepseek-ai/deepseek-coder-1.3b-instruct
tags:
- security
- vulnerability-detection
- penetration-testing
- code-analysis
- cybersecurity
- lora
- deepseek
library_name: peft
pipeline_tag: text-generation
---
# Pentest Vulnerability Detector
## Model Description
This is a fine-tuned version of DeepSeek-Coder-1.3B-Instruct, specialized for detecting security vulnerabilities in code.
**Base Model:** deepseek-ai/deepseek-coder-1.3b-instruct
**Training Data:** 440 synthetic vulnerability examples
**Training Method:** LoRA (Low-Rank Adaptation) with 4-bit quantization
**Training Platform:** Google Colab (Free T4 GPU)
## Capabilities
The model can detect and analyze:
- SQL Injection
- Cross-Site Scripting (XSS)
- Command Injection / RCE
- Insecure Direct Object Reference (IDOR)
- Server-Side Request Forgery (SSRF)
- Authentication Bypass
- Cross-Site Request Forgery (CSRF)
- Path Traversal
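To make the first category concrete, here is a small, self-contained illustration (not taken from the training data) of the kind of flaw the model is meant to flag:

```python
import sqlite3

# In-memory table with two rows for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

user_input = "1 OR 1=1"  # attacker-controlled value

# VULNERABLE: string interpolation lets the input rewrite the query
leaked = conn.execute(
    f"SELECT * FROM users WHERE id = {user_input}"
).fetchall()
print(len(leaked))  # 2 -- every row is returned

# SAFE: a parameterized query treats the input as data, not SQL
safe = conn.execute(
    "SELECT * FROM users WHERE id = ?", (user_input,)
).fetchall()
print(len(safe))  # 0 -- the literal string matches no integer id
```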
## Training Details
- **Examples:** 440 vulnerability patterns
- **Epochs:** 3
- **Batch Size:** 2 (with gradient accumulation)
- **Learning Rate:** 2e-4
- **LoRA Rank:** 8
- **Quantization:** 4-bit (NF4)
- **Training Time:** ~45-60 minutes on T4 GPU
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
base_model = "deepseek-ai/deepseek-coder-1.3b-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/pentest-vulnerability-detector")
# Analyze code
code = "SELECT * FROM users WHERE id = 'user_input'"
prompt = f"System: You are a security expert.\n\nUser: Analyze this code:\n{code}\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Inference Script
For quick one-off checks, use the provided inference script:
```bash
python inference_deepseek.py --model ./model --code "YOUR_CODE_HERE"
```
## Model Performance
The model provides:
- Vulnerability type identification
- Severity assessment (CRITICAL/HIGH/MEDIUM/LOW)
- Detailed attack vector analysis
- Specific remediation recommendations
- Code-specific security guidance
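Because the model emits one of the four severity labels in its analysis, downstream tooling can pull that label out of the free-text response. A hypothetical helper (`extract_severity` is illustrative, not part of this repository):

```python
import re
from typing import Optional

SEVERITY_LEVELS = ("CRITICAL", "HIGH", "MEDIUM", "LOW")

def extract_severity(analysis: str) -> Optional[str]:
    """Return the first severity label found in a model response, if any."""
    pattern = r"\b(" + "|".join(SEVERITY_LEVELS) + r")\b"
    match = re.search(pattern, analysis)
    return match.group(1) if match else None

print(extract_severity("Severity: HIGH - SQL injection in login handler"))  # HIGH
```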
## Limitations
- Not 100% accurate - always verify findings manually
- May have false positives/negatives
- Best used as a pre-screening tool
- Should complement, not replace, manual security testing
- Trained on synthetic data - may need fine-tuning for specific use cases
## Ethical Use
This model is intended for:
- Security research
- Penetration testing (authorized only)
- Code review and security auditing
- Educational purposes
**Do not use for:**
- Unauthorized system access
- Malicious activities
- Illegal purposes
## Training Data
The model was trained on 440 synthetic vulnerability examples covering:
- 100 SQL Injection patterns
- 80 XSS patterns
- 60 Command Injection patterns
- 50 IDOR patterns
- 40 SSRF patterns
- 40 Authentication Bypass patterns
- 40 CSRF patterns
- 30 Path Traversal patterns
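Each synthetic example pairs a vulnerable snippet with the expected analysis. A record might look like the following (an illustrative sketch of an instruction-tuning format, not an actual sample from the dataset):

```python
import json

# Hypothetical structure of one synthetic SQL injection training example
example = {
    "instruction": "Analyze this code for security vulnerabilities.",
    "input": 'query = "SELECT * FROM users WHERE id = " + request.args["id"]',
    "output": (
        "Vulnerability: SQL Injection\n"
        "Severity: CRITICAL\n"
        "Fix: use a parameterized query instead of string concatenation."
    ),
}

print(json.dumps(example, indent=2))
```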
## Citation
If you use this model, please cite:
```
@misc{pentest-vulnerability-detector,
  author       = {YOUR_NAME},
  title        = {Pentest Vulnerability Detector},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/pentest-vulnerability-detector}}
}
```
## License
This model adapter is released under the **Apache 2.0 License**.
The base model (DeepSeek-Coder-1.3B-Instruct) has its own license terms.
### Apache 2.0 License Summary:
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Patent use allowed
- ⚠️ Must include license and copyright notice
- ⚠️ Must state changes made
See LICENSE file for full terms.
## Contact
For questions or issues, please open an issue on the model repository.
## Acknowledgments
- Base model: DeepSeek-Coder by DeepSeek AI
- Training framework: Hugging Face Transformers, PEFT
- Training platform: Google Colab