---
language: en
tags:
- vulnerability-detection
- code-analysis
- autoencoder
- anomaly-detection
library_name: pytorch
metrics:
- mse
---

# CATastrophe - Code Vulnerability Detector

CATastrophe is an autoencoder-based vulnerability detector for Python code. It converts source
code into TF-IDF vectors, passes them through an autoencoder, and treats a large reconstruction
error as an anomaly that may indicate a vulnerability.

## Model Details

- **Architecture**: Autoencoder (Input → 512 → 128 → 512 → Input)
- **Input Features**: 2000 (TF-IDF)
- **Training Loss**: 0.0005
- **Framework**: PyTorch

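The `Autoencoder` class imported from `model.py` is not reproduced in this card. As a rough illustration of the layer sizes listed above, a minimal sketch might look like the following. The ReLU activations, the linear output layer, and the `encoder`/`decoder` attribute names are assumptions, so the parameter names of the released checkpoint may not match this sketch.

```python
import torch
from torch import nn


class Autoencoder(nn.Module):
    """Symmetric autoencoder: input_dim -> 512 -> 128 -> 512 -> input_dim."""

    def __init__(self, input_dim: int = 2000):
        super().__init__()
        # Encoder compresses the TF-IDF vector to a 128-dimensional bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),  # assumption: the activation is not stated in this card
            nn.Linear(512, 128),
            nn.ReLU(),
        )
        # Decoder mirrors the encoder back to the input dimension.
        self.decoder = nn.Sequential(
            nn.Linear(128, 512),
            nn.ReLU(),
            nn.Linear(512, input_dim),  # assumption: linear output layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the input from its bottleneck representation.
        return self.decoder(self.encoder(x))
```

Calling the model on a batch of shape `(N, 2000)` returns a reconstruction of the same shape, which is what the reconstruction-error score is computed from.
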
## Usage

```python
import pickle

import torch

from model import Autoencoder

# Load the trained autoencoder weights
model = Autoencoder(input_dim=2000)
model.load_state_dict(torch.load('catastrophe_model.pth'))
model.eval()

# Load the fitted TF-IDF vectorizer
with open('vectorizer.pkl', 'rb') as f:
    vectorizer = pickle.load(f)

# Vectorize the code to analyze
code_text = "your code here"
features = vectorizer.transform([code_text]).toarray()
features_tensor = torch.tensor(features, dtype=torch.float32)

# Per-sample anomaly score: mean squared reconstruction error
with torch.no_grad():
    reconstructed = model(features_tensor)
    anomaly_score = torch.mean((features_tensor - reconstructed) ** 2, dim=1)
```

## Training Configuration

- Batch Size: 256
- Epochs: 50
- Learning Rate: 0.001
- Optimizer: Adam

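The training script itself is not part of this card. The settings above could be wired together roughly as in the sketch below. The function name `train_autoencoder`, the `features` tensor, and the use of `nn.MSELoss` (chosen to match the `mse` metric in the metadata) are assumptions, not the original training code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_autoencoder(model, features, epochs=50, batch_size=256, lr=1e-3):
    """Fit `model` to reconstruct `features`; return the mean loss per epoch."""
    loader = DataLoader(TensorDataset(features), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()  # assumption: MSE reconstruction loss
    history = []

    model.train()
    for _ in range(epochs):
        total = 0.0
        for (batch,) in loader:
            optimizer.zero_grad()
            # The autoencoder's target is its own input.
            loss = criterion(model(batch), batch)
            loss.backward()
            optimizer.step()
            total += loss.item() * batch.size(0)
        # Average the per-sample loss over the whole dataset for this epoch.
        history.append(total / len(features))
    return history
```

Here `features` would be a float32 tensor of shape `(num_samples, 2000)` built from the TF-IDF matrix, and the defaults mirror the configuration listed above.
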
## Limitations

The model is trained only on vulnerable commits and uses reconstruction error as its anomaly
score. A high score indicates a potential vulnerability, not a confirmed one, so flagged code
should be reviewed manually.
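
This card does not say how high a score must be before code is flagged. One possible approach, shown here only as a sketch, is to compare each score against a quantile of scores computed on a reference set. The helper `flag_anomalies`, the `reference_scores` tensor, and the 0.95 quantile are all hypothetical choices rather than part of the released model.

```python
import torch


def flag_anomalies(scores: torch.Tensor, reference_scores: torch.Tensor, q: float = 0.95):
    """Return (threshold, mask); mask is True where a score exceeds the
    q-quantile of reference_scores."""
    # Hypothetical cutoff: the q-quantile of scores from a reference set.
    threshold = torch.quantile(reference_scores, q)
    return threshold, scores > threshold
```

The returned mask can be used to pick out the samples to send for manual review; the cutoff should be tuned on data that reflects the code being scanned.
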