Adnan-AI-Labs/URLShield-DistilBERT

adnanaman committed on Nov 5, 2024

Commit b7b3fc8 · verified · 1 Parent(s): 5ed1495

Update README.md

Files changed (1)

README.md +51 -14

@@ -1,15 +1,52 @@
 ---
-license: apache-2.0
-datasets:
-- Adnan-AI-Labs/CleanedBalancedPhishingUrls
-language:
-- en
-metrics:
-- accuracy
-base_model:
-- distilbert/distilbert-base-uncased
-tags:
-- phishing
-- phishing_url
-- classification
----

+# Model Card for DistilBERT-PhishGuard
+## Model Overview
+**DistilBERT-PhishGuard** is a phishing URL detection model based on DistilBERT, fine-tuned to classify a URL as safe or phishing. It is designed for real-time use in web and email security, where it helps users identify malicious links.
 ---
+## Intended Use
+- **Use Cases**: URL classification for phishing detection in emails, websites, and chat applications.
+- **Limitations**: This model may have reduced accuracy with non-English URLs or heavily obfuscated links.
+- **Intended Users**: Security researchers, application developers, and cybersecurity engineers.
+---
+## Model Details
+- **Architecture**: DistilBERT for Sequence Classification
+- **Language**: Primarily English
+- **License**: Apache License 2.0
+- **Dataset**: Trained on labeled phishing and safe URLs from public and proprietary sources.
+---
+## Usage
+This model can be loaded and used with Hugging Face's `transformers` library:
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+
+# Load the model and tokenizer ("your-username/..." is a placeholder repository id)
+model_id = "your-username/DistilBERT-PhishGuard"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+# Sample URL for classification
+url = "http://example.com"
+inputs = tokenizer(url, return_tensors="pt", truncation=True, max_length=256)
+
+# Run inference without tracking gradients
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# Class index 1 is reported as phishing, anything else as safe
+predictions = torch.argmax(outputs.logits, dim=-1)
+print("Prediction:", "Phishing" if predictions.item() == 1 else "Safe")
+```
+## Performance
+The model achieves above 98% accuracy across different chunks of the training data, and its AUC is close to or at 1.00 in later stages. This suggests reliable phishing detection on the data it was evaluated on.
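
For readers who want to check accuracy and AUC figures like these on their own labeled URLs, the following is a minimal sketch in plain Python. The function names, labels, and scores are hypothetical and chosen only for illustration; they are not the model's results, and the card does not say how its metrics were computed. AUC is computed here as the probability that a randomly chosen phishing URL receives a higher score than a randomly chosen safe one, with ties counted as half.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Pairwise ROC AUC: share of (phishing, safe) pairs ranked correctly.

    Assumes label 1 means phishing and label 0 means safe, and that both
    classes are present. Ties between scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up ground truth and phishing scores, for illustration only
labels = [1, 0, 1, 0, 1, 0]
scores = [0.92, 0.10, 0.65, 0.40, 0.30, 0.05]

# Turn scores into hard predictions with a 0.5 cut-off
predictions = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(labels, predictions))
print("AUC:", roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for a spot check; a library routine would be the better choice for large evaluation sets.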
+## Limitations and Biases
+The model's performance may degrade on URLs that use obfuscation or novel phishing techniques.
+It may be less effective on non-English URLs and may need further fine-tuning for different languages or domain-specific URLs.
+### Contact and Support
+For questions, improvements, or support, please contact us through the Hugging Face community or open an issue in the model repository.
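
The usage snippet in this card takes an argmax over the two logits, which gives a label but no confidence. In a real-time filter it can help to work with a phishing probability and an adjustable threshold instead. The sketch below shows one way to do that in plain Python. It assumes, as the usage snippet does, that index 1 is the phishing class; the `classify` helper, the threshold values, and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (label, phishing_probability) for one URL's logits.

    Assumes index 1 is the phishing class, matching the usage snippet.
    """
    phishing_prob = softmax(logits)[1]
    label = "Phishing" if phishing_prob >= threshold else "Safe"
    return label, phishing_prob


# Hypothetical logits, e.g. taken from outputs.logits[0].tolist()
label, prob = classify([-1.2, 2.3])
print(label, round(prob, 3))

# A stricter threshold flags fewer URLs as phishing
print(classify([0.0, 1.0], threshold=0.9)[0])
```

Raising the threshold trades missed phishing URLs for fewer false alarms; the right value depends on the deployment and is not specified by the card.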