---
pipeline_tag: text-classification
license: apache-2.0
base_model: bert-large-uncased
tags:
- generated_from_trainer
- phishing
- BERT
- cybersecurity
- text-classification
metrics:
- accuracy
- precision
- recall
model-index:
- name: phishing-email-detector-capstone
  results: []
widget:
- text: https://www.verif22.com
  example_title: Phishing URL
- text: >
    Dear colleague,
    An important update about your email has exceeded your storage limit.
    You will not be able to send or receive messages until you reactivate your account.
    We will close all older versions of our Mailbox as of Friday, June 12, 2023.
    To activate and complete the required information, click here (https://ec-ec.squarespace.com).
    Your account must be reactivated today to regenerate new space.
    — Management Team
  example_title: Phishing Email
- text: >
    You have access to FREE Video Streaming in your plan.
    REGISTER with your email and password, then select the monthly subscription option.
    https://bit.ly/3vNrU5r
  example_title: Phishing SMS
- text: >
    if(data.selectedIndex > 0){$('#hidCflag').val(data.selectedData.value);};
    var sprypassword1 = new Spry.Widget.ValidationPassword("sprypassword1");
    var sprytextfield1 = new Spry.Widget.ValidationTextField("sprypassword1", "email");
  example_title: Phishing Script
- text: Hi, this model is really accurate :)
  example_title: Benign Message
language:
- en
---
# 🧠 Phishing Detection Model (BERT-Large-Uncased)

A transformer-based model fine-tuned to detect **phishing content** across multiple formats, including **emails, URLs, SMS messages, and scripts**.
Built on **BERT-Large-Uncased**, it leverages deep contextual understanding of language to classify text as *phishing* or *benign* with high accuracy.

---

## 📌 Model Details

**Base model:** `bert-large-uncased`
**Architecture:** 24 layers • 1024 hidden size • 16 attention heads • ~336M parameters
**License:** Apache 2.0
**Language:** English
**Pipeline tag:** `text-classification`

---

## 🧩 Model Description

This model identifies phishing content by analyzing the linguistic and structural patterns commonly found in malicious communications.
Leveraging BERT’s bidirectional transformer architecture, it can detect phishing attempts even when a message appears legitimate and well-written.

### Key Features
- Detects **phishing attempts** in text, emails, URLs, and scripts
- Useful for **cybersecurity applications** such as email gateways or web filtering systems
- Identifies **varied phishing tactics** (impersonation, link manipulation, credential harvesting, etc.)

---

## 🎯 Intended Uses

**Recommended use cases:**
- Classify messages, emails, and URLs as *phishing* or *benign*
- Integrate into automated **security pipelines**, email filtering tools, or chat moderation systems
- Support **phishing research** and awareness programs

**Limitations:**
- May produce **false positives** on legitimate content with financial or urgent language
- Optimized for **English text** only
- Should be one layer of a **multi-layered defense strategy**, not a standalone cybersecurity control
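
The last point can be made concrete by treating the classifier's score as one signal among several. A minimal sketch, where the `flag_message` helper, the thresholds, and the link-shortener heuristic are all illustrative assumptions, not part of this model:

```python
import re

# Illustrative heuristic: messages containing shortened links are riskier.
# The domains and thresholds below are example values, not tuned settings.
SHORTENER_PATTERN = re.compile(r"https?://(bit\.ly|tinyurl\.com|t\.co)/", re.IGNORECASE)

def flag_message(text: str, phishing_score: float, threshold: float = 0.9) -> bool:
    """Combine the model's phishing score with a simple URL heuristic."""
    if phishing_score >= threshold:
        # The model alone is confident enough to flag.
        return True
    if phishing_score >= 0.5 and SHORTENER_PATTERN.search(text):
        # A weaker model signal plus a risky link pattern still flags the message.
        return True
    return False
```

In a real deployment the second signal would more likely come from URL reputation services or sender authentication (SPF/DKIM) than a regex, but the layering principle is the same.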

---

## 📊 Evaluation Results

| Metric | Score |
|--------|-------|
| **Loss** | 0.1953 |
| **Accuracy** | 0.9717 |
| **Precision** | 0.9658 |
| **Recall** | 0.9670 |
| **False Positive Rate** | 0.0249 |
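
For reference, these metrics are the standard confusion-matrix quantities. A small sketch with made-up counts (the numbers below are illustrative only, not the model's actual confusion matrix):

```python
# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, tn, fn = 960, 25, 975, 40

accuracy  = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)   # also the true positive rate
fpr       = fp / (fp + tn)   # false positive rate, as reported above

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} fpr={fpr:.4f}")
```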

---

## ⚙️ Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| **Learning rate** | 2e-05 |
| **Train batch size** | 16 |
| **Eval batch size** | 16 |
| **Seed** | 42 |
| **Optimizer** | Adam (β₁=0.9, β₂=0.999, ε=1e-08) |
| **LR scheduler** | Linear |
| **Epochs** | 4 |
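
With a linear scheduler, the learning rate decays from 2e-5 toward 0 over the 15,464 total training steps (4 epochs × 3,866 steps per epoch). A sketch of that schedule, assuming zero warmup steps, which the card does not specify:

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Linear decay from base_lr at step 0 to 0 at total_steps (zero warmup assumed)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

TOTAL_STEPS = 15464  # 4 epochs × 3866 steps per epoch

start   = linear_lr(0, TOTAL_STEPS)      # 2e-5 at the first step
halfway = linear_lr(7732, TOTAL_STEPS)   # half the base rate at mid-training
```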

### Training Results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | False Positive Rate |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:---------:|:------:|:-------------------:|
| 0.1487 | 1.0 | 3866 | 0.1454 | 0.9596 | 0.9709 | 0.9320 | 0.0203 |
| 0.0805 | 2.0 | 7732 | 0.1389 | 0.9691 | 0.9663 | 0.9601 | 0.0243 |
| 0.0389 | 3.0 | 11598 | 0.1779 | 0.9683 | 0.9778 | 0.9461 | 0.0156 |
| 0.0091 | 4.0 | 15464 | 0.1953 | 0.9717 | 0.9658 | 0.9670 | 0.0249 |

---

## 🧠 Example Inference

Try the model in Python using the `transformers` library:

```python
from transformers import pipeline

# Load the phishing detection model (replace the repository ID with the published one)
classifier = pipeline("text-classification", model="your-username/phishing-email-detector-capstone")

# Example texts
examples = [
    "Dear colleague, your email storage is full. Click here to verify your account: https://secure-update-login.com",
    "Hi team, the meeting starts at 2 PM today.",
    "You have won a free gift card! Claim now at http://bit.ly/3xYzabc",
]

# Run inference and print each predicted label with its confidence score
for text in examples:
    result = classifier(text)[0]
    print(f"Text: {text}\nPrediction: {result['label']} (score: {result['score']:.4f})\n")
```