---
pipeline_tag: text-classification
license: apache-2.0
base_model: bert-large-uncased
tags:
- generated_from_trainer
- phishing
- BERT
- cybersecurity
- text-classification
metrics:
- accuracy
- precision
- recall
model-index:
- name: phishing-email-detector-capstone
  results: []
widget:
- text: https://www.verif22.com
  example_title: Phishing URL
- text: >
    Dear colleague, An important update about your email has exceeded your
    storage limit. You will not be able to send or receive messages until you
    reactivate your account. We will close all older versions of our Mailbox
    as of Friday, June 12, 2023. To activate and complete the required
    information, click here (https://ec-ec.squarespace.com). Your account
    must be reactivated today to regenerate new space. - Management Team
  example_title: Phishing Email
- text: >
    You have access to FREE Video Streaming in your plan. REGISTER with your
    email and password, then select the monthly subscription option.
    https://bit.ly/3vNrU5r
  example_title: Phishing SMS
- text: >
    if(data.selectedIndex > 0){$('#hidCflag').val(data.selectedData.value);};
    var sprypassword1 = new Spry.Widget.ValidationPassword("sprypassword1");
    var sprytextfield1 = new Spry.Widget.ValidationTextField("sprypassword1",
    "email");
  example_title: Phishing Script
- text: Hi, this model is really accurate :)
  example_title: Benign Message
language:
- en
---

# 🧠 Phishing Detection Model (BERT-Large-Uncased)

A transformer-based model fine-tuned to detect **phishing content** across multiple formats, including **emails, URLs, SMS messages, and scripts**. Built on **BERT-Large-Uncased**, it leverages deep contextual understanding of language to classify text as *phishing* or *benign* with high accuracy.
---

## 📌 Model Details

**Base model:** `bert-large-uncased`
**Architecture:** 24 layers • 1024 hidden size • 16 attention heads • ~336M parameters
**License:** Apache 2.0
**Language:** English
**Pipeline tag:** `text-classification`

---

## 🧩 Model Description

This model identifies phishing content by analyzing the linguistic and structural patterns common in malicious communications. Leveraging BERT's bidirectional transformer architecture, it can flag phishing attempts even when a message appears legitimate and well written.

### Key Features

- Detects **phishing attempts** in text, emails, URLs, and scripts
- Useful for **cybersecurity applications** such as email gateways or web filtering systems
- Identifies **varied phishing tactics** (impersonation, link manipulation, credential harvesting, etc.)

---

## 🎯 Intended Uses

**Recommended use cases:**

- Classify messages, emails, and URLs as *phishing* or *benign*
- Integrate into automated **security pipelines**, email filtering tools, or chat moderation systems
- Aid **phishing research** or awareness programs

**Limitations:**

- May trigger **false positives** on legitimate content with financial or urgent language
- Optimized for **English text** only
- Should be one layer of a **multi-layered defense strategy**, not a standalone cybersecurity control

---

## 📊 Evaluation Results

| Metric | Score |
|--------|-------|
| **Loss** | 0.1953 |
| **Accuracy** | 0.9717 |
| **Precision** | 0.9658 |
| **Recall** | 0.9670 |
| **False Positive Rate** | 0.0249 |

---

## ⚙️ Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| **Learning rate** | 2e-05 |
| **Train batch size** | 16 |
| **Eval batch size** | 16 |
| **Seed** | 42 |
| **Optimizer** | Adam (β₁ = 0.9, β₂ = 0.999, ε = 1e-08) |
| **LR scheduler** | Linear |
| **Epochs** | 4 |

### Training Results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | False Positive Rate |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:---------:|:------:|:-------------------:|
| 0.1487 | 1.0 | 3866 | 0.1454 | 0.9596 | 0.9709 | 0.9320 | 0.0203 |
| 0.0805 | 2.0 | 7732 | 0.1389 | 0.9691 | 0.9663 | 0.9601 | 0.0243 |
| 0.0389 | 3.0 | 11598 | 0.1779 | 0.9683 | 0.9778 | 0.9461 | 0.0156 |
| 0.0091 | 4.0 | 15464 | 0.1953 | 0.9717 | 0.9658 | 0.9670 | 0.0249 |

---

## 🧠 Example Inference

Try the model in Python using the `transformers` library:

```python
from transformers import pipeline

# Load the phishing detection model
classifier = pipeline(
    "text-classification",
    model="your-username/phishing-email-detector-capstone",
)

# Example texts
examples = [
    "Dear colleague, your email storage is full. Click here to verify your account: https://secure-update-login.com",
    "Hi team, the meeting starts at 2 PM today.",
    "You have won a free gift card! Claim now at http://bit.ly/3xYzabc",
]

# Run inference
for text in examples:
    result = classifier(text)[0]
    print(f"Text: {text}\nPrediction: {result['label']} (score: {result['score']:.4f})\n")
```
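The evaluation section reports accuracy, precision, recall, and a false positive rate side by side; all four fall out of the same confusion counts. A minimal pure-Python sketch of those formulas, using toy labels for illustration (not the actual validation set):

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and FPR from 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }

# Toy example: 1 = phishing, 0 = benign
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Note that precision and the false positive rate move in opposite directions across the training epochs above, which is why both are tracked.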
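Because the model can produce false positives on legitimate content with urgent or financial language, a deployment may want to flag a message as phishing only above a stricter confidence threshold rather than taking the argmax label. A model-free sketch of that trade-off, using hypothetical phishing probabilities:

```python
def flag_phishing(scores, threshold=0.5):
    """Return 1 (phishing) for each score at or above the threshold, else 0 (benign)."""
    return [1 if score >= threshold else 0 for score in scores]

# Hypothetical per-message phishing probabilities from the classifier
scores = [0.97, 0.62, 0.55, 0.30, 0.08]

print(flag_phishing(scores, threshold=0.5))  # default cut-off flags three messages
print(flag_phishing(scores, threshold=0.8))  # stricter cut-off flags only the clearest case
```

Raising the threshold lowers the false positive rate at the cost of recall; the right operating point depends on how the model is used alongside the other layers of a defense strategy.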