
TestingCapstone

---
pipeline_tag: text-classification
license: apache-2.0
base_model: bert-large-uncased
tags:
  - generated_from_trainer
  - phishing
  - BERT
  - cybersecurity
  - text-classification
metrics:
  - accuracy
  - precision
  - recall
model-index:
  - name: phishing-email-detector-capstone
    results: []
widget:
  - text: https://www.verif22.com
    example_title: Phishing URL
  - text: >
      Dear colleague, An important update about your email has exceeded your
      storage limit. You will not be able to send or receive messages until you
      reactivate your account. We will close all older versions of our Mailbox
      as of Friday, June 12, 2023. To activate and complete the required
      information, click here (https://ec-ec.squarespace.com). Your account must
      be reactivated today to regenerate new space. — Management Team
    example_title: Phishing Email
  - text: >
      You have access to FREE Video Streaming in your plan. REGISTER with your
      email and password, then select the monthly subscription option.
      https://bit.ly/3vNrU5r
    example_title: Phishing SMS
  - text: >
      if(data.selectedIndex > 0){$('#hidCflag').val(data.selectedData.value);};
      var sprypassword1 = new Spry.Widget.ValidationPassword("sprypassword1");
      var sprytextfield1 = new Spry.Widget.ValidationTextField("sprypassword1",
      "email");
    example_title: Phishing Script
  - text: Hi, this model is really accurate :)
    example_title: Benign Message
language:
  - en
---

🧠 Phishing Detection Model (BERT-Large-Uncased)

A transformer-based model fine-tuned to detect phishing content in multiple formats, including emails, URLs, SMS messages, and scripts.
Built on BERT-Large-Uncased, it uses contextual language representations to classify text as phishing or benign, reaching 97.2% accuracy on its evaluation set.

📌 Model Details

- Base model: bert-large-uncased
- Architecture: 24 layers, 1024 hidden size, 16 attention heads, ~336M parameters
- License: Apache 2.0
- Language: English
- Pipeline tag: text-classification
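The ~336M parameter figure can be sanity-checked from the architecture numbers above. The sketch below assumes the standard `bert-large-uncased` config values that are not listed in this card (a 30,522-token vocabulary, 512 positions, 2 token types, a 4,096-dim feed-forward layer) and a two-label classification head (phishing vs. benign); it counts weights and biases layer by layer.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one dense layer."""
    return n_in * n_out + n_out


def bert_classifier_params(
    vocab_size: int = 30522,    # assumed bert-large-uncased vocabulary size
    max_positions: int = 512,   # assumed max sequence length
    type_vocab_size: int = 2,   # segment A / segment B embeddings
    hidden: int = 1024,
    intermediate: int = 4096,   # assumed feed-forward width (4 x hidden)
    layers: int = 24,
    num_labels: int = 2,        # phishing vs. benign
) -> dict:
    layer_norm = 2 * hidden  # LayerNorm scale and bias vectors
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden + layer_norm
    # The 16 attention heads split the hidden size, so they add no parameters.
    per_layer = (
        4 * linear_params(hidden, hidden)      # Q, K, V and attention output projections
        + layer_norm                           # post-attention LayerNorm
        + linear_params(hidden, intermediate)  # feed-forward up-projection
        + linear_params(intermediate, hidden)  # feed-forward down-projection
        + layer_norm                           # post-feed-forward LayerNorm
    )
    pooler = linear_params(hidden, hidden)     # dense layer applied to the [CLS] token
    head = linear_params(hidden, num_labels)   # classification layer added for fine-tuning
    encoder_total = embeddings + layers * per_layer + pooler
    return {"encoder": encoder_total, "with_head": encoder_total + head}


counts = bert_classifier_params()
print(f"encoder: {counts['encoder']:,}")      # encoder: 335,141,888
print(f"with head: {counts['with_head']:,}")  # with head: 335,143,938
```

The result, about 335M, is close to the quoted ~336M; the small gap likely comes from the pretraining heads counted in the published base checkpoint, which fine-tuning replaces with the two-label classification layer.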

🧩 Model Description

This model was trained to identify phishing-related content by analyzing linguistic and structural patterns commonly found in malicious communications.
By using BERT's bidirectional transformer encoder, which represents each token in the context of the entire input, it aims to detect phishing attempts even when a message appears legitimate or well-written.

Key Features

- Detects phishing attempts in emails, URLs, SMS messages, and scripts
- Useful for cybersecurity applications such as email gateways and web filtering systems
- Trained to recognize a range of phishing tactics (impersonation, link manipulation, credential harvesting, etc.)

🎯 Intended Uses

Recommended use cases:

- Classify messages, emails, and URLs as phishing or benign
- Integrate into automated security pipelines, email filtering tools, or chat moderation systems
- Aid in phishing research or awareness programs

Limitations:

- May produce false positives on legitimate messages that use financial or urgent language
- Intended for English text only
- Should be one layer of a multi-layered defense strategy, not a standalone cybersecurity control
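Because the model can flag legitimate messages that use urgent or financial language, and is meant to be one layer among several, a deployment can act on the score in bands instead of trusting the top label alone. The sketch below is hypothetical: the label name `"phishing"`, the two thresholds, and the action names are illustrative assumptions, not values from this card. Check `model.config.id2label` for the real label names and tune thresholds on your own validation data.

```python
def triage(prediction: dict, block_at: float = 0.95, review_at: float = 0.60,
           phishing_label: str = "phishing") -> str:
    """Map one pipeline-style prediction {'label': ..., 'score': ...} to an action.

    Assumes a binary classifier; 'phishing' is a hypothetical label name.
    """
    if not 0.0 <= review_at <= block_at <= 1.0:
        raise ValueError("expected 0 <= review_at <= block_at <= 1")
    # With exactly two labels the softmax scores sum to 1, so the phishing
    # probability can be recovered even when the benign label wins.
    if prediction["label"] == phishing_label:
        p_phish = prediction["score"]
    else:
        p_phish = 1.0 - prediction["score"]
    if p_phish >= block_at:
        return "quarantine"      # high confidence: stop the message
    if p_phish >= review_at:
        return "human_review"    # uncertain band: send to an analyst
    return "deliver"             # low phishing probability


print(triage({"label": "phishing", "score": 0.99}))  # quarantine
print(triage({"label": "phishing", "score": 0.70}))  # human_review
print(triage({"label": "benign", "score": 0.98}))    # deliver
```

Raising `review_at` sends fewer legitimate messages to review at the cost of letting more borderline phishing through; the right band depends on how expensive each kind of error is for your organization.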

📊 Evaluation Results

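Results on the evaluation set after the final epoch (identical to the epoch-4 row under Training Results):

| Metric | Score |
|---|---|
| Validation loss | 0.1953 |
| Accuracy | 0.9717 |
| Precision | 0.9658 |
| Recall | 0.9670 |
| False Positive Rate | 0.0249 |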
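For reference, these metrics follow the usual confusion-matrix definitions. The sketch below assumes phishing is the positive class (the card does not say so explicitly), and the counts in the usage line are made up for illustration, since the card does not publish its confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # phishing correctly flagged
    fp: int  # benign wrongly flagged as phishing
    tn: int  # benign correctly passed
    fn: int  # phishing missed

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        # Of everything flagged, how much was really phishing?
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Of all phishing, how much was caught?
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        # Of all benign messages, how many were wrongly flagged?
        return self.fp / (self.fp + self.tn)


c = Confusion(tp=90, fp=5, tn=95, fn=10)  # hypothetical counts
print(f"accuracy={c.accuracy():.3f} precision={c.precision():.3f} "
      f"recall={c.recall():.3f} fpr={c.false_positive_rate():.3f}")
# accuracy=0.925 precision=0.947 recall=0.900 fpr=0.050
```

Precision and false positive rate both concern false alarms but answer different questions: the false positive rate is the share of benign traffic that gets flagged, which is the number that matters for the false-positive limitation noted above.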

⚙️ Training Details

Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 2e-05 |
| Train batch size | 16 |
| Eval batch size | 16 |
| Seed | 42 |
| Optimizer | Adam (β₁=0.9, β₂=0.999, ε=1e-08) |
| LR scheduler | Linear |
| Epochs | 4 |
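With a linear scheduler the learning rate decays from 2e-05 toward zero over the run. The card does not list warmup steps, so the sketch below assumes none (0, the Hugging Face `Trainer` default) and takes the total of 15,464 optimizer steps from the training-results table. It mirrors the shape of the linear warmup-then-decay schedule in plain Python rather than calling the `transformers` scheduler itself.

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 15464,
              warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0.

    warmup_steps=0 is an assumption; the card does not report a warmup value.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each epoch (3,866 steps per epoch)
for step in (0, 3866, 7732, 11598, 15464):
    print(step, f"{linear_lr(step):.2e}")
# 0 2.00e-05
# 3866 1.50e-05
# 7732 1.00e-05
# 11598 5.00e-06
# 15464 0.00e+00
```

Under these assumptions the learning rate drops by 5e-06 per epoch, so the fourth epoch trains with the smallest updates.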

Training Results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | False Positive Rate |
|---|---|---|---|---|---|---|---|
| 0.1487 | 1.0 | 3866 | 0.1454 | 0.9596 | 0.9709 | 0.9320 | 0.0203 |
| 0.0805 | 2.0 | 7732 | 0.1389 | 0.9691 | 0.9663 | 0.9601 | 0.0243 |
| 0.0389 | 3.0 | 11598 | 0.1779 | 0.9683 | 0.9778 | 0.9461 | 0.0156 |
| 0.0091 | 4.0 | 15464 | 0.1953 | 0.9717 | 0.9658 | 0.9670 | 0.0249 |
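The table shows a trade-off between epochs: validation loss is lowest after epoch 2 and rises afterward while training loss keeps falling, which may indicate some overfitting, yet accuracy and recall peak at epoch 4, and epoch 3 has the highest precision and lowest false positive rate. The evaluation results reported above match the epoch-4 row. The sketch below uses the rows copied from the table to pick the best epoch for each criterion; which checkpoint to deploy depends on whether missed phishing or false alarms cost more.

```python
# Validation rows copied from the training-results table above.
ROWS = [
    {"epoch": 1, "val_loss": 0.1454, "accuracy": 0.9596, "precision": 0.9709, "recall": 0.9320, "fpr": 0.0203},
    {"epoch": 2, "val_loss": 0.1389, "accuracy": 0.9691, "precision": 0.9663, "recall": 0.9601, "fpr": 0.0243},
    {"epoch": 3, "val_loss": 0.1779, "accuracy": 0.9683, "precision": 0.9778, "recall": 0.9461, "fpr": 0.0156},
    {"epoch": 4, "val_loss": 0.1953, "accuracy": 0.9717, "precision": 0.9658, "recall": 0.9670, "fpr": 0.0249},
]


def best_epoch(metric: str, higher_is_better: bool) -> int:
    """Return the epoch whose row is best for the given metric."""
    pick = max if higher_is_better else min
    return pick(ROWS, key=lambda row: row[metric])["epoch"]


criteria = [("val_loss", False), ("accuracy", True), ("precision", True),
            ("recall", True), ("fpr", False)]
for metric, higher in criteria:
    print(f"{metric}: epoch {best_epoch(metric, higher)}")
# val_loss: epoch 2
# accuracy: epoch 4
# precision: epoch 3
# recall: epoch 4
# fpr: epoch 3
```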

🧠 Example Inference

Try the model in Python using the transformers library:

```python
from transformers import pipeline

# Load the phishing detection model (replace the placeholder with the actual Hub repo ID)
classifier = pipeline("text-classification", model="your-username/phishing-email-detector-capstone")

# Example texts
examples = [
    "Dear colleague, your email storage is full. Click here to verify your account: https://secure-update-login.com",
    "Hi team, the meeting starts at 2 PM today.",
    "You have won a free gift card! Claim now at http://bit.ly/3xYzabc",
]

# Run inference on the whole list at once; truncation=True clips inputs longer
# than BERT's 512-token limit instead of raising an error on long emails
results = classifier(examples, truncation=True)
for text, result in zip(examples, results):
    print(f"Text: {text}\nPrediction: {result['label']} (score: {result['score']:.4f})\n")
```
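The `score` printed above is the probability of the predicted label. For a single-label classifier with more than one output, the `text-classification` pipeline applies a softmax to the model's logits by default. The plain-Python sketch below reproduces that step for a pair of made-up logits; the `id2label` mapping is a hypothetical assumption, so check `model.config.id2label` for the model's real label names.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "benign", 1: "phishing"}  # hypothetical label mapping
logits = [-1.2, 2.3]                     # made-up logits for one message
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print({"label": id2label[best], "score": round(probs[best], 4)})
# {'label': 'phishing', 'score': 0.9707}
```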