Text Classification · Transformers · Safetensors · bert · Generated from Trainer

JellyPhish

This model is a fine-tuned version of google-bert/bert-base-uncased on the zefang-liu/phishing-email-dataset and kxm1k4m1/generate_phishing_email_final datasets. It achieves the following results on the evaluation set:

  • Loss: 0.1658
  • Accuracy: 0.9365
  • Macro F1: 0.9364
  • Weighted F1: 0.9364
  • Precision: 0.9366
  • Recall: 0.9363
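
Macro F1 averages per-class F1 scores uniformly, while weighted F1 weights them by class frequency; on a near-balanced evaluation set the two nearly coincide, as they do above. A small pure-Python sketch of the distinction, using toy labels (treating 0 = legitimate and 1 = phishing is an assumed convention, not stated by the card):

```python
from collections import Counter

def f1_per_class(y_true, y_pred, label):
    """F1 score for a single class, from true/false positives and negatives."""
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label != t for t, p in zip(y_true, y_pred))
    fn = sum(t == label != p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_and_weighted_f1(y_true, y_pred):
    """Macro F1 averages classes uniformly; weighted F1 scales by class support."""
    labels = sorted(set(y_true))
    f1s = {c: f1_per_class(y_true, y_pred, c) for c in labels}
    counts = Counter(y_true)
    macro = sum(f1s.values()) / len(labels)
    weighted = sum(f1s[c] * counts[c] for c in labels) / len(y_true)
    return macro, weighted

# Imbalanced toy example (4 legitimate, 2 phishing) so the two averages differ
macro, weighted = macro_and_weighted_f1([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
```

With the imbalanced toy labels, macro F1 is pulled down by the minority class while weighted F1 tracks the majority class more closely.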

Intended uses

  • Detecting phishing emails in corporate environments.
  • Classifying raw email text into phishing vs. legitimate.
  • Serving as a baseline for email security NLP tasks.

Limitations

  • Performance depends on the domain of the training data; the model may not generalize to unseen types of phishing.
  • Sensitive to noisy or unstructured inputs (e.g., raw HTML emails).
  • Should not be the only defense against phishing; combine it with rule-based filters and other security controls.
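
Given the sensitivity to raw HTML noted above, one mitigation is stripping markup before tokenization. A minimal stdlib sketch (an illustration, not part of the released pipeline):

```python
from html.parser import HTMLParser
from io import StringIO

class _TextExtractor(HTMLParser):
    """Collects only the text content of an HTML document."""
    def __init__(self):
        super().__init__()
        self.parts = StringIO()

    def handle_data(self, data):
        self.parts.write(data)

def strip_html(raw_email: str) -> str:
    """Reduce a raw HTML email body to whitespace-normalized plain text."""
    extractor = _TextExtractor()
    extractor.feed(raw_email)
    return " ".join(extractor.parts.getvalue().split())

print(strip_html("<p>Verify your <b>account</b> now!</p>"))
```

The cleaned string can then be passed to the tokenizer in place of the raw email body.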

How to use


import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("RamzyBakir/jellyphish-bert-base-mail")
model = AutoModelForSequenceClassification.from_pretrained("RamzyBakir/jellyphish-bert-base-mail")

# Tokenize the email text and run a forward pass without tracking gradients
inputs = tokenizer("Your email text here", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()

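The snippet above returns only the argmax class index; converting the logits to probabilities also yields a confidence score. A minimal sketch in plain Python (the 0 = legitimate / 1 = phishing label order is an assumption; check model.config.id2label for the actual mapping):

```python
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Example logits, shaped like outputs.logits[0].tolist() from the snippet above
probs = softmax([-1.2, 2.8])
predicted = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[predicted]
```

A confidence threshold on the winning probability can be used to route low-confidence emails to human review instead of auto-classifying them.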
Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 10
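
The hyperparameters above can be expressed as a Trainer configuration. A sketch, assuming transformers is installed (output_dir is hypothetical, and any warmup or weight-decay settings are omitted because the card does not list them):

```python
from transformers import TrainingArguments

# TrainingArguments mirroring the listed hyperparameters; unlisted options
# keep their library defaults.
args = TrainingArguments(
    output_dir="jellyphish-bert-base-mail",  # hypothetical path
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10,
)
```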

Training results

Training Loss  Epoch  Step  Validation Loss  Accuracy  Macro F1  Weighted F1  Precision  Recall
0.5127         1.0     238  0.3615           0.8887    0.8885    0.8886       0.8893     0.8883
0.3214         2.0     476  0.2437           0.9157    0.9157    0.9157       0.9156     0.9158
0.2497         3.0     714  0.2136           0.9193    0.9192    0.9193       0.9202     0.9189
0.2257         4.0     952  0.1937           0.9253    0.9252    0.9252       0.9258     0.9249
0.2076         5.0    1190  0.1766           0.9309    0.9308    0.9309       0.9308     0.9309
0.2010         6.0    1428  0.1751           0.9322    0.9321    0.9321       0.9325     0.9319
0.1959         7.0    1666  0.1714           0.9361    0.9361    0.9361       0.9364     0.9359
0.1944         8.0    1904  0.1676           0.9355    0.9354    0.9355       0.9356     0.9353
0.1904         9.0    2142  0.1648           0.9368    0.9367    0.9368       0.9368     0.9367
0.1912        10.0    2380  0.1658           0.9365    0.9364    0.9364       0.9366     0.9363

Framework versions

  • Transformers 4.52.4
  • PyTorch 2.6.0+cu124
  • Datasets 3.6.0
  • Tokenizers 0.21.2
Model size: 0.1B parameters (F32 tensors, Safetensors format)
