---
library_name: transformers
license: apache-2.0
base_model: google-bert/bert-base-uncased
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - precision
  - recall
model-index:
  - name: JellyPhish
    results: []
datasets:
  - zefang-liu/phishing-email-dataset
  - kxm1k4m1/generate_phishing_email_final
pipeline_tag: text-classification
---

# JellyPhish

This model is a fine-tuned version of [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) on the [zefang-liu/phishing-email-dataset](https://huggingface.co/datasets/zefang-liu/phishing-email-dataset) and [kxm1k4m1/generate_phishing_email_final](https://huggingface.co/datasets/kxm1k4m1/generate_phishing_email_final) datasets. It achieves the following results on the evaluation set:

- Loss: 0.1658
- Accuracy: 0.9365
- Macro F1: 0.9364
- Weighted F1: 0.9364
- Precision: 0.9366
- Recall: 0.9363
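
For reference, here is a minimal sketch of how these metrics can be computed with scikit-learn. The exact `compute_metrics` used during training is not published, and macro averaging for precision and recall is an assumption:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels) for the evaluation set.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "macro_f1": f1_score(labels, preds, average="macro"),
        "weighted_f1": f1_score(labels, preds, average="weighted"),
        "precision": precision_score(labels, preds, average="macro"),  # averaging is an assumption
        "recall": recall_score(labels, preds, average="macro"),        # averaging is an assumption
    }
```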

## Intended uses

- Detecting phishing emails in corporate environments.
- Classifying raw email text into phishing vs. legitimate.
- Serving as a baseline for email security NLP tasks.

## Limitations

- Model performance depends on the domain of the training data; it may not generalize to unseen types of phishing.
- Sensitive to noisy or unstructured inputs (e.g., raw HTML emails); see the preprocessing sketch after this list.
- Should not be used as the only defense mechanism against phishing; combine it with rule-based filters and other security systems.
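
Because the model expects plain text, stripping markup before inference may help with HTML emails. Below is a minimal sketch using BeautifulSoup; the preprocessing applied to the training data is not documented, so treat this as an assumption rather than the card author's pipeline:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def html_email_to_text(raw_html: str) -> str:
    # Drop scripts and styles, keep only the visible text.
    soup = BeautifulSoup(raw_html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Collapse whitespace so the tokenizer sees clean running text.
    return " ".join(soup.get_text(separator=" ").split())

text = html_email_to_text("<p>Please <a href='http://example.com'>verify</a> your account.</p>")
```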

## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("RamzyBakir/jellyphish-bert-base-mail")
model = AutoModelForSequenceClassification.from_pretrained("RamzyBakir/jellyphish-bert-base-mail")

# Tokenize the email text; truncation guards against inputs longer than BERT's 512-token limit.
inputs = tokenizer("Your email text here", return_tensors="pt", truncation=True)
outputs = model(**inputs)

# The predicted class is the index of the highest logit.
predicted_class = outputs.logits.argmax(-1).item()
```
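
To map the prediction to a human-readable label with a confidence score, the label names stored in the model config can be used. Whether they read `phishing`/`legitimate` or the generic `LABEL_0`/`LABEL_1` depends on how the model was saved; check `model.config.id2label`:

```python
import torch

# Softmax turns the logits into per-class probabilities.
probs = torch.softmax(outputs.logits, dim=-1)
label = model.config.id2label[predicted_class]
print(f"{label}: {probs[0, predicted_class].item():.2%}")
```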

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
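
These settings map onto `TrainingArguments` roughly as follows. This is a sketch only: `output_dir` is a hypothetical path, and the per-epoch evaluation strategy is inferred from the results table below, not stated in the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="jellyphish-bert-base-mail",  # hypothetical output path
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",            # AdamW with default betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="epoch",          # assumption: one evaluation per epoch, matching the table
)
```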

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Macro F1 | Weighted F1 | Precision | Recall |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:-----------:|:---------:|:------:|
| 0.5127        | 1.0   | 238  | 0.3615          | 0.8887   | 0.8885   | 0.8886      | 0.8893    | 0.8883 |
| 0.3214        | 2.0   | 476  | 0.2437          | 0.9157   | 0.9157   | 0.9157      | 0.9156    | 0.9158 |
| 0.2497        | 3.0   | 714  | 0.2136          | 0.9193   | 0.9192   | 0.9193      | 0.9202    | 0.9189 |
| 0.2257        | 4.0   | 952  | 0.1937          | 0.9253   | 0.9252   | 0.9252      | 0.9258    | 0.9249 |
| 0.2076        | 5.0   | 1190 | 0.1766          | 0.9309   | 0.9308   | 0.9309      | 0.9308    | 0.9309 |
| 0.201         | 6.0   | 1428 | 0.1751          | 0.9322   | 0.9321   | 0.9321      | 0.9325    | 0.9319 |
| 0.1959        | 7.0   | 1666 | 0.1714          | 0.9361   | 0.9361   | 0.9361      | 0.9364    | 0.9359 |
| 0.1944        | 8.0   | 1904 | 0.1676          | 0.9355   | 0.9354   | 0.9355      | 0.9356    | 0.9353 |
| 0.1904        | 9.0   | 2142 | 0.1648          | 0.9368   | 0.9367   | 0.9368      | 0.9368    | 0.9367 |
| 0.1912        | 10.0  | 2380 | 0.1658          | 0.9365   | 0.9364   | 0.9364      | 0.9366    | 0.9363 |

### Framework versions

- Transformers 4.52.4
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.2