---
license: unknown
---

# Overview

<!-- This model is obtained by finetuning Pre-Trained RoBERTa on dataset containing several sets of malicious prompts.
Using this model, we can classify malicious prompts that can lead towards creation of phishing websites and phishing emails.
This model is obtained by finetuning a Pre-Trained RoBERTa using a dataset encompassing multiple sets of malicious prompts, as detailed in the corresponding arXiv paper.
Using this model, we can classify malicious prompts that can lead towards creation of phishing websites and phishing emails. -->

Our model, "Is it Phish?", is designed to identify malicious prompts that could be used to generate phishing websites and emails with popular commercial LLMs such as ChatGPT, Bard, and Claude.
It was obtained by fine-tuning a pre-trained RoBERTa model on a dataset comprising multiple sets of malicious prompts.

Try out "Is it Phish?" using the Inference API. The model assigns "Label 1" to a prompt it identifies as a phishing attempt and "Label 0" to a prompt it considers safe and non-malicious.

## Dataset Details

The dataset used to train this model was created from malicious prompts generated by GPT-4.
Due to ethical concerns, the dataset is currently available only upon request.

## Training Details

The model was trained using `RobertaForSequenceClassification.from_pretrained`.
Both the model and the tokenizer corresponding to RoBERTa-base were used.
We trained the model for 10 epochs with a learning rate of 2e-5, using the AdamW optimizer.

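The setup described above can be sketched roughly as follows. This is a minimal sketch, not the authors' training script: the checkpoint name `roberta-base`, the two-class head, the use of `torch.optim.AdamW`, and the helper name `build_training_objects` are assumptions. Batch size, scheduler, and data loading are not stated in this card and are left out.

```python
# Hyperparameters stated in this card.
NUM_EPOCHS = 10
LEARNING_RATE = 2e-5

# Assumptions: the Hub checkpoint name and a two-class head (labels 0 and 1).
BASE_CHECKPOINT = "roberta-base"
NUM_LABELS = 2


def build_training_objects():
    """Load the tokenizer and model, and create the optimizer (hypothetical helper)."""
    # Imported here so the constants above can be inspected without
    # PyTorch or Transformers installed.
    from torch.optim import AdamW
    from transformers import RobertaForSequenceClassification, RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained(BASE_CHECKPOINT)
    model = RobertaForSequenceClassification.from_pretrained(
        BASE_CHECKPOINT, num_labels=NUM_LABELS
    )
    # Assumption: PyTorch's AdamW; the card only says "AdamW".
    optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)
    return tokenizer, model, optimizer
```

Calling `build_training_objects()` downloads the base checkpoint; the returned objects would then be used in a standard fine-tuning loop run for `NUM_EPOCHS` epochs.
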
## Inference

There are multiple ways to use this model. The simplest is the `text-classification` pipeline:

```python
from transformers import pipeline

# top_k=None returns the scores for all labels, not just the top one
classifier = pipeline(task="text-classification", model="phishbot/Isitphish", top_k=None)
prompt = ["Your Sample Sentence or Prompt...."]
model_outputs = classifier(prompt)
print(model_outputs[0])
```

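To turn the per-label scores into a single decision, a small helper along these lines may be useful. It is a sketch under assumptions: it presumes the pipeline reports the two classes under the default Transformers names `LABEL_0` and `LABEL_1`, and the helper name `is_phishing` and the 0.5 threshold are illustrative choices, not part of the model.

```python
from typing import List

# Assumption: the classifier reports its two classes under the default
# Transformers names "LABEL_0" (safe) and "LABEL_1" (phishing).
PHISHING_LABEL = "LABEL_1"


def is_phishing(scores: List[dict], threshold: float = 0.5) -> bool:
    """Return True if the phishing label's score meets the threshold.

    `scores` is the per-prompt list of {"label": ..., "score": ...} dicts
    that the pipeline returns when called with top_k=None.
    """
    by_label = {entry["label"]: entry["score"] for entry in scores}
    # A missing phishing label is treated as a score of 0.0.
    return by_label.get(PHISHING_LABEL, 0.0) >= threshold


# Hypothetical scores shaped like model_outputs[0]; not real model output.
example = [
    {"label": "LABEL_1", "score": 0.98},
    {"label": "LABEL_0", "score": 0.02},
]
print(is_phishing(example))  # True
```

If the model's configuration uses different label names, `PHISHING_LABEL` would need to be adjusted accordingly.
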
### Results

The model achieved an accuracy of 96% and an F1-score of 0.96 across test sets with different distributions.