| # BERT Phishing Detection Model | |
| This is a BERT-based model fine-tuned for phishing detection. It classifies input text, including URLs, as phishing or legitimate. | |
| ## Model Details | |
| - **Model Type**: BERT for Sequence Classification | |
| - **Architecture**: BertForSequenceClassification | |
| - **Problem Type**: Single Label Classification | |
| - **Hidden Size**: 768 | |
| - **Number of Layers**: 12 | |
| - **Number of Attention Heads**: 12 | |
| - **Max Position Embeddings**: 512 | |
| - **Vocabulary Size**: 30,522 | |
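One way to confirm that a downloaded checkpoint matches the values listed above is to compare them against its configuration. The following is a minimal sketch, not part of the model's own tooling: `config_mismatches` is a hypothetical helper, the key names assume the standard `BertConfig` attribute names, and the stand-in dictionary lets the example run offline.

```python
# Values copied from the Model Details list; keys follow BertConfig naming.
EXPECTED = {
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "max_position_embeddings": 512,
    "vocab_size": 30522,
}


def config_mismatches(config, expected=EXPECTED):
    """Return {key: (expected, actual)} for every value that differs."""
    return {
        key: (want, config.get(key))
        for key, want in expected.items()
        if config.get(key) != want
    }


# Stand-in dict so the sketch runs without network access. In practice it
# would come from AutoConfig.from_pretrained(model_name).to_dict().
sample_config = dict(EXPECTED)
mismatches = config_mismatches(sample_config)
print(mismatches)  # {}

# Each attention head works on hidden_size / num_attention_heads dimensions.
head_dim = EXPECTED["hidden_size"] // EXPECTED["num_attention_heads"]
print(head_dim)  # 64
```

An empty result means the checkpoint agrees with the list. Any entry in the returned dictionary shows the expected value next to the one actually found.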
| ## Usage | |
| ```python | |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification | |
| import torch | |
| # Load model and tokenizer | |
| model_name = "th1enq/bert_checkpoint" | |
| tokenizer = AutoTokenizer.from_pretrained(model_name) | |
| model = AutoModelForSequenceClassification.from_pretrained(model_name) | |
| # Example usage | |
| text = "Your text here" | |
| inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512) | |
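| # Run inference without tracking gradients | |
| with torch.no_grad(): | |
|     outputs = model(**inputs) | |
|     # Softmax turns the logits into class probabilities | |
|     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1) | |
|     # Index of the most likely class for each input | |
|     predicted_class = torch.argmax(predictions, dim=-1) | |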
| ``` | |
| ## Training | |
| This model was fine-tuned on phishing detection data to classify text as phishing (1) or legitimate (0). | |
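Given that label convention, the raw logits can be turned into a readable label and a phishing probability. The sketch below is illustrative only: `interpret_logits`, `ID2LABEL`, and the `threshold` parameter are hypothetical names, not something the checkpoint ships with, and the softmax is written in plain Python so the example runs without downloading the model.

```python
import math

# Label mapping stated in this model card: 0 = legitimate, 1 = phishing.
ID2LABEL = {0: "legitimate", 1: "phishing"}


def interpret_logits(logits, threshold=0.5):
    """Map a pair of class logits to (label, phishing probability).

    `threshold` is an assumed knob for trading precision against recall;
    the model card does not define one.
    """
    # Numerically stable softmax over the two class logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    phishing_prob = probs[1]
    label_id = 1 if phishing_prob >= threshold else 0
    return ID2LABEL[label_id], phishing_prob


# Made-up logits for illustration, not real model output.
label, prob = interpret_logits([-1.2, 2.3])
print(label, round(prob, 3))
```

With the default threshold of 0.5 this matches taking the argmax, except that an exact tie is labeled phishing. Raising the threshold flags fewer inputs as phishing. To use it with the model, the logits for the first input can be passed as `outputs.logits[0].tolist()`.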
| ## License | |
| Please refer to the original BERT license and any applicable dataset licenses. | |