kmack/Phishing_urls
This is a custom character-level Transformer model for URL phishing classification.
This model is based on char-transformer-url-v3 and has been fine-tuned for phishing detection tasks.
This repository uses a character-based URL model, not a token/subword transformer.
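To make the character-level idea concrete, here is a minimal sketch of how a URL could be turned into a fixed-length sequence of character ids. The vocabulary, the `PAD_ID`/`UNK_ID` values, and the `encode_url` helper are all hypothetical; the real mapping is stored in tokenizer.json and may differ.

```python
import string

# Hypothetical vocabulary for illustration only; the real character mapping
# is stored in tokenizer.json and may be laid out differently.
PAD_ID, UNK_ID = 0, 1
ALPHABET = string.ascii_letters + string.digits + string.punctuation
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_url(url: str, max_length: int = 512) -> list[int]:
    """Map each character to an id, truncate, then right-pad to max_length."""
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in url[:max_length]]
    return ids + [PAD_ID] * (max_length - len(ids))


ids = encode_url("http://examp1e.com/login?user=a@b", max_length=40)
print(len(ids))  # 40
```

Every character, including `.`, `/`, `?`, `=`, `%` and `@`, gets its own id, so nothing is merged into subword units. In this sketch any character outside the ASCII alphabet (for example a homoglyph) falls back to `UNK_ID`; a real vocabulary may handle such characters differently.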
Character-level input keeps URL-specific characters intact (., /, ?, =, %, @, homoglyph-like variants).

Training data
Source: Training Material/URLs.
Column names handled: url, website_url, link, label, status, etc.
Labels: 0=safe, 1=phishing.
... https://).

Architecture
vocab_size=100, embed_dim=192
max_length=512
num_layers=6, num_heads=8, feedforward hidden_dim=384
Output head (dropout=0.1) -> 2 logits

Training values
Epochs: 2
Batch size: 48
Learning rate: 8e-05
Further values, labels missing: 0.01, 0.1, 1

Files
best_model.pt: best checkpoint by validation ROC-AUC
model.pt: final model checkpoint
model_config.json: architecture hyperparameters
tokenizer.json: character vocabulary + tokenizer metadata
training_info.json: train/val/test metrics and key run parameters

Training command
python train_url_classifier_char.py \
--output_dir ./Models/url_classifier_char_v3 \
--epochs 2 \
--batch_size 48 \
--lr 8e-05 \
--max_length 512 \
--embed_dim 192 \
--num_heads 8 \
--num_layers 6 \
--hidden_dim 384 \
--dropout 0.1
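For orientation, the hyperparameters passed above (`embed_dim=192`, `num_heads=8`, `num_layers=6`, `hidden_dim=384`, `dropout=0.1`, `max_length=512`) could describe an encoder roughly like the sketch below. This is an assumed layout, not the repository's actual model class: the class name `CharURLClassifier`, the learned positional embedding, the mean pooling, and `pad_id=0` are all guesses, and the layer names will not match the released state_dict.

```python
import torch
import torch.nn as nn


class CharURLClassifier(nn.Module):
    """Hypothetical layout; not the class used to train the released weights."""

    def __init__(self, vocab_size=100, embed_dim=192, max_length=512,
                 num_heads=8, num_layers=6, hidden_dim=384, dropout=0.1,
                 pad_id=0):
        super().__init__()
        self.pad_id = pad_id  # assumed padding id
        self.char_embed = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_id)
        self.pos_embed = nn.Embedding(max_length, embed_dim)  # learned positions (assumed)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=hidden_dim,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(
            layer, num_layers=num_layers, enable_nested_tensor=False)
        self.head = nn.Sequential(nn.Dropout(dropout), nn.Linear(embed_dim, 2))

    def forward(self, ids):
        pad_mask = ids.eq(self.pad_id)  # True where the input is padding
        positions = torch.arange(ids.size(1), device=ids.device)
        x = self.char_embed(ids) + self.pos_embed(positions)
        x = self.encoder(x, src_key_padding_mask=pad_mask)
        keep = (~pad_mask).unsqueeze(-1).to(x.dtype)
        # Mean-pool over real characters only (pooling choice is assumed).
        pooled = (x * keep).sum(1) / keep.sum(1).clamp(min=1.0)
        return self.head(pooled)  # shape: (batch, 2)


model = CharURLClassifier().eval()
batch = torch.randint(1, 100, (2, 32))
batch[1, 20:] = 0  # pad the tail of the second sequence
with torch.no_grad():
    logits = model(batch)
print(logits.shape)  # torch.Size([2, 2])
```

The feedforward width of 384 is only twice the embedding size, which keeps each layer small compared with the common 4x ratio.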
import json
import torch
# This repository contains a custom PyTorch model:
# - model.pt (trained weights)
# - model_config.json (architecture hyperparameters)
# - tokenizer.json (character tokenizer)
#
# Load these files with your project inference code (e.g. predict_url_char.py).
with open("model_config.json", "r", encoding="utf-8") as f:
    config = json.load(f)
state_dict = torch.load("model.pt", map_location="cpu")
print("Loaded custom character-level URL classifier.")
print(config)
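Once inference code produces the model's 2 logits for a URL, they still need to be turned into a decision. A minimal sketch of that step, using the card's label mapping (0=safe, 1=phishing), follows. The `decide` helper, the 0.5 threshold, and the example logits are illustrative assumptions, not values taken from this repository.

```python
import math

LABELS = {0: "safe", 1: "phishing"}  # label mapping stated in this model card


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.5):
    """Return (label, phishing_probability); the threshold is an assumed default."""
    p_phish = softmax(logits)[1]
    return LABELS[int(p_phish >= threshold)], p_phish


label, p = decide([-1.2, 2.3])  # made-up logits, not real model output
print(label, round(p, 3))
```

Raising the threshold reduces false alarms at the cost of missed phishing URLs; a suitable value depends on the deployment and should be chosen on validation data.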
This model was trained on specific datasets and may not generalize to all types of phishing attempts. Always use additional security measures in production environments.
If you use this model, please cite:
@misc{nhellyercreek_url_phishing_classifier_char_v3,
title={Url Phishing Classifier Char V3},
author={Noah Hellyer},
year={2026},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/nhellyercreek/url-phishing-classifier-char-v3}}
}