Phish-Transformer
A lightweight, pure-CPU transformer that detects phishing URLs and warns users inside Chrome.
Version 1.0 - 2025
Custom Commercial Licence - see LICENSE for terms.
Tested on Python 3.10–3.13
Contents
- Installation
- Quick Start
- Pipeline Flow
- API Usage
- Model Specs
- Evaluation
- Chrome Extension
- Packaging
- Possible Extensions
- Contributing
- License
What It Does
- Learns URL patterns from the UCI PhiUSIIL Phishing URL Dataset.
- Tokenizes each URL at the character level into a fixed-length sequence of 75 ASCII characters.
- Feeds the sequence into a ~45k-parameter transformer encoder.
- Returns a probability via a Flask REST endpoint, /predict.
- Displays a red banner in Chrome when the probability > 0.5.
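The tokenization step can be sketched roughly as follows. This is a minimal illustration, not the project's tokenizer: the names `encode_url`, `CHAR_TO_ID`, and `PAD_ID` are hypothetical, and it assumes that the 96-token vocabulary is the 95 printable ASCII characters plus one padding id, and that out-of-vocabulary characters fall back to the padding id.

```python
MAX_LEN = 75  # fixed sequence length, as stated in Model Specs
PAD_ID = 0    # assumption: id 0 is reserved for padding

# Printable ASCII (space .. tilde) maps to ids 1..95; with PAD_ID that is 96 tokens.
CHAR_TO_ID = {chr(code): code - 31 for code in range(32, 127)}


def encode_url(url: str, max_len: int = MAX_LEN) -> list[int]:
    """Truncate or right-pad a URL to max_len character ids."""
    # Assumption: characters outside printable ASCII are mapped to PAD_ID.
    ids = [CHAR_TO_ID.get(ch, PAD_ID) for ch in url[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))
```

Calling `encode_url("https://example.com")` yields a list of 75 integers: 19 character ids followed by padding.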
Limitations
The model judges only the URL string (characters, sub-domains, TLD, etc.) and does not fetch or inspect page content.
It is designed as a low-latency first filter against look-alike domains; content-based attacks require additional page-analysis layers.
Future Improvements
- Integrate content-based page analysis to catch phishing beyond URL strings.
- Support real-time updates from threat feeds to improve detection of new domains.
- Extend multilingual support and handle punycode / internationalized domains.
- Explore edge deployment optimizations (INT8 quantization, 1-D CNN variant).
- Implement user feedback loop for continual model retraining.
Folder Map
phish_transformer/
├── datasets/ # UCI CSV + splits
├── models/ # *.pt checkpoints (git-ignored)
├── src/ # modular code
│   ├── data/ # tokenizer + preprocessing
│   ├── model/ # transformer definition
│   ├── training/ # training loop
│   └── inference/ # Flask API + evaluation
├── extension/ # Chrome MV3 extension
├── tests/ # pytest suite
├── main.py # end-to-end pipeline
├── requirements.txt # dependencies
└── README.md # this file
Installation
git clone https://github.com/YOU/phish-transformer.git
cd phish-transformer
python -m venv venv
source venv/bin/activate # Windows : venv\Scripts\activate
pip install -r requirements.txt # includes torch, flask, pandas, etc.
Quick Start
# 1. Train & export
python main.py
# 2. Start API
flask --app src/inference/app run --port 8000
# 3. Load extension
Chrome → Extensions → Developer mode → Load unpacked → select extension/
# 4. Test
pytest tests/
Pipeline Flow
API Usage
POST /predict
Request :
{"url" : "https://example.com"}
Response :
{"phishing" : 0.97}
CLI test
curl -X POST http://127.0.0.1:8000/predict \
-H "Content-Type: application/json" \
-d '{"url" : "https://example.com"}'
GUI testers
- VS Code Thunder Client: install extension → POST → URL above → Body → raw JSON → same payload.
- Postman: new request → POST → URL → Body → raw → JSON → send.
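The same request can also be made from Python using only the standard library. The sketch below assumes the server from Quick Start is listening on port 8000 and that the response has the shape shown above; the helper names (`build_request`, `is_phishing`, `check_url`) are hypothetical and not part of the project.

```python
import json
from urllib import request

API_URL = "http://127.0.0.1:8000/predict"  # local Flask server from Quick Start
THRESHOLD = 0.5                            # same cut-off the extension uses


def build_request(url: str, api_url: str = API_URL) -> request.Request:
    """Build the POST /predict request with a JSON body."""
    body = json.dumps({"url": url}).encode("utf-8")
    return request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_phishing(response_body: bytes, threshold: float = THRESHOLD) -> bool:
    """Apply the threshold to a {"phishing": <probability>} response."""
    return json.loads(response_body)["phishing"] > threshold


def check_url(url: str) -> bool:
    """Send the request; requires the API to be running."""
    with request.urlopen(build_request(url), timeout=5) as resp:
        return is_phishing(resp.read())
```

With the server running, `check_url("https://example.com")` returns `True` when the returned probability is above 0.5.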
Model Specs
| Property | Value |
|---|---|
| Architecture | 2-layer Transformer encoder |
| Embedding dim | 32 |
| Heads | 2 |
| Feed-forward | 64 |
| Dropout | 0.1 |
| Sequence | 75 chars |
| Vocab | 96 printable ASCII tokens |
| Params | ≈ 45 k |
| Test AUC | ≈ 0.98 |
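For orientation, an encoder with these hyperparameters could be put together in PyTorch along the following lines. This is a sketch under assumptions, not the code in src/model/: the class name `UrlTransformer`, the learned positional embedding, the mean pooling, and the single-logit head are all guesses, so its parameter count need not match the ≈ 45 k reported in the table.

```python
import torch
from torch import nn


class UrlTransformer(nn.Module):
    """Hypothetical character-level encoder using the table's hyperparameters."""

    def __init__(self, vocab_size=96, seq_len=75, dim=32, heads=2,
                 ff=64, layers=2, dropout=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Assumption: learned positional embedding, one vector per position.
        self.pos = nn.Parameter(torch.zeros(1, seq_len, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=ff,
            dropout=dropout, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        # Assumption: mean-pool over positions, then a single logit.
        self.head = nn.Linear(dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids) + self.pos      # (batch, seq_len, dim)
        x = self.encoder(x)                       # (batch, seq_len, dim)
        logits = self.head(x.mean(dim=1)).squeeze(-1)
        return torch.sigmoid(logits)              # phishing probability per URL
```

Passing a `(batch, 75)` tensor of integer token ids returns one probability per URL.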
Evaluation
python src/inference/evaluate.py # produces roc.png
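As a rough illustration of what the Test AUC figure measures: ROC AUC is the fraction of (phishing, legitimate) pairs in which the phishing URL receives the higher score, with ties counted as half. The function below is a hypothetical pure-Python sketch of that definition, not the contents of evaluate.py.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison; labels are 1 (phishing) or 0 (legitimate)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairs where the positive outranks the negative; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of examples, so it suits small sanity checks rather than a full test split.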
Chrome Extension
- Start the Flask server.
- Chrome → Extensions → Developer mode → Load unpacked → select extension/.
- Browse any site; a red banner appears when the score > 0.5.
Packaging for Chrome Web Store
- Zip the extension/ folder (must contain manifest.json at top level).
- Upload the zip in the Chrome Developer Dashboard.
- Host the ≥ 1 MB model externally (Google Drive, CDN) and update the fetch() URL in content.js.
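The zipping step can be scripted so that manifest.json always ends up at the archive root. The helper below is a sketch using only the standard library; the name `zip_extension` is hypothetical, and it assumes the output zip is written outside the extension folder.

```python
import zipfile
from pathlib import Path


def zip_extension(src_dir: str, out_zip: str) -> list[str]:
    """Zip src_dir so that its files sit at the top level of the archive."""
    src = Path(src_dir)
    if not (src / "manifest.json").is_file():
        raise FileNotFoundError("manifest.json must be at the top level of the folder")
    # Collect files before creating the archive so the zip never includes itself.
    files = sorted(p for p in src.rglob("*") if p.is_file())
    names = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            arcname = path.relative_to(src).as_posix()  # path relative to the folder
            zf.write(path, arcname)
            names.append(arcname)
    return names
```

For example, `zip_extension("extension", "phish-transformer.zip")` returns the list of archived paths, which should include `manifest.json` without any leading folder name.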
Possible Extensions
- Distil to a 1-D CNN for < 200 KB edge deployment
- INT8 quantization for a roughly 2x speed-up
- On-device Chrome ML (TensorFlow Lite)
- Multilingual URL support (Unicode normalizer)
- Continuous retraining with user feedback
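One possible starting point for the multilingual item is to normalize the URL and convert an internationalized hostname to its ASCII (punycode) form before tokenization, so it fits the existing ASCII vocabulary. The sketch below uses only the standard library; `to_ascii_url` is a hypothetical helper, it relies on Python's built-in `idna` codec (IDNA 2003), and it leaves the path and query untouched.

```python
import unicodedata
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """NFKC-normalize a URL and punycode-encode its hostname."""
    parts = urlsplit(unicodedata.normalize("NFKC", url))
    host = parts.hostname or ""
    ascii_host = host.encode("idna").decode("ascii")
    # Note: any user:password@ prefix is dropped in this sketch; the port is kept.
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For instance, `to_ascii_url("https://bücher.example/path")` gives `https://xn--bcher-kva.example/path`.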
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feat/awesome)
- Commit with clear messages
- Push and open a pull request
Acknowledgements
- UCI PhiUSIIL Phishing URL Dataset
- PyTorch, Flask, Chrome Extension APIs
Licence
This project is licensed under the GNU General Public Licence v3.0 - see LICENSE for details.
Commercial use is allowed only if you open-source your entire derivative under the same licence.
