Rasel Santillan
metadata
title: Phishing URL Detection API
emoji: 🔒
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
license: mit
app_port: 7860

Phishing URL Detection API

A FastAPI-based REST API for detecting phishing URLs using machine learning. This service analyzes URL features and webpage content to classify URLs as legitimate or phishing attempts.

Features

  • 🔍 Real-time URL Analysis: Extracts 43 features from URLs and their webpages
  • 🤖 Machine Learning: Uses a stacking ensemble model for accurate predictions
  • 🚀 Fast API: Built with FastAPI for high performance and automatic documentation
  • 🐳 Docker Support: Containerized for easy deployment
  • 📊 Confidence Scores: Returns prediction confidence for better decision-making
  • 🔒 CORS Enabled: Accessible from web browsers

Project Structure

url-phish-fastapi/
├── main.py                          # FastAPI application
├── model/
│   ├── __init__.py                  # Package initialization
│   ├── model.py                     # Model loading and prediction logic
│   ├── url_feature_extractor.py     # Feature extraction from URLs
│   └── url_stacking_model.joblib    # Pre-trained ML model
├── requirements.txt                 # Python dependencies
├── Dockerfile                       # Docker configuration
├── .dockerignore                    # Docker ignore patterns
└── README.md                        # This file

API Endpoints

Health Check

  • GET / - Root endpoint
  • GET /health - Health check endpoint

Prediction

  • POST /predict - Analyze a URL for phishing detection

Request Body:

{
  "url": "http://example.com"
}

Response:

{
  "url": "http://example.com",
  "prediction": "legitimate",
  "confidence": 0.95,
  "predicted_label": 0,
  "phish_probability": 0.05
}
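On the client side, the response shape above can be captured in a small typed container. The following is a minimal sketch using only the standard library. The `Prediction` class and `parse_prediction` helper are hypothetical names that are not part of this service, and the field types are inferred from the example response. Responses for unreachable URLs may have a different shape, which this sketch does not handle.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Client-side view of a /predict response (fields from the example above)."""
    url: str
    prediction: str
    confidence: float
    predicted_label: int
    phish_probability: float


def parse_prediction(body: str) -> Prediction:
    """Parse a JSON response body; raises KeyError if a documented field is missing."""
    data = json.loads(body)
    return Prediction(
        url=data["url"],
        prediction=data["prediction"],
        confidence=float(data["confidence"]),
        predicted_label=int(data["predicted_label"]),
        phish_probability=float(data["phish_probability"]),
    )
```

In the example response, `predicted_label` 0 corresponds to "legitimate"; the mapping for other labels is not documented here.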

Interactive Documentation

  • Swagger UI: http://localhost:7860/docs
  • ReDoc: http://localhost:7860/redoc

Installation & Usage

Option 1: Local Development

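  1. Install dependencies:
pip install -r requirements.txt
  2. Run the application (main.py is the FastAPI entry point listed in the project structure):
python main.py
  3. Access the API at http://localhost:7860 (interactive documentation at /docs).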

Option 2: Docker (Recommended)

  1. Build the Docker image:
docker build -t phishing-url-api .
  2. Run the container:
docker run -p 7860:7860 phishing-url-api
  3. Access the API at http://localhost:7860 (interactive documentation at /docs).

Option 3: Docker with Custom Port

docker run -p 8000:8000 -e PORT=8000 phishing-url-api
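The custom-port command relies on the application honoring a `PORT` environment variable. How `main.py` does this is not shown in this README; one plausible pattern, sketched with the standard library only, is below. `resolve_port` is a hypothetical helper, and the 7860 default is taken from `app_port` in the metadata.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 7860  # matches app_port in the Space metadata


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the port from PORT, falling back to the default when unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

The result would then be handed to the server, for example `uvicorn.run(app, host="0.0.0.0", port=resolve_port())`.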

Testing

Run the test script to verify the API is working:

python test_api.py

Or use curl:

# Health check
curl http://localhost:7860/health

# Predict URL
curl -X POST http://localhost:7860/predict \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.google.com"}'
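The same prediction call can be made from Python. This is a minimal sketch using only the standard library; `build_predict_request` and `predict` are hypothetical helper names, and the 30-second timeout is an assumption chosen to leave headroom for slow feature extraction.

```python
import json
import urllib.request


def build_predict_request(base_url: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST /predict request for target_url."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, target_url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_predict_request(base_url, target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `predict("http://localhost:7860", "https://www.google.com")` returns the decoded response as a dictionary. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses.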

Model Information

The API uses a stacking ensemble model that combines multiple base classifiers:

  • Random Forest
  • Gradient Boosting
  • XGBoost
  • LightGBM
  • Logistic Regression (meta-model)
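To illustrate what stacking means at prediction time: each base classifier produces a phishing probability, and the meta-model maps those outputs to a final probability. The sketch below shows that last step for a logistic-regression meta-model in plain Python. The weights, bias, and input probabilities are invented for illustration; the trained values live inside `url_stacking_model.joblib`, and the shipped model may feed the meta-model differently.

```python
import math
from typing import Sequence


def meta_predict(base_probs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic-regression meta-model: sigmoid of a weighted sum of base outputs."""
    if len(base_probs) != len(weights):
        raise ValueError("one weight per base classifier is required")
    z = bias + sum(w * p for w, p in zip(weights, base_probs))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical phishing probabilities from the four base classifiers, in the
# order Random Forest, Gradient Boosting, XGBoost, LightGBM.
base_probs = [0.10, 0.20, 0.05, 0.15]
weights = [1.5, 1.0, 2.0, 1.5]  # illustrative, not the trained coefficients
bias = -2.5

phish_probability = meta_predict(base_probs, weights, bias)
predicted_label = int(phish_probability >= 0.5)  # 0 = legitimate, as in the example response
```

The 0.5 decision threshold is also an assumption; the service may apply a different cutoff.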

Features Extracted (43 total)

The model analyzes various HTML elements and webpage characteristics:

  • Form elements (inputs, buttons, password fields)
  • Media elements (images, videos, audio)
  • Structural elements (divs, tables, lists)
  • Content metrics (text length, title length)
  • Interactive elements (links, scripts, iframes)
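As a rough illustration of how element counts like these can be derived from a page, the sketch below counts a handful of tags. It uses the standard library's `html.parser` rather than BeautifulSoup4 (which the service lists as its dependency) so that it runs without extra packages. The feature names are hypothetical and cover only a small subset of the 43 features; the real logic lives in `url_feature_extractor.py`.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening tags and password inputs while parsing HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Counter = Counter()
        self.password_fields = 0

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        # Attribute values can be None (e.g. a bare "type"), hence the "or".
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.password_fields += 1


def count_features(html: str) -> dict:
    """Return a few element counts for the given HTML document."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {
        "num_inputs": parser.tags["input"],
        "num_password_fields": parser.password_fields,
        "num_links": parser.tags["a"],
        "num_scripts": parser.tags["script"],
        "num_iframes": parser.tags["iframe"],
        "num_images": parser.tags["img"],
    }
```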

Dependencies

  • FastAPI: Web framework
  • Uvicorn: ASGI server
  • Scikit-learn: Machine learning
  • Pandas/NumPy: Data processing
  • BeautifulSoup4: HTML parsing
  • Requests: HTTP requests
  • XGBoost/LightGBM: Gradient boosting models

Error Handling

The API handles various error scenarios:

  • 400 Bad Request: Invalid or empty URL
  • 500 Internal Server Error: Model loading or prediction failures
  • Unknown Prediction: Returned when the URL is unreachable or feature extraction fails
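A client can fold these cases into a single outcome. The sketch below is one way to do that; `interpret_response` is a hypothetical helper, and it assumes that an "unknown" result arrives as a successful response whose `prediction` field is "unknown", which this README does not state explicitly.

```python
def interpret_response(status: int, body: dict) -> str:
    """Map an HTTP status and decoded JSON body to a coarse client-side outcome."""
    if status == 400:
        return "rejected: invalid or empty URL"
    if status >= 500:
        return "failed: model loading or prediction error"
    prediction = body.get("prediction", "unknown")
    if prediction == "unknown":
        return "inconclusive: URL unreachable or feature extraction failed"
    return prediction
```

Treating "unknown" as inconclusive rather than as legitimate avoids silently trusting a URL that could not be analyzed.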

Performance Considerations

  • Model is loaded once on startup (singleton pattern)
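  • Feature extraction may take 5-10 seconds for live URLs, because the target page has to be fetched
  • Unreachable URLs return an "unknown" prediction
  • TLS certificate verification is disabled when fetching pages, so sites with invalid certificates can still be analyzed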

Security Notes

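  • The API makes outbound HTTP requests to the URLs it is asked to analyze, which are untrusted input
  • TLS certificate verification is disabled for feature extraction, so fetched content is not authenticated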
  • Use appropriate network security when deploying
  • Consider rate limiting for production use
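As one possible starting point for rate limiting, the sketch below implements an in-memory sliding-window limiter keyed by client identifier. It is not part of this project; the class name, limits, and the choice of client key (for example, the caller's IP address) are assumptions. Because the state lives in process memory, it only works for a single worker process.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Record a request and return True, or return False if over the limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In the API, a handler could call `allow(client_id)` before running feature extraction and respond with HTTP 429 when it returns False.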

Deployment

HuggingFace Spaces

This project is configured for deployment on HuggingFace Spaces using the Docker SDK (sdk: docker and app_port: 7860 in the metadata above).

Other Platforms

The Docker container can be deployed on:

  • AWS ECS/Fargate
  • Google Cloud Run
  • Azure Container Instances
  • Kubernetes
  • Any Docker-compatible platform

License

MIT, as declared in the license field of the metadata above.

Contributing

[Add contribution guidelines here]

Support

For issues and questions, please create an issue.