metadata
title: FYP4 Spam Detection API
emoji: 📧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit

FYP4 Spam Detection API

An email spam detection system built on transformer models (DeBERTa-v3 for text, ViT for images), with a multimodal fusion model that combines the two.

Features

  • Text-based Detection: Uses Microsoft's DeBERTa-v3-base model for analyzing email text
  • Image-based Detection: Uses Google's ViT model for analyzing embedded images
  • Multimodal Fusion: Combines text and image features using cross-modal attention
  • PDF Email Support: Extracts and analyzes content from PDF email files
  • RESTful API: Easy-to-use FastAPI endpoints

API Endpoints

1. Health Check

GET /health

Returns the status of the API and loaded models.
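As a minimal sketch, a client could poll this endpoint before sending predictions. The helper name `check_health` and its `fetch` hook are hypothetical, and the shape of the JSON body is not specified in this README, so the function only decodes it and returns it as-is.

```python
import json
from urllib.request import urlopen


def check_health(base_url, fetch=None, timeout=10):
    """Call GET /health and return the decoded JSON body.

    `fetch` is a hypothetical hook (url -> bytes) so the function can be
    exercised without a network connection; by default it uses urllib.
    """
    url = base_url.rstrip("/") + "/health"
    if fetch is None:
        def fetch(target):
            with urlopen(target, timeout=timeout) as resp:
                return resp.read()
    # The response schema is an unknown here, so no keys are assumed.
    return json.loads(fetch(url))
```

For a local run this would be called as `check_health("http://localhost:7860")`.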

2. Text Prediction

POST /predict/text
Content-Type: application/json

{
  "text": "Your email text here"
}

3. PDF Prediction

POST /predict/pdf
Content-Type: multipart/form-data

file: <PDF file>

Usage Examples

Python

import requests

# Text prediction
response = requests.post(
    "https://YOUR-SPACE-URL/predict/text",
    json={"text": "Congratulations! You've won $1,000,000!"}
)
print(response.json())

# PDF prediction
with open("email.pdf", "rb") as f:
    response = requests.post(
        "https://YOUR-SPACE-URL/predict/pdf",
        files={"file": f}
    )
print(response.json())

cURL

# Text prediction
curl -X POST "https://YOUR-SPACE-URL/predict/text" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your email text"}'

# PDF prediction
curl -X POST "https://YOUR-SPACE-URL/predict/pdf" \
  -F "file=@email.pdf"

JavaScript

// Text prediction
const response = await fetch('https://YOUR-SPACE-URL/predict/text', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Your email text' })
});
const data = await response.json();
console.log(data);

// PDF prediction
// pdfFile is a File or Blob, e.g. taken from an <input type="file"> element
const formData = new FormData();
formData.append('file', pdfFile);
const pdfResponse = await fetch('https://YOUR-SPACE-URL/predict/pdf', {
  method: 'POST',
  body: formData
});
const pdfData = await pdfResponse.json();
console.log(pdfData);

Response Format

Text Prediction Response

{
  "prediction": "SPAM",
  "confidence": 95.67,
  "spam_probability": 95.67,
  "ham_probability": 4.33,
  "model_used": "text"
}
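A client may want to validate this payload before acting on it. The sketch below assumes, based only on the sample above, that the probabilities are percentages that sum to roughly 100. The names `TextPrediction` and `parse_text_prediction` are illustrative and not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPrediction:
    prediction: str
    confidence: float
    spam_probability: float
    ham_probability: float
    model_used: str


def parse_text_prediction(payload, tolerance=0.05):
    """Convert a /predict/text JSON body into a typed record."""
    result = TextPrediction(
        prediction=str(payload["prediction"]),
        confidence=float(payload["confidence"]),
        spam_probability=float(payload["spam_probability"]),
        ham_probability=float(payload["ham_probability"]),
        model_used=str(payload["model_used"]),
    )
    # Assumption: values are percentages, so spam + ham should be about 100.
    total = result.spam_probability + result.ham_probability
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"probabilities sum to {total}, expected about 100")
    return result
```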

PDF Prediction Response

{
  "email_data": {
    "subject": "Email subject",
    "sender": "sender@example.com",
    "body": "Email body content...",
    "full_text": "Complete email text..."
  },
  "text_result": {
    "prediction": "SPAM",
    "confidence": 94.5,
    "spam_probability": 94.5,
    "ham_probability": 5.5
  },
  "image_result": {
    "prediction": "SPAM",
    "confidence": 92.3,
    "spam_probability": 92.3,
    "ham_probability": 7.7
  },
  "fusion_result": {
    "prediction": "SPAM",
    "confidence": 96.8,
    "spam_probability": 96.8,
    "ham_probability": 3.2
  },
  "final_prediction": "SPAM",
  "final_confidence": 96.8
}
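The nested response can be flattened into a short summary for logging or display. This is a sketch under one assumption not stated in the README: that a per-model result may be missing or null, for example when a PDF contains no images. The function name `summarize_pdf_response` is hypothetical.

```python
def summarize_pdf_response(payload):
    """Return per-model verdicts plus the final verdict from a /predict/pdf body."""
    per_model = {}
    for key in ("text_result", "image_result", "fusion_result"):
        result = payload.get(key)
        # Assumption: a result may be absent or null (e.g. a PDF with no images).
        if result:
            per_model[key] = (result["prediction"], result["confidence"])
    return {
        "final": (payload["final_prediction"], payload["final_confidence"]),
        "per_model": per_model,
    }
```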

Model Architecture

Text Model (DeBERTa-v3-base)

  • Pre-trained Microsoft DeBERTa-v3-base
  • Custom projection layer to 512-dimensional fusion space
  • Multi-layer classifier with LayerNorm and GELU activation

Image Model (ViT-base)

  • Pre-trained Google ViT-base-patch16-224
  • Custom projection layer to 512-dimensional fusion space
  • Multi-layer classifier with LayerNorm and GELU activation

Fusion Model

  • Combines text and image encoders
  • Cross-modal attention mechanism for feature fusion
  • Joint classification head for final prediction
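The README does not spell out the cross-modal attention mechanism, so the following is only one plausible sketch of it: each modality attends over the other with `nn.MultiheadAttention`, the attended sequences are mean-pooled, and a joint head classifies the concatenation. The class name, the head count, the pooling choice, and the classifier layout are all assumptions.

```python
import torch
from torch import nn


class CrossModalFusion(nn.Module):
    """Bidirectional cross-attention over text and image tokens in the fusion space."""

    def __init__(self, fusion_dim=512, num_heads=8, num_classes=2):
        super().__init__()
        self.text_to_image = nn.MultiheadAttention(fusion_dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(fusion_dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.LayerNorm(2 * fusion_dim),
            nn.Linear(2 * fusion_dim, fusion_dim),
            nn.GELU(),
            nn.Linear(fusion_dim, num_classes),
        )

    def forward(self, text_tokens, image_tokens):
        # text_tokens: (batch, text_len, fusion_dim)
        # image_tokens: (batch, num_patches, fusion_dim)
        # Text queries attend over image keys/values, and vice versa.
        text_ctx, _ = self.text_to_image(text_tokens, image_tokens, image_tokens)
        image_ctx, _ = self.image_to_text(image_tokens, text_tokens, text_tokens)
        # Mean-pool each attended sequence and concatenate for the joint head.
        fused = torch.cat([text_ctx.mean(dim=1), image_ctx.mean(dim=1)], dim=-1)
        return self.classifier(fused)
```

The inputs here are token sequences already projected to 512 dimensions. For ViT-base with 16x16 patches on 224x224 images, that would be 197 tokens per image (196 patches plus the class token).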

Setup Instructions

  1. Prepare your trained models: Place your .pth model files in the models/ directory:

    • models/text_model.pth
    • models/image_model.pth
    • models/fusion_model.pth
  2. Deploy to Hugging Face Spaces:

    • Create a new Space on Hugging Face
    • Select Docker as the SDK
    • Upload all files from this repository
    • The API will automatically start on port 7860

Local Development

# Install dependencies
pip install -r requirements.txt

# Run the API
python app.py

The API will be available at http://localhost:7860

Requirements

  • Python 3.10+
  • PyTorch 2.1.0+
  • Transformers 4.35.2+
  • FastAPI 0.104.1+
  • See requirements.txt for complete list

Model Files

⚠️ Important: This repository does not include the trained model weights. You need to:

  1. Train the models using the training script
  2. Save the model checkpoints (.pth files)
  3. Upload them to the models/ directory in your Hugging Face Space
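Because the weights are not bundled, a startup check that reports missing checkpoints can save a failed deployment. The sketch below uses the three file names listed under Setup Instructions. The helper name `missing_checkpoints` is hypothetical and is not part of `app.py`.

```python
from pathlib import Path

# File names as listed in the setup instructions.
EXPECTED_CHECKPOINTS = ("text_model.pth", "image_model.pth", "fusion_model.pth")


def missing_checkpoints(models_dir="models"):
    """Return the expected checkpoint file names that are not present."""
    root = Path(models_dir)
    return [name for name in EXPECTED_CHECKPOINTS if not (root / name).is_file()]
```

An empty list means all three files are in place. Otherwise the returned names can be logged before the server starts.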

Performance

The models are designed for:

  • Accuracy: High precision, so that few legitimate emails are flagged as spam
  • Speed: Inference on either CPU or GPU
  • Multimodal: Uses both text and image features
  • Scalability: Handles concurrent requests

License

MIT License - See LICENSE file for details

Citation

If you use this API in your research, please cite:

@misc{fyp4_spam_detection,
  title={FYP4 Spam Detection: Multimodal Email Spam Classification},
  author={Your Name},
  year={2024},
  howpublished={\url{https://huggingface.co/spaces/YOUR-USERNAME/fyp4-spam-detection}}
}

Contact

For questions or issues, please open an issue on GitHub or contact the author.


Built with ❤️ using PyTorch, Transformers, and FastAPI