metadata
title: FYP4 Spam Detection API
emoji: 📧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit
FYP4 Spam Detection API
An email spam detection API built on transformer models: DeBERTa-v3 classifies the email text, ViT classifies embedded images, and a multimodal fusion model combines both.
Features
- Text-based Detection: Uses Microsoft's DeBERTa-v3-base model for analyzing email text
- Image-based Detection: Uses Google's ViT model for analyzing embedded images
- Multimodal Fusion: Combines text and image features using cross-modal attention
- PDF Email Support: Extracts and analyzes content from PDF email files
- RESTful API: Easy-to-use FastAPI endpoints
API Endpoints
1. Health Check
GET /health
Returns the status of the API and loaded models.
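A client may want to confirm the API is up before sending predictions, for example right after the Space starts. The following is a minimal sketch, not part of the API itself: the helper name `wait_until_healthy` and its parameters are hypothetical, and because the body of the /health response is not documented here, it only checks for an HTTP 200 status.

```python
import time
import urllib.request


def wait_until_healthy(base_url, attempts=5, delay=2.0, opener=urllib.request.urlopen):
    """Poll GET {base_url}/health until it answers with HTTP 200.

    Returns True as soon as the endpoint responds with 200, or False if
    every attempt fails. `opener` is injectable so the helper can be
    exercised without a network connection.
    """
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        try:
            with opener(url, timeout=10) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError and socket timeouts all derive from OSError;
            # treat them as "not ready yet" and retry.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Typical use would be `wait_until_healthy("https://YOUR-SPACE-URL")` before the first call to a prediction endpoint.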
2. Text Prediction
POST /predict/text
Content-Type: application/json
{
  "text": "Your email text here"
}
3. PDF Prediction
POST /predict/pdf
Content-Type: multipart/form-data
file: <PDF file>
Usage Examples
Python
import requests

# Text prediction
response = requests.post(
    "https://YOUR-SPACE-URL/predict/text",
    json={"text": "Congratulations! You've won $1,000,000!"},
)
print(response.json())

# PDF prediction
with open("email.pdf", "rb") as f:
    response = requests.post(
        "https://YOUR-SPACE-URL/predict/pdf",
        files={"file": f},
    )
print(response.json())
cURL
# Text prediction
curl -X POST "https://YOUR-SPACE-URL/predict/text" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your email text"}'

# PDF prediction
curl -X POST "https://YOUR-SPACE-URL/predict/pdf" \
  -F "file=@email.pdf"
JavaScript
// Text prediction
const textResponse = await fetch('https://YOUR-SPACE-URL/predict/text', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Your email text' })
});
const textData = await textResponse.json();
console.log(textData);

// PDF prediction (pdfFile is a File or Blob, e.g. from an <input type="file">)
const formData = new FormData();
formData.append('file', pdfFile);
const pdfResponse = await fetch('https://YOUR-SPACE-URL/predict/pdf', {
  method: 'POST',
  body: formData
});
const pdfData = await pdfResponse.json();
console.log(pdfData);
Response Format
Text Prediction Response
{
  "prediction": "SPAM",
  "confidence": 95.67,
  "spam_probability": 95.67,
  "ham_probability": 4.33,
  "model_used": "text"
}
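On the client side, it can help to validate this payload before acting on it. The sketch below is one possible approach, not something the API provides: `TextPrediction` and `parse_text_prediction` are hypothetical names, and the check assumes the probabilities are percentages that sum to roughly 100, as in the sample response above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPrediction:
    prediction: str
    confidence: float
    spam_probability: float
    ham_probability: float
    model_used: str


def parse_text_prediction(payload, tolerance=0.1):
    """Validate a /predict/text response dict and convert it to a dataclass.

    Assumption: spam and ham probabilities are percentages in [0, 100]
    that add up to about 100. A KeyError is raised if a field is missing.
    """
    result = TextPrediction(
        prediction=str(payload["prediction"]),
        confidence=float(payload["confidence"]),
        spam_probability=float(payload["spam_probability"]),
        ham_probability=float(payload["ham_probability"]),
        model_used=str(payload["model_used"]),
    )
    total = result.spam_probability + result.ham_probability
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"probabilities sum to {total}, expected about 100")
    return result


sample = {
    "prediction": "SPAM",
    "confidence": 95.67,
    "spam_probability": 95.67,
    "ham_probability": 4.33,
    "model_used": "text",
}
parsed = parse_text_prediction(sample)
print(parsed.prediction, parsed.confidence)
```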
PDF Prediction Response
{
  "email_data": {
    "subject": "Email subject",
    "sender": "sender@example.com",
    "body": "Email body content...",
    "full_text": "Complete email text..."
  },
  "text_result": {
    "prediction": "SPAM",
    "confidence": 94.5,
    "spam_probability": 94.5,
    "ham_probability": 5.5
  },
  "image_result": {
    "prediction": "SPAM",
    "confidence": 92.3,
    "spam_probability": 92.3,
    "ham_probability": 7.7
  },
  "fusion_result": {
    "prediction": "SPAM",
    "confidence": 96.8,
    "spam_probability": 96.8,
    "ham_probability": 3.2
  },
  "final_prediction": "SPAM",
  "final_confidence": 96.8
}
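Because the PDF response carries up to three per-model results plus a final verdict, a client may want to condense it. The helper below is a hedged sketch: `summarize_pdf_prediction` is a hypothetical name, and it assumes "image_result" and "fusion_result" can be missing or null (for instance for a PDF without embedded images), which this README does not confirm.

```python
def summarize_pdf_prediction(payload):
    """Collect per-model verdicts from a /predict/pdf response dict.

    Assumption: any of the three result sections may be absent or null;
    such sections are skipped rather than treated as errors.
    """
    verdicts = {}
    for key in ("text_result", "image_result", "fusion_result"):
        section = payload.get(key)
        if section:
            verdicts[key] = (section["prediction"], section["confidence"])
    labels = {label for label, _ in verdicts.values()}
    return {
        "final": (payload["final_prediction"], payload["final_confidence"]),
        "verdicts": verdicts,
        # True when every available model produced the same label.
        "models_agree": len(labels) <= 1,
    }


sample = {
    "text_result": {"prediction": "SPAM", "confidence": 94.5},
    "image_result": {"prediction": "SPAM", "confidence": 92.3},
    "fusion_result": {"prediction": "SPAM", "confidence": 96.8},
    "final_prediction": "SPAM",
    "final_confidence": 96.8,
}
summary = summarize_pdf_prediction(sample)
print(summary["final"], summary["models_agree"])
```

A disagreement between models (`models_agree` is False) could be a useful signal for routing an email to manual review.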
Model Architecture
Text Model (DeBERTa-v3-base)
- Pre-trained Microsoft DeBERTa-v3-base
- Custom projection layer to 512-dimensional fusion space
- Multi-layer classifier with LayerNorm and GELU activation
Image Model (ViT-base)
- Pre-trained Google ViT-base-patch16-224
- Custom projection layer to 512-dimensional fusion space
- Multi-layer classifier with LayerNorm and GELU activation
Fusion Model
- Combines text and image encoders
- Cross-modal attention mechanism for feature fusion
- Joint classification head for final prediction
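The components listed above can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the trained architecture: the class names, the hidden size of the head, the number of attention heads, the direction of attention (text attending to image patches), and mean pooling are all guesses. Random tensors stand in for the encoder outputs; 768 is the hidden size of both DeBERTa-v3-base and ViT-base, and ViT-base-patch16-224 produces 197 tokens per image.

```python
import torch
from torch import nn


class ClassifierHead(nn.Module):
    """Small MLP head with LayerNorm and GELU (layer sizes are assumptions)."""

    def __init__(self, dim=512, hidden=256, num_classes=2, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)


class CrossModalFusion(nn.Module):
    """Text tokens attend to image patches; the pooled result is classified."""

    def __init__(self, text_dim=768, image_dim=768, fusion_dim=512, num_heads=8):
        super().__init__()
        # Projection layers into the shared 512-dimensional fusion space.
        self.text_proj = nn.Linear(text_dim, fusion_dim)
        self.image_proj = nn.Linear(image_dim, fusion_dim)
        self.cross_attn = nn.MultiheadAttention(fusion_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(fusion_dim)
        self.head = ClassifierHead(fusion_dim)

    def forward(self, text_states, image_states):
        text = self.text_proj(text_states)       # (batch, text_len, 512)
        image = self.image_proj(image_states)    # (batch, patches, 512)
        attended, _ = self.cross_attn(query=text, key=image, value=image)
        fused = self.norm(text + attended)       # residual connection
        pooled = fused.mean(dim=1)               # mean-pool over text tokens
        return self.head(pooled)                 # (batch, 2) logits


torch.manual_seed(0)
model = CrossModalFusion().eval()
text_states = torch.randn(2, 16, 768)    # stand-in for DeBERTa hidden states
image_states = torch.randn(2, 197, 768)  # stand-in for ViT hidden states
with torch.no_grad():
    logits = model(text_states, image_states)
probs = logits.softmax(dim=-1)
print(logits.shape)
```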
Setup Instructions
Prepare your trained models: Place your .pth model files in the models/ directory:
- models/text_model.pth
- models/image_model.pth
- models/fusion_model.pth
Deploy to Hugging Face Spaces:
- Create a new Space on Hugging Face
- Select Docker as the SDK
- Upload all files from this repository
- The API will automatically start on port 7860
Local Development
# Install dependencies
pip install -r requirements.txt
# Run the API
python app.py
The API will be available at http://localhost:7860
Requirements
- Python 3.10+
- PyTorch 2.1.0+
- Transformers 4.35.2+
- FastAPI 0.104.1+
- See requirements.txt for the complete list
Model Files
⚠️ Important: This repository does not include the trained model weights. You need to:
- Train the models using the training script
- Save the model checkpoints (.pth files)
- Upload them to the models/ directory in your Hugging Face Space
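Before deploying, it may be worth checking that all three checkpoints are in place. The following is a small sketch using only the standard library; `missing_model_files` is a hypothetical helper, and the file names are the ones listed under Setup Instructions.

```python
from pathlib import Path

# Checkpoint names as listed in the Setup Instructions.
EXPECTED_MODEL_FILES = ("text_model.pth", "image_model.pth", "fusion_model.pth")


def missing_model_files(models_dir="models"):
    """Return the expected checkpoint names that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_MODEL_FILES if not (root / name).is_file()]


missing = missing_model_files()
print("missing:", missing or "none")
```

Running this from the repository root prints the checkpoints that still need to be uploaded, or "none" when everything is present.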
Performance
The models are optimized for:
- Accuracy: High precision in spam detection
- Speed: Fast inference on CPU/GPU
- Multimodal: Leverages both text and image features
- Scalability: Handles concurrent requests efficiently
License
MIT License - See LICENSE file for details
Citation
If you use this API in your research, please cite:
@misc{fyp4_spam_detection,
title={FYP4 Spam Detection: Multimodal Email Spam Classification},
author={Your Name},
year={2024},
howpublished={\url{https://huggingface.co/spaces/YOUR-USERNAME/fyp4-spam-detection}}
}
Contact
For questions or issues, please open an issue on GitHub or contact the author.
Built with ❤️ using PyTorch, Transformers, and FastAPI