---
title: FYP4 Spam Detection API
emoji: 📧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit
---
# FYP4 Spam Detection API
An email spam detection API built on transformer models: DeBERTa-v3 for text, ViT for images, and a fusion model that combines both modalities.
## Features
- **Text-based Detection**: Uses Microsoft's DeBERTa-v3-base model for analyzing email text
- **Image-based Detection**: Uses Google's ViT model for analyzing embedded images
- **Multimodal Fusion**: Combines text and image features using cross-modal attention
- **PDF Email Support**: Extracts and analyzes content from PDF email files
- **RESTful API**: Easy-to-use FastAPI endpoints
## API Endpoints
### 1. Health Check
```http
GET /health
```
Returns the status of the API and loaded models.
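Because the models have to load before the API can serve predictions, a client may want to poll the health endpoint first. The sketch below is one way to do that with the standard library. It assumes only that `GET /health` answers with HTTP 200 once the service is up; the response body is not inspected, and the helper names `health_ok` and `wait_until_healthy` are hypothetical.

```python
import time
import urllib.request


def health_ok(base_url, timeout=10.0):
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False


def wait_until_healthy(check, attempts=10, delay=3.0):
    """Call check() up to `attempts` times, sleeping `delay` seconds after each failure."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be `wait_until_healthy(lambda: health_ok("https://YOUR-SPACE-URL"))`, run before sending any prediction requests.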
### 2. Text Prediction
```http
POST /predict/text
Content-Type: application/json

{
  "text": "Your email text here"
}
```
### 3. PDF Prediction
```http
POST /predict/pdf
Content-Type: multipart/form-data

file: <PDF file>
```
## Usage Examples
### Python
```python
import requests

# Text prediction
response = requests.post(
    "https://YOUR-SPACE-URL/predict/text",
    json={"text": "Congratulations! You've won $1,000,000!"},
)
print(response.json())

# PDF prediction
with open("email.pdf", "rb") as f:
    response = requests.post(
        "https://YOUR-SPACE-URL/predict/pdf",
        files={"file": f},
    )
print(response.json())
```
### cURL
```bash
# Text prediction
curl -X POST "https://YOUR-SPACE-URL/predict/text" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your email text"}'

# PDF prediction
curl -X POST "https://YOUR-SPACE-URL/predict/pdf" \
  -F "file=@email.pdf"
```
### JavaScript
```javascript
// Text prediction
const textResponse = await fetch('https://YOUR-SPACE-URL/predict/text', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Your email text' })
});
const textData = await textResponse.json();
console.log(textData);

// PDF prediction
// pdfFile is a File or Blob, e.g. taken from an <input type="file"> element.
const formData = new FormData();
formData.append('file', pdfFile);
const pdfResponse = await fetch('https://YOUR-SPACE-URL/predict/pdf', {
  method: 'POST',
  body: formData
});
const pdfData = await pdfResponse.json();
console.log(pdfData);
```
## Response Format
### Text Prediction Response
```json
{
  "prediction": "SPAM",
  "confidence": 95.67,
  "spam_probability": 95.67,
  "ham_probability": 4.33,
  "model_used": "text"
}
```
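A client will usually turn this response into a decision. The sketch below is one way to do so, assuming (as the example values suggest) that probabilities are reported as percentages from 0 to 100. The helper name `is_spam` and the default threshold of 90 are hypothetical choices, not part of the API.

```python
def is_spam(result, threshold=90.0):
    """Return True when the response is labelled SPAM with at least `threshold` percent probability."""
    return result["prediction"] == "SPAM" and result["spam_probability"] >= threshold


# The example response shown above, as returned by response.json().
example = {
    "prediction": "SPAM",
    "confidence": 95.67,
    "spam_probability": 95.67,
    "ham_probability": 4.33,
    "model_used": "text",
}

print(is_spam(example))                  # True: 95.67 >= 90.0
print(is_spam(example, threshold=99.0))  # False: 95.67 < 99.0
```

Raising the threshold trades missed spam for fewer legitimate emails flagged, so the right value depends on how the result is used.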
### PDF Prediction Response
```json
{
  "email_data": {
    "subject": "Email subject",
    "sender": "sender@example.com",
    "body": "Email body content...",
    "full_text": "Complete email text..."
  },
  "text_result": {
    "prediction": "SPAM",
    "confidence": 94.5,
    "spam_probability": 94.5,
    "ham_probability": 5.5
  },
  "image_result": {
    "prediction": "SPAM",
    "confidence": 92.3,
    "spam_probability": 92.3,
    "ham_probability": 7.7
  },
  "fusion_result": {
    "prediction": "SPAM",
    "confidence": 96.8,
    "spam_probability": 96.8,
    "ham_probability": 3.2
  },
  "final_prediction": "SPAM",
  "final_confidence": 96.8
}
```
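The PDF response carries one verdict per model plus a final one, so it can help to condense it before logging or display. The following is a minimal sketch with a hypothetical helper, `summarize_pdf_result`. It assumes that `image_result` or `fusion_result` may be missing or `null` when a PDF contains no images; that behaviour is a guess, not something documented here.

```python
def summarize_pdf_result(payload):
    """Condense a /predict/pdf response into the final verdict and per-model verdicts."""
    per_model = {}
    for key in ("text_result", "image_result", "fusion_result"):
        result = payload.get(key)
        if result:  # assumption: a branch may be absent or null
            per_model[key] = (result["prediction"], result["confidence"])
    labels = {prediction for prediction, _ in per_model.values()}
    return {
        "final": (payload["final_prediction"], payload["final_confidence"]),
        "per_model": per_model,
        "models_agree": len(labels) <= 1,
    }
```

The `models_agree` flag makes it easy to single out emails where the text and image branches disagree and may deserve a manual look.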
## Model Architecture
### Text Model (DeBERTa-v3-base)
- Pre-trained Microsoft DeBERTa-v3-base
- Custom projection layer to 512-dimensional fusion space
- Multi-layer classifier with LayerNorm and GELU activation
### Image Model (ViT-base)
- Pre-trained Google ViT-base-patch16-224
- Custom projection layer to 512-dimensional fusion space
- Multi-layer classifier with LayerNorm and GELU activation
### Fusion Model
- Combines text and image encoders
- Cross-modal attention mechanism for feature fusion
- Joint classification head for final prediction
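
The cross-modal attention step can be illustrated with plain scaled dot-product attention. This is a toy, pure-Python sketch and not the trained fusion model: it has no learned projections, it uses 2-dimensional features instead of the 512-dimensional fusion space, and it assumes text features act as queries over image features (the direction is not stated in this README). The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys/values.

    queries: list of d-dimensional vectors (e.g. text token features)
    keys, values: lists of d-dimensional vectors (e.g. image patch features)
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Similarity of this query to each key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        attended.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return attended


# Toy features: two text tokens attending over three image patches.
text_features = [[1.0, 0.0], [0.0, 1.0]]
image_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(text_features, image_features, image_features)
```

Each row of `fused` is an image-informed version of one text token. In PyTorch, `torch.nn.MultiheadAttention` provides the same operation with learned projections and multiple heads; whether this project uses that module is not stated here.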
## Setup Instructions
1. **Prepare your trained models**: Place your `.pth` model files in the `models/` directory:
   - `models/text_model.pth`
   - `models/image_model.pth`
   - `models/fusion_model.pth`
2. **Deploy to Hugging Face Spaces**:
   - Create a new Space on Hugging Face
   - Select Docker as the SDK
   - Upload all files from this repository
   - The API will automatically start on port 7860
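
Before uploading, it can be worth confirming that all three checkpoints are in place. The sketch below is a hypothetical pre-deployment check, not part of the repository. It only tests that the files listed in step 1 exist and does not try to load them.

```python
from pathlib import Path

# File names taken from step 1 of the setup instructions.
EXPECTED_MODEL_FILES = ("text_model.pth", "image_model.pth", "fusion_model.pth")


def missing_model_files(models_dir="models"):
    """Return the expected checkpoint names that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_MODEL_FILES if not (root / name).is_file()]


missing = missing_model_files()
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files found.")
```

Run it from the repository root so that the relative `models/` path resolves correctly.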
## Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Run the API
python app.py
```
The API will be available at `http://localhost:7860`.
## Requirements
- Python 3.10+
- PyTorch 2.1.0+
- Transformers 4.35.2+
- FastAPI 0.104.1+
- See `requirements.txt` for complete list
## Model Files
⚠️ **Important**: This repository does not include the trained model weights. You need to:
1. Train the models using the training script
2. Save the model checkpoints (`.pth` files)
3. Upload them to the `models/` directory in your Hugging Face Space
## Performance
The models are designed with these goals in mind:
- **Accuracy**: High precision in spam detection
- **Speed**: Fast inference on CPU or GPU
- **Multimodal**: Uses both text and image features
- **Scalability**: Handles concurrent requests
## License
MIT License - See LICENSE file for details
## Citation
If you use this API in your research, please cite:
```bibtex
@misc{fyp4_spam_detection,
title={FYP4 Spam Detection: Multimodal Email Spam Classification},
author={Your Name},
year={2024},
howpublished={\url{https://huggingface.co/spaces/YOUR-USERNAME/fyp4-spam-detection}}
}
```
## Contact
For questions or issues, please open an issue on GitHub or contact the author.
---
Built with ❤️ using PyTorch, Transformers, and FastAPI