---
title: FYP4 Spam Detection API
emoji: 📧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit
---

# FYP4 Spam Detection API

An email spam detection API built on transformer models: DeBERTa-v3 for text and ViT for images, with a multimodal fusion model that combines the two.

## Features

- **Text-based Detection**: Uses Microsoft's DeBERTa-v3-base model for analyzing email text
- **Image-based Detection**: Uses Google's ViT-base (patch16-224) model for analyzing images embedded in emails
- **Multimodal Fusion**: Combines text and image features using cross-modal attention
- **PDF Email Support**: Extracts and analyzes content from PDF email files
- **RESTful API**: Easy-to-use FastAPI endpoints

## API Endpoints

### 1. Health Check
```http
GET /health
```
Returns the status of the API and loaded models.
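
A deployment that is still loading its models may not answer right away, so a client might poll `/health` before sending predictions. The sketch below is one way to do that with the standard library. Because the shape of the health response is not documented here, it relies only on the HTTP status code; the helper names, retry count, and delay are illustrative assumptions.

```python
import time
import urllib.request


def fetch_health(base_url, timeout=10.0):
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(check, attempts=5, delay=2.0):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A caller could then write `wait_until_healthy(lambda: fetch_health("https://YOUR-SPACE-URL"))` and send requests only when it returns `True`.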

### 2. Text Prediction
```http
POST /predict/text
Content-Type: application/json

{
  "text": "Your email text here"
}
```

### 3. PDF Prediction
```http
POST /predict/pdf
Content-Type: multipart/form-data

file: <PDF file>
```

## Usage Examples

### Python
```python
import requests

# Text prediction
response = requests.post(
    "https://YOUR-SPACE-URL/predict/text",
    json={"text": "Congratulations! You've won $1,000,000!"}
)
print(response.json())

# PDF prediction
with open("email.pdf", "rb") as f:
    response = requests.post(
        "https://YOUR-SPACE-URL/predict/pdf",
        files={"file": f}
    )
print(response.json())
```

### cURL
```bash
# Text prediction
curl -X POST "https://YOUR-SPACE-URL/predict/text" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your email text"}'

# PDF prediction
curl -X POST "https://YOUR-SPACE-URL/predict/pdf" \
  -F "file=@email.pdf"
```

### JavaScript
```javascript
// Text prediction
const textResponse = await fetch('https://YOUR-SPACE-URL/predict/text', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Your email text' })
});
const textData = await textResponse.json();
console.log(textData);

// PDF prediction
// pdfFile is a File or Blob, e.g. taken from an <input type="file"> element
const formData = new FormData();
formData.append('file', pdfFile);
// Do not set Content-Type manually; fetch adds the multipart boundary itself
const pdfResponse = await fetch('https://YOUR-SPACE-URL/predict/pdf', {
  method: 'POST',
  body: formData
});
const pdfData = await pdfResponse.json();
console.log(pdfData);
```

## Response Format

### Text Prediction Response
```json
{
  "prediction": "SPAM",
  "confidence": 95.67,
  "spam_probability": 95.67,
  "ham_probability": 4.33,
  "model_used": "text"
}
```
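
On the client side, it can help to turn this JSON into a typed object and reject malformed payloads early. The following is a minimal sketch, not part of the API itself: `TextPrediction` and `parse_text_prediction` are hypothetical names, and the check that the two probabilities sum to roughly 100 is an assumption drawn from the sample response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPrediction:
    """Typed view of a /predict/text response (hypothetical helper)."""
    prediction: str
    confidence: float
    spam_probability: float
    ham_probability: float
    model_used: str


def parse_text_prediction(payload: dict) -> TextPrediction:
    """Convert the decoded JSON body into a TextPrediction."""
    result = TextPrediction(
        prediction=str(payload["prediction"]),
        confidence=float(payload["confidence"]),
        spam_probability=float(payload["spam_probability"]),
        ham_probability=float(payload["ham_probability"]),
        model_used=str(payload["model_used"]),
    )
    # Assumption: the two probabilities are percentages that sum to about 100,
    # as they do in the sample response.
    total = result.spam_probability + result.ham_probability
    if abs(total - 100.0) > 0.5:
        raise ValueError(f"probabilities sum to {total}, expected about 100")
    return result
```

With the `requests` example from the usage section, this would be called as `parse_text_prediction(response.json())`.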

### PDF Prediction Response
```json
{
  "email_data": {
    "subject": "Email subject",
    "sender": "sender@example.com",
    "body": "Email body content...",
    "full_text": "Complete email text..."
  },
  "text_result": {
    "prediction": "SPAM",
    "confidence": 94.5,
    "spam_probability": 94.5,
    "ham_probability": 5.5
  },
  "image_result": {
    "prediction": "SPAM",
    "confidence": 92.3,
    "spam_probability": 92.3,
    "ham_probability": 7.7
  },
  "fusion_result": {
    "prediction": "SPAM",
    "confidence": 96.8,
    "spam_probability": 96.8,
    "ham_probability": 3.2
  },
  "final_prediction": "SPAM",
  "final_confidence": 96.8
}
```
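
Since the PDF response nests one result per model, a small helper can flatten it for logging or display. This is a sketch under assumptions: `summarize_pdf_response` is a hypothetical name, and treating a missing or null `image_result` as "that model did not run" (for example, a PDF without images) is a guess about the API's behavior rather than something documented here.

```python
def summarize_pdf_response(payload: dict) -> dict:
    """Collect the final verdict and each available per-model verdict."""
    per_model = {}
    for key in ("text_result", "image_result", "fusion_result"):
        result = payload.get(key)
        # Assumption: a section is absent or null when that model was skipped.
        if result:
            per_model[key] = (result["prediction"], result["confidence"])
    return {
        "final": (payload["final_prediction"], payload["final_confidence"]),
        "per_model": per_model,
    }
```

Comparing `per_model` entries against `final` makes it easy to spot emails where the text and image models disagree.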

## Model Architecture

### Text Model (DeBERTa-v3-base)
- Pre-trained Microsoft DeBERTa-v3-base
- Custom projection layer to 512-dimensional fusion space
- Multi-layer classifier with LayerNorm and GELU activation

### Image Model (ViT-base)
- Pre-trained Google ViT-base-patch16-224
- Custom projection layer to 512-dimensional fusion space
- Multi-layer classifier with LayerNorm and GELU activation

### Fusion Model
- Combines text and image encoders
- Cross-modal attention mechanism for feature fusion
- Joint classification head for final prediction
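
To make the fusion design concrete, here is a rough PyTorch sketch of a cross-modal attention head. It is not the repository's implementation. It assumes the encoders have already produced hidden states of width 768 (the hidden size of both DeBERTa-v3-base and ViT-base), and the head count, mean pooling, classifier layout, and `[ham, spam]` logit order are all illustrative choices. Only the 512-dimensional fusion space, LayerNorm, and GELU come from the description above.

```python
import torch
from torch import nn


class CrossModalFusionSketch(nn.Module):
    """Illustrative fusion head; names and layer sizes are assumptions."""

    def __init__(self, text_dim=768, image_dim=768, fusion_dim=512, num_heads=8):
        super().__init__()
        # Project each modality into the shared fusion space.
        self.text_proj = nn.Linear(text_dim, fusion_dim)
        self.image_proj = nn.Linear(image_dim, fusion_dim)
        # Text tokens are the queries; image patches are the keys and values.
        self.cross_attn = nn.MultiheadAttention(fusion_dim, num_heads, batch_first=True)
        # Joint classification head producing two logits (assumed: ham, spam).
        self.classifier = nn.Sequential(
            nn.LayerNorm(fusion_dim * 2),
            nn.Linear(fusion_dim * 2, fusion_dim),
            nn.GELU(),
            nn.Linear(fusion_dim, 2),
        )

    def forward(self, text_feats, image_feats):
        t = self.text_proj(text_feats)     # (batch, text_len, fusion_dim)
        v = self.image_proj(image_feats)   # (batch, num_patches, fusion_dim)
        attended, _ = self.cross_attn(query=t, key=v, value=v)
        # Mean-pool over the sequence axis, then concatenate both views.
        pooled = torch.cat([t.mean(dim=1), attended.mean(dim=1)], dim=-1)
        return self.classifier(pooled)     # (batch, 2) logits
```

Feeding it a text tensor of shape `(batch, text_len, 768)` and an image tensor of shape `(batch, 197, 768)` (196 patches plus the class token for a 224x224 input) yields `(batch, 2)` logits, which a softmax would turn into the spam and ham probabilities.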

## Setup Instructions

1. **Prepare your trained models**: Place your `.pth` model files in the `models/` directory:
   - `models/text_model.pth`
   - `models/image_model.pth`
   - `models/fusion_model.pth`

2. **Deploy to Hugging Face Spaces**:
   - Create a new Space on Hugging Face
   - Select Docker as the SDK
   - Upload all files from this repository
   - The API will automatically start on port 7860

## Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Run the API
python app.py
```

The API will then be available at `http://localhost:7860`.

## Requirements

- Python 3.10+
- PyTorch 2.1.0+
- Transformers 4.35.2+
- FastAPI 0.104.1+
- See `requirements.txt` for the complete list

## Model Files

⚠️ **Important**: This repository does not include the trained model weights. You need to:

1. Train the models using the training script
2. Save the model checkpoints (`.pth` files)
3. Upload them to the `models/` directory in your Hugging Face Space
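
Before starting the API, it may be worth confirming that all three checkpoints are actually in place. The snippet below is a small sketch using the file names listed under Setup Instructions; `missing_weights` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# File names as listed in the Setup Instructions section.
EXPECTED_WEIGHTS = ("text_model.pth", "image_model.pth", "fusion_model.pth")


def missing_weights(models_dir="models"):
    """Return the expected checkpoint names that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]
```

An empty list means every checkpoint was found; otherwise the returned names say which files still need to be uploaded.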

## Performance

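The system is designed around the following goals; this README does not report benchmark figures:
- **Accuracy**: High precision, so that legitimate mail is rarely flagged as spam
- **Speed**: Inference runs on either CPU or GPU
- **Multimodal**: Predictions draw on both text and image features
- **Scalability**: The FastAPI service can handle concurrent requests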

## License

MIT License - See LICENSE file for details

## Citation

If you use this API in your research, please cite:

```bibtex
@misc{fyp4_spam_detection,
  title={FYP4 Spam Detection: Multimodal Email Spam Classification},
  author={Your Name},
  year={2024},
  howpublished={\url{https://huggingface.co/spaces/YOUR-USERNAME/fyp4-spam-detection}}
}
```

## Contact

For questions or issues, please open an issue on GitHub or contact the author.

---

Built with ❤️ using PyTorch, Transformers, and FastAPI