---
title: FYP4 Spam Detection API
emoji: 📧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit
---

# FYP4 Spam Detection API

An email spam detection system built on transformer models (DeBERTa-v3 for text and ViT for images) with multimodal fusion.

## Features

- **Text-based Detection**: Uses Microsoft's DeBERTa-v3-base model to analyze email text
- **Image-based Detection**: Uses Google's ViT model to analyze embedded images
- **Multimodal Fusion**: Combines text and image features using cross-modal attention
- **PDF Email Support**: Extracts and analyzes content from PDF email files
- **RESTful API**: Easy-to-use FastAPI endpoints

## API Endpoints

### 1. Health Check

```http
GET /health
```

Returns the status of the API and the loaded models.

### 2. Text Prediction

```http
POST /predict/text
Content-Type: application/json

{
  "text": "Your email text here"
}
```

### 3. PDF Prediction

```http
POST /predict/pdf
Content-Type: multipart/form-data

file:
```

The PDF is sent in the `file` form field, as in the usage examples below.

## Usage Examples

### Python

```python
import requests

# Text prediction
response = requests.post(
    "https://YOUR-SPACE-URL/predict/text",
    json={"text": "Congratulations! You've won $1,000,000!"}
)
print(response.json())

# PDF prediction
with open("email.pdf", "rb") as f:
    response = requests.post(
        "https://YOUR-SPACE-URL/predict/pdf",
        files={"file": f}
    )
print(response.json())
```

### cURL

```bash
# Text prediction
curl -X POST "https://YOUR-SPACE-URL/predict/text" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your email text"}'

# PDF prediction
curl -X POST "https://YOUR-SPACE-URL/predict/pdf" \
  -F "file=@email.pdf"
```

### JavaScript

```javascript
// Text prediction
const textResponse = await fetch('https://YOUR-SPACE-URL/predict/text', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ text: 'Your email text' })
});
const textData = await textResponse.json();
console.log(textData);

// PDF prediction
// pdfFile is a File or Blob, e.g. taken from an <input type="file"> element
const formData = new FormData();
formData.append('file', pdfFile);

const pdfResponse = await fetch('https://YOUR-SPACE-URL/predict/pdf', {
  method: 'POST',
  body: formData
});
const pdfData = await pdfResponse.json();
console.log(pdfData);
```

## Response Format

### Text Prediction Response

```json
{
  "prediction": "SPAM",
  "confidence": 95.67,
  "spam_probability": 95.67,
  "ham_probability": 4.33,
  "model_used": "text"
}
```

### PDF Prediction Response

```json
{
  "email_data": {
    "subject": "Email subject",
    "sender": "sender@example.com",
    "body": "Email body content...",
    "full_text": "Complete email text..."
  },
  "text_result": {
    "prediction": "SPAM",
    "confidence": 94.5,
    "spam_probability": 94.5,
    "ham_probability": 5.5
  },
  "image_result": {
    "prediction": "SPAM",
    "confidence": 92.3,
    "spam_probability": 92.3,
    "ham_probability": 7.7
  },
  "fusion_result": {
    "prediction": "SPAM",
    "confidence": 96.8,
    "spam_probability": 96.8,
    "ham_probability": 3.2
  },
  "final_prediction": "SPAM",
  "final_confidence": 96.8
}
```

## Model Architecture

### Text Model (DeBERTa-v3-base)

- Pre-trained Microsoft DeBERTa-v3-base
- Custom projection layer to 512-dimensional fusion space
- Multi-layer classifier with LayerNorm and GELU activation

### Image Model (ViT-base)

- Pre-trained Google ViT-base-patch16-224
- Custom projection layer to 512-dimensional fusion space
- Multi-layer classifier with LayerNorm and GELU activation

### Fusion Model

- Combines text and image encoders
- Cross-modal attention mechanism for feature fusion
- Joint classification head for final prediction

## Setup Instructions

1. **Prepare your trained models**: Place your `.pth` model files in the `models/` directory:
   - `models/text_model.pth`
   - `models/image_model.pth`
   - `models/fusion_model.pth`

2. **Deploy to Hugging Face Spaces**:
   - Create a new Space on Hugging Face
   - Select Docker as the SDK
   - Upload all files from this repository
   - The API will automatically start on port 7860

## Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Run the API
python app.py
```

The API will be available at `http://localhost:7860`.

## Requirements

- Python 3.10+
- PyTorch 2.1.0+
- Transformers 4.35.2+
- FastAPI 0.104.1+
- See `requirements.txt` for the complete list

## Model Files

⚠️ **Important**: This repository does not include the trained model weights. You need to:

1. Train the models using the training script
2. Save the model checkpoints (`.pth` files)
3. Upload them to the `models/` directory in your Hugging Face Space

## Performance

The models are optimized for:

- **Accuracy**: High precision in spam detection
- **Speed**: Fast inference on CPU/GPU
- **Multimodal**: Leverages both text and image features
- **Scalability**: Handles concurrent requests efficiently

## License

MIT License. See the LICENSE file for details.

## Citation

If you use this API in your research, please cite:

```bibtex
@misc{fyp4_spam_detection,
  title={FYP4 Spam Detection: Multimodal Email Spam Classification},
  author={Your Name},
  year={2024},
  howpublished={\url{https://huggingface.co/spaces/YOUR-USERNAME/fyp4-spam-detection}}
}
```

## Contact

For questions or issues, please open an issue on GitHub or contact the author.

---

Built with ❤️ using PyTorch, Transformers, and FastAPI
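
## Appendix: Reading the PDF Response

The PDF prediction response documented under Response Format nests one result block per model next to the final verdict. The sketch below shows one way a client might flatten that structure and flag models that disagree with the final label. The helper name `summarize_pdf_response` is hypothetical and not part of this API. The field names come from the example response, and the handling of a missing or null block (for example when a PDF has no embedded images) is an assumption, since the README does not say what the API returns in that case.

```python
from typing import Any, Optional


def summarize_pdf_response(result: dict[str, Any]) -> dict[str, Any]:
    """Collapse a /predict/pdf response into a flat summary.

    Hypothetical client-side helper. Field names follow the
    "PDF Prediction Response" example in this README.
    Assumption: a per-model block may be missing or null, so each
    one is read defensively instead of indexed directly.
    """
    per_model: dict[str, Optional[str]] = {}
    for key in ("text_result", "image_result", "fusion_result"):
        block = result.get(key) or {}
        per_model[key] = block.get("prediction")

    final = result.get("final_prediction")

    # Models that returned a label different from the final verdict.
    dissenting = [
        key for key, label in per_model.items()
        if label is not None and label != final
    ]

    return {
        "final_prediction": final,
        "final_confidence": result.get("final_confidence"),
        "per_model": per_model,
        "dissenting_models": dissenting,
    }


if __name__ == "__main__":
    # Values copied from the example response in this README.
    sample = {
        "text_result": {"prediction": "SPAM", "confidence": 94.5},
        "image_result": {"prediction": "SPAM", "confidence": 92.3},
        "fusion_result": {"prediction": "SPAM", "confidence": 96.8},
        "final_prediction": "SPAM",
        "final_confidence": 96.8,
    }
    print(summarize_pdf_response(sample))
```

In practice the input would be the dictionary returned by `response.json()` in the Python usage example. Reading every key with `.get` keeps the helper from raising if the deployed API omits a block.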