	---
	title: FYP4 Spam Detection API
	emoji: 📧
	colorFrom: blue
	colorTo: red
	sdk: docker
	pinned: false
	license: mit
	---

	# FYP4 Spam Detection API

	An email spam detection API built on two transformer models, DeBERTa-v3 for text and ViT for images, with a multimodal fusion model that combines both.

	## Features

	- Text-based Detection: Uses Microsoft's DeBERTa-v3-base model for analyzing email text
	- Image-based Detection: Uses Google's ViT model for analyzing embedded images
	- Multimodal Fusion: Combines text and image features using cross-modal attention
	- PDF Email Support: Extracts and analyzes content from PDF email files
	- RESTful API: Easy-to-use FastAPI endpoints

	## API Endpoints

	### 1. Health Check
	```bash
	GET /health
	```
	Returns the status of the API and loaded models.
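
The response schema of `/health` is not documented here, so the following is only a minimal sketch that fetches the endpoint and decodes whatever JSON comes back. The names `health_url` and `fetch_health`, the 10 second timeout, and the injectable `opener` parameter are assumptions made for illustration, not part of this API.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Build the /health URL from a base URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/health"


def fetch_health(base_url: str, timeout: float = 10.0, opener=urllib.request.urlopen):
    """Return the decoded JSON body of GET /health, or None if it cannot be read.

    `opener` is injectable (hypothetical convenience) so the function can be
    exercised without a running server.
    """
    try:
        with opener(health_url(base_url), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts;
        # ValueError covers a body that is not valid JSON.
        return None
```

For a local run, `fetch_health("http://localhost:7860")` returns the parsed body, or `None` while the server is still starting.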

	### 2. Text Prediction
	```bash
	POST /predict/text
	Content-Type: application/json

	{
	  "text": "Your email text here"
	}
	```

	### 3. PDF Prediction
	```bash
	POST /predict/pdf
	Content-Type: multipart/form-data

	file: <PDF file>
	```

	## Usage Examples

	### Python
	```python
	import requests

	# Text prediction
	response = requests.post(
	    "https://YOUR-SPACE-URL/predict/text",
	    json={"text": "Congratulations! You've won $1,000,000!"},
	)
	print(response.json())

	# PDF prediction (the request must be sent while the file is still open)
	with open("email.pdf", "rb") as f:
	    response = requests.post(
	        "https://YOUR-SPACE-URL/predict/pdf",
	        files={"file": f},
	    )
	print(response.json())
	```

	### cURL
	```bash
	# Text prediction
	curl -X POST "https://YOUR-SPACE-URL/predict/text" \
	  -H "Content-Type: application/json" \
	  -d '{"text": "Your email text"}'

	# PDF prediction
	curl -X POST "https://YOUR-SPACE-URL/predict/pdf" \
	  -F "file=@email.pdf"
	```

	### JavaScript
	```javascript
	// Text prediction
	const textResponse = await fetch('https://YOUR-SPACE-URL/predict/text', {
	  method: 'POST',
	  headers: { 'Content-Type': 'application/json' },
	  body: JSON.stringify({ text: 'Your email text' })
	});
	const textData = await textResponse.json();
	console.log(textData);

	// PDF prediction (pdfFile is a File or Blob, e.g. from an <input type="file">)
	const formData = new FormData();
	formData.append('file', pdfFile);
	const pdfResponse = await fetch('https://YOUR-SPACE-URL/predict/pdf', {
	  method: 'POST',
	  body: formData
	});
	const pdfData = await pdfResponse.json();
	console.log(pdfData);
	```

	## Response Format

	### Text Prediction Response
	```json
	{
	  "prediction": "SPAM",
	  "confidence": 95.67,
	  "spam_probability": 95.67,
	  "ham_probability": 4.33,
	  "model_used": "text"
	}
	```
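
On the client side, a response of this shape can be turned into a typed object and sanity-checked. The sketch below assumes the fields shown above are always present and that the two probabilities are percentages summing to roughly 100, as in the example. `TextPrediction`, `parse_text_prediction`, `is_spam`, and the 50 percent threshold are hypothetical names and values, not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPrediction:
    prediction: str          # "SPAM" in the example; the non-spam label is not shown in this README
    confidence: float        # percent, 0-100
    spam_probability: float  # percent
    ham_probability: float   # percent
    model_used: str


def parse_text_prediction(payload: dict) -> TextPrediction:
    """Validate a /predict/text response body and convert it to a dataclass."""
    result = TextPrediction(
        prediction=str(payload["prediction"]),
        confidence=float(payload["confidence"]),
        spam_probability=float(payload["spam_probability"]),
        ham_probability=float(payload["ham_probability"]),
        model_used=str(payload["model_used"]),
    )
    total = result.spam_probability + result.ham_probability
    if abs(total - 100.0) > 0.1:  # tolerance for values rounded to two decimals
        raise ValueError(f"probabilities sum to {total}, expected about 100")
    return result


def is_spam(result: TextPrediction, threshold: float = 50.0) -> bool:
    """Apply a caller-chosen threshold (in percent) to the spam probability."""
    return result.spam_probability >= threshold
```

Raising the threshold trades missed spam for fewer false positives; the right value depends on how the caller handles flagged mail.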

	### PDF Prediction Response
	```json
	{
	  "email_data": {
	    "subject": "Email subject",
	    "sender": "sender@example.com",
	    "body": "Email body content...",
	    "full_text": "Complete email text..."
	  },
	  "text_result": {
	    "prediction": "SPAM",
	    "confidence": 94.5,
	    "spam_probability": 94.5,
	    "ham_probability": 5.5
	  },
	  "image_result": {
	    "prediction": "SPAM",
	    "confidence": 92.3,
	    "spam_probability": 92.3,
	    "ham_probability": 7.7
	  },
	  "fusion_result": {
	    "prediction": "SPAM",
	    "confidence": 96.8,
	    "spam_probability": 96.8,
	    "ham_probability": 3.2
	  },
	  "final_prediction": "SPAM",
	  "final_confidence": 96.8
	}
	```
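
Because the PDF response carries one result per modality plus a final verdict, a client may want a compact view of it. The following sketch assumes the keys shown above; it also assumes, without confirmation from this README, that `image_result` or `fusion_result` may be missing or `null` when a PDF contains no images. `summarize_pdf_response` and `modalities_disagree` are hypothetical helper names.

```python
MODALITY_KEYS = ("text_result", "image_result", "fusion_result")


def summarize_pdf_response(payload: dict) -> dict:
    """Map each modality present in the response to (prediction, confidence)."""
    summary = {}
    for key in MODALITY_KEYS:
        result = payload.get(key)
        if result:  # assumption: a modality may be absent or null
            summary[key] = (result["prediction"], result["confidence"])
    summary["final"] = (payload["final_prediction"], payload["final_confidence"])
    return summary


def modalities_disagree(payload: dict) -> bool:
    """Return True when the per-modality predictions are not all the same label."""
    labels = {payload[key]["prediction"] for key in MODALITY_KEYS if payload.get(key)}
    return len(labels) > 1
```

Flagging disagreement between modalities is one possible way to route borderline emails to manual review.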

	## Model Architecture

	### Text Model (DeBERTa-v3-base)
	- Pre-trained Microsoft DeBERTa-v3-base
	- Custom projection layer to 512-dimensional fusion space
	- Multi-layer classifier with LayerNorm and GELU activation

	### Image Model (ViT-base)
	- Pre-trained Google ViT-base-patch16-224
	- Custom projection layer to 512-dimensional fusion space
	- Multi-layer classifier with LayerNorm and GELU activation

	### Fusion Model
	- Combines text and image encoders
	- Cross-modal attention mechanism for feature fusion
	- Joint classification head for final prediction
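
This README does not spell out the fusion layer, so the following is only an illustration of the general idea behind cross-modal attention: a text feature vector acts as the query and attends over image patch vectors using standard scaled dot-product attention. It is written in dependency-free Python for readability; the real model presumably uses PyTorch modules with learned projections in the 512-dimensional fusion space. The function names and the toy 2-dimensional vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query:  one text feature vector (length d)
    keys:   image patch feature vectors (each of length d)
    values: vectors to mix, one per key
    Returns (attention weights, weighted sum of the values).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, fused


# Toy usage: the first image patch is aligned with the text query,
# so it receives the larger attention weight.
weights, fused = cross_attend(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[2.0, 0.0], [0.0, 2.0]],
)
```

In a full model the fused vector would then pass through the joint classification head; that step is omitted here.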

	## Setup Instructions

	1. Prepare your trained models: Place your `.pth` model files in the `models/` directory:
	   - `models/text_model.pth`
	   - `models/image_model.pth`
	   - `models/fusion_model.pth`

	2. Deploy to Hugging Face Spaces:
	   - Create a new Space on Hugging Face
	   - Select Docker as the SDK
	   - Upload all files from this repository
	   - The API will automatically start on port 7860
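
Before uploading, it can help to confirm that the three checkpoint files from step 1 are in place. This is a small optional sketch and not part of the repository; `missing_model_files` is a hypothetical helper, and the file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the setup steps above.
EXPECTED_MODEL_FILES = ("text_model.pth", "image_model.pth", "fusion_model.pth")


def missing_model_files(models_dir="models"):
    """Return the expected checkpoint names that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_MODEL_FILES if not (root / name).is_file()]
```

Calling `missing_model_files()` from the repository root returns an empty list when all three files exist.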

	## Local Development

	```bash
	# Install dependencies
	pip install -r requirements.txt

	# Run the API
	python app.py
	```

	The API will be available at `http://localhost:7860`

	## Requirements

	- Python 3.10+
	- PyTorch 2.1.0+
	- Transformers 4.35.2+
	- FastAPI 0.104.1+
	- See `requirements.txt` for complete list

	## Model Files

	⚠️ Important: This repository does not include the trained model weights. You need to:

	1. Train the models using the training script
	2. Save the model checkpoints (`.pth` files)
	3. Upload them to the `models/` directory in your Hugging Face Space

	## Performance

	The system is designed with the following goals:
	- Accuracy: High precision in spam detection
	- Speed: Inference runs on either CPU or GPU
	- Multimodal: Uses both text and image features
	- Scalability: Serves concurrent requests through FastAPI

	## License

	MIT License - See LICENSE file for details

	## Citation

	If you use this API in your research, please cite:

	```bibtex
	@misc{fyp4_spam_detection,
	title={FYP4 Spam Detection: Multimodal Email Spam Classification},
	author={Your Name},
	year={2024},
	howpublished={\url{https://huggingface.co/spaces/YOUR-USERNAME/fyp4-spam-detection}}
	}
	```

	## Contact

	For questions or issues, please open an issue on GitHub or contact the author.

	---

	Built with ❤️ using PyTorch, Transformers, and FastAPI