|
|
--- |
|
|
title: Email Classification API |
|
|
emoji: π§ |
|
|
colorFrom: blue |
|
|
colorTo: green |
|
|
sdk: docker |
|
|
app_file: main.py |
|
|
pinned: false |
|
|
--- |
|
|
|
|
|
# Email Classification for Support Team |
|
|
|
|
|
## Project Overview |
|
|
|
|
|
This project implements an email classification system that categorizes support emails into predefined categories while ensuring that personal information (PII) is masked before processing. The system uses a combination of Named Entity Recognition (NER) techniques for PII masking and a pre-trained XLM-RoBERTa model for email classification. |
|
|
|
|
|
## Key Features |
|
|
|
|
|
1. **Email Classification**: Classifies support emails into four categories: |
|
|
- Incident |
|
|
- Request |
|
|
- Change |
|
|
- Problem |
|
|
|
|
|
2. **Personal Information Masking**: Detects and masks the following types of PII: |
|
|
- Full Name ("full_name") |
|
|
- Email Address ("email") |
|
|
- Phone number ("phone_number") |
|
|
- Date of birth ("dob") |
|
|
- Aadhar card number ("aadhar_num") |
|
|
- Credit/Debit Card Number ("credit_debit_no") |
|
|
- CVV number ("cvv_no") |
|
|
- Card expiry number ("expiry_no") |
|
|
|
|
|
3. **API Interface**: Exposes the solution as a RESTful API endpoint. |
|
|
|
|
|
## Project Structure |
|
|
|
|
|
``` |
|
|
. |
|
|
βββ classification_model/ # Local model files (not used in deployment) |
|
|
βββ docker-compose.yml # Docker Compose configuration |
|
|
βββ Dockerfile # Docker configuration |
|
|
βββ main.py # Main FastAPI application |
|
|
βββ models.py # Email classifier model implementation |
|
|
βββ README.md # Project documentation |
|
|
βββ requirements.txt # Python dependencies |
|
|
βββ utils.py # PII masker implementation |
|
|
``` |
|
|
|
|
|
## Installation |
|
|
|
|
|
### Prerequisites |
|
|
|
|
|
- Python 3.8+ |
|
|
- [Docker](https://www.docker.com/) (optional) |
|
|
- Hugging Face account for model hosting |
|
|
|
|
|
### Setup |
|
|
|
|
|
1. Clone the repository: |
|
|
``` |
|
|
git clone <repository-url> |
|
|
cd email_classifier_project |
|
|
``` |
|
|
|
|
|
2. Install dependencies: |
|
|
``` |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
3. Run the application: |
|
|
``` |
|
|
python main.py |
|
|
``` |
|
|
|
|
|
### Using Docker |
|
|
|
|
|
1. Build and run with Docker Compose: |
|
|
``` |
|
|
docker-compose up |
|
|
``` |
|
|
|
|
|
## Uploading the Model to Hugging Face Hub |
|
|
|
|
|
Before deploying the application to Hugging Face Spaces, you need to upload the model to the Hugging Face Model Hub: |
|
|
|
|
|
1. Install the Hugging Face CLI if you haven't already: |
|
|
``` |
|
|
pip install huggingface_hub |
|
|
``` |
|
|
|
|
|
2. Log in to Hugging Face: |
|
|
``` |
|
|
huggingface-cli login |
|
|
``` |
|
|
|
|
|
3. Create a new model repository on Hugging Face: |
|
|
``` |
|
|
huggingface-cli repo create email-classifier-model |
|
|
``` |
|
|
|
|
|
4. Upload the model using Python: |
|
|
```python |
|
|
from transformers import XLMRobertaForSequenceClassification, XLMRobertaTokenizer |
|
|
|
|
|
# Load the local model |
|
|
model = XLMRobertaForSequenceClassification.from_pretrained("classification_model") |
|
|
tokenizer = XLMRobertaTokenizer.from_pretrained("classification_model") |
|
|
|
|
|
# Push to Hugging Face Hub |
|
|
model.push_to_hub("YourUsername/email-classifier-model") |
|
|
tokenizer.push_to_hub("YourUsername/email-classifier-model") |
|
|
``` |
|
|
|
|
|
5. Update the `MODEL_PATH` environment variable in the Dockerfile with your Hugging Face model path: |
|
|
``` |
|
|
ENV MODEL_PATH="YourUsername/email-classifier-model" |
|
|
``` |
|
|
|
|
|
## API Usage |
|
|
|
|
|
The API exposes a single endpoint for email classification: |
|
|
|
|
|
- **Endpoint**: `/classify` |
|
|
- **Method**: POST |
|
|
- **Input Format**: |
|
|
```json |
|
|
{ |
|
|
"input_email_body": "string containing the email" |
|
|
} |
|
|
``` |
|
|
- **Output Format**: |
|
|
```json |
|
|
{ |
|
|
"input_email_body": "string containing the email", |
|
|
"list_of_masked_entities": [ |
|
|
{ |
|
|
"position": [start_index, end_index], |
|
|
"classification": "entity_type", |
|
|
"entity": "original_entity_value" |
|
|
} |
|
|
], |
|
|
"masked_email": "string containing the masked email", |
|
|
"category_of_the_email": "string containing the class" |
|
|
} |
|
|
``` |
|
|
|
|
|
## Example |
|
|
|
|
|
```python |
|
|
import requests |
|
|
|
|
|
url = "https://sparkonix-email-classification-model.hf.space/classify" |
|
|
data = { |
|
|
"input_email_body": "Hello, my name is John Doe, and I'm having issues with my account." |
|
|
} |
|
|
|
|
|
response = requests.post(url, json=data) |
|
|
print(response.json()) |
|
|
``` |
|
|
|
|
|
## Deployment to Hugging Face Spaces |
|
|
|
|
|
1. Create a new Space on Hugging Face: |
|
|
- Go to https://huggingface.co/spaces |
|
|
- Click "Create new Space" |
|
|
- Choose a name for your Space |
|
|
- Select "Docker" as the Space SDK |
|
|
|
|
|
2. Connect your GitHub repository to the Space: |
|
|
- In the Space settings, go to "Repository" |
|
|
- Enter your GitHub repository URL |
|
|
- Authenticate with GitHub if prompted |
|
|
|
|
|
3. Ensure your Hugging Face Space has access to the model: |
|
|
- Go to your model on Hugging Face Hub |
|
|
- Go to "Settings" > "Collaborators" |
|
|
- Add your Space as a collaborator with "Read" access |
|
|
|
|
|
4. Your API will be available at: |
|
|
``` |
|
|
https://username-space-name.hf.space/classify |
|
|
``` |
|
|
|
|
|
## Technologies Used |
|
|
|
|
|
- **FastAPI**: Web framework for building the API |
|
|
- **SpaCy**: NLP library for PII detection and masking |
|
|
- **Transformers**: Hugging Face library for the email classification model |
|
|
- **PyTorch**: Deep learning framework |
|
|
- **Docker**: Containerization for deployment |