---
title: Email Classification API
emoji: 📧
colorFrom: blue
colorTo: green
sdk: docker
app_file: main.py
pinned: false
---
# Email Classification for Support Team
## Project Overview
This project implements an email classification system that sorts support emails into predefined categories and masks personally identifiable information (PII) before the text is classified. Masking uses a combination of Named Entity Recognition (NER) techniques, and classification uses a pre-trained XLM-RoBERTa model.
## Key Features
1. **Email Classification**: Classifies support emails into four categories:
- Incident
- Request
- Change
- Problem
2. **Personal Information Masking**: Detects and masks the following types of PII:
- Full name ("full_name")
- Email address ("email")
- Phone number ("phone_number")
- Date of birth ("dob")
- Aadhaar card number ("aadhar_num")
- Credit/debit card number ("credit_debit_no")
- CVV number ("cvv_no")
- Card expiry date ("expiry_no")
3. **API Interface**: Exposes the solution as a RESTful API endpoint.
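
The masking step can be illustrated with a minimal regex-only sketch. It covers just two of the entity types listed above (`email` and `phone_number`). The patterns, the `[email]`-style placeholder format, and the function name `mask_pii` are assumptions made for illustration; they are not taken from the project's `utils.py`, which also relies on NER.

```python
import re

# Hypothetical patterns for two entity types; real PII detection needs
# broader patterns, plus NER for entities such as names.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone_number": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def mask_pii(text):
    """Return (masked_text, entities) for the patterns above.

    Positions are character offsets into the original, unmasked text.
    """
    matches = []
    for label, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), label, m.group()))
    matches.sort()

    entities = []
    pieces = []
    cursor = 0
    for start, end, label, value in matches:
        if start < cursor:
            continue  # skip a match that overlaps one already masked
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        entities.append(
            {"position": [start, end], "classification": label, "entity": value}
        )
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), entities
```

Each returned entity mirrors the shape of an item in `list_of_masked_entities`, so the original value can be recovered from its recorded position.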
## Project Structure
```
.
├── classification_model/   # Local model files (not used in deployment)
├── docker-compose.yml      # Docker Compose configuration
├── Dockerfile              # Docker configuration
├── main.py                 # Main FastAPI application
├── models.py               # Email classifier model implementation
├── README.md               # Project documentation
├── requirements.txt        # Python dependencies
└── utils.py                # PII masker implementation
```
## Installation
### Prerequisites
- Python 3.8+
- [Docker](https://www.docker.com/) (optional)
- Hugging Face account for model hosting
### Setup
1. Clone the repository:
```
git clone <repository-url>
cd email_classifier_project
```
2. Install dependencies:
```
pip install -r requirements.txt
```
3. Run the application:
```
python main.py
```
### Using Docker
1. Build and run with Docker Compose:
```
docker-compose up
```
## Uploading the Model to Hugging Face Hub
Before deploying the application to Hugging Face Spaces, you need to upload the model to the Hugging Face Model Hub:
1. Install the Hugging Face CLI if you haven't already:
```
pip install huggingface_hub
```
2. Log in to Hugging Face:
```
huggingface-cli login
```
3. Create a new model repository on Hugging Face:
```
huggingface-cli repo create email-classifier-model
```
4. Upload the model using Python:
```python
from transformers import XLMRobertaForSequenceClassification, XLMRobertaTokenizer
# Load the local model
model = XLMRobertaForSequenceClassification.from_pretrained("classification_model")
tokenizer = XLMRobertaTokenizer.from_pretrained("classification_model")
# Push to Hugging Face Hub
model.push_to_hub("YourUsername/email-classifier-model")
tokenizer.push_to_hub("YourUsername/email-classifier-model")
```
5. Update the `MODEL_PATH` environment variable in the Dockerfile with your Hugging Face model path:
```
ENV MODEL_PATH="YourUsername/email-classifier-model"
```
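
On the application side, the variable might be read as in the following minimal sketch. It assumes the code falls back to the local `classification_model/` directory when `MODEL_PATH` is unset; the helper name `resolve_model_path` is hypothetical and may not match what `models.py` actually does.

```python
import os

# Assumed fallback: the local model directory from the project structure.
DEFAULT_MODEL_PATH = "classification_model"


def resolve_model_path(env=None):
    """Return the model location from MODEL_PATH, or the local default."""
    env = os.environ if env is None else env
    value = env.get("MODEL_PATH", "").strip()
    return value or DEFAULT_MODEL_PATH
```

The returned string can be passed to `from_pretrained`, which accepts either a local directory or a Hub repository ID.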
## API Usage
The API exposes a single endpoint for email classification:
- **Endpoint**: `/classify`
- **Method**: POST
- **Input Format**:
```json
{
"input_email_body": "string containing the email"
}
```
- **Output Format**:
```json
{
"input_email_body": "string containing the email",
"list_of_masked_entities": [
{
"position": [start_index, end_index],
"classification": "entity_type",
"entity": "original_entity_value"
}
],
"masked_email": "string containing the masked email",
"category_of_the_email": "string containing the class"
}
```
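
A client may want to check the response shape before using it. The sketch below uses only the standard library. The function name `check_response` is hypothetical, and two points are assumptions: that `position` holds character offsets into `input_email_body`, and that category names match the four listed in Key Features regardless of letter case.

```python
REQUIRED_KEYS = {
    "input_email_body",
    "list_of_masked_entities",
    "masked_email",
    "category_of_the_email",
}
CATEGORIES = {"incident", "request", "change", "problem"}


def check_response(payload):
    """Return a list of problems found in a /classify response dict."""
    missing = REQUIRED_KEYS - set(payload)
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems = []
    if payload["category_of_the_email"].lower() not in CATEGORIES:
        problems.append("unexpected category")

    body = payload["input_email_body"]
    for item in payload["list_of_masked_entities"]:
        start, end = item["position"]
        # Assumption: offsets index into the original email body.
        if body[start:end] != item["entity"]:
            problems.append(f"position mismatch for {item['entity']!r}")
    return problems
```

An empty list means the response passed every check.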
## Example
```python
import requests
url = "https://sparkonix-email-classification-model.hf.space/classify"
data = {
"input_email_body": "Hello, my name is John Doe, and I'm having issues with my account."
}
response = requests.post(url, json=data)
print(response.json())
```
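
Where `requests` is not available, the same call can be made with the standard library. This is a sketch rather than an official client: the function names `build_request` and `classify_email` are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://sparkonix-email-classification-model.hf.space/classify"


def build_request(email_body, url=API_URL):
    """Build the POST request without sending it."""
    payload = json.dumps({"input_email_body": email_body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_email(email_body, url=API_URL, timeout=30):
    """Send the email to the API and return the decoded JSON response."""
    request = build_request(email_body, url)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect without a network connection.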
## Deployment to Hugging Face Spaces
1. Create a new Space on Hugging Face:
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Choose a name for your Space
- Select "Docker" as the Space SDK
2. Connect your GitHub repository to the Space:
- In the Space settings, go to "Repository"
- Enter your GitHub repository URL
- Authenticate with GitHub if prompted
3. Ensure your Hugging Face Space has access to the model:
- Go to your model on Hugging Face Hub
- Go to "Settings" > "Collaborators"
- Add your Space as a collaborator with "Read" access
4. Your API will be available at:
```
https://username-space-name.hf.space/classify
```
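
The URL pattern can be captured in a small helper. This sketch assumes the subdomain is the lowercased owner name and Space name joined by a hyphen, which is consistent with the URL used in the Example section; names containing other special characters are not handled, and the function name `space_endpoint` is hypothetical.

```python
def space_endpoint(username, space_name, route="/classify"):
    """Build the public URL of a route on a Space (assumed naming scheme)."""
    subdomain = f"{username}-{space_name}".lower()
    return f"https://{subdomain}.hf.space{route}"
```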
## Technologies Used
- **FastAPI**: Web framework for building the API
- **spaCy**: NLP library for PII detection and masking
- **Transformers**: Hugging Face library for the email classification model
- **PyTorch**: Deep learning framework
- **Docker**: Containerization for deployment