---
title: Email Classification API
emoji: πŸ“§
colorFrom: blue
colorTo: green
sdk: docker
app_file: main.py
pinned: false
---

# Email Classification for Support Team

## Project Overview

This project implements an email classification system that categorizes support emails into predefined categories while ensuring that personally identifiable information (PII) is masked before processing. The system combines Named Entity Recognition (NER) techniques for PII masking with a pre-trained XLM-RoBERTa model for email classification.

## Key Features

1. **Email Classification**: Classifies support emails into four categories:
   - Incident
   - Request
   - Change
   - Problem

2. **Personal Information Masking**: Detects and masks the following types of PII:
   - Full Name ("full_name")
   - Email Address ("email")
   - Phone number ("phone_number")
   - Date of birth ("dob")
   - Aadhaar card number ("aadhar_num")
   - Credit/Debit Card Number ("credit_debit_no")
   - CVV number ("cvv_no")
   - Card expiry number ("expiry_no")

3. **API Interface**: Exposes the solution as a RESTful API endpoint.
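The masking stage above can be sketched in a few lines. Note this is a simplified stand-in: the deployed system uses SpaCy-based NER in `utils.py`, whereas this sketch uses a single regex for one entity type and a `[email]` replacement token (both assumptions) purely to illustrate the entity format the API returns:

```python
import re

# A single regex stands in for the SpaCy-based detectors used in utils.py.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_pii(text):
    """Return (masked_text, entities); entities use the API's output shape,
    with positions referring to the original, unmasked text."""
    entities = []

    def replace(match):
        entities.append({
            "position": [match.start(), match.end()],
            "classification": "email",
            "entity": match.group(0),
        })
        return "[email]"

    return EMAIL_RE.sub(replace, text), entities

masked, found = mask_pii("Contact me at jane@example.com please.")
print(masked)              # Contact me at [email] please.
print(found[0]["entity"])  # jane@example.com
```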

## Project Structure

```
.
β”œβ”€β”€ classification_model/    # Local model files (not used in deployment)
β”œβ”€β”€ docker-compose.yml       # Docker Compose configuration
β”œβ”€β”€ Dockerfile               # Docker configuration
β”œβ”€β”€ main.py                  # Main FastAPI application
β”œβ”€β”€ models.py                # Email classifier model implementation
β”œβ”€β”€ README.md                # Project documentation
β”œβ”€β”€ requirements.txt         # Python dependencies
└── utils.py                 # PII masker implementation
```

## Installation

### Prerequisites

- Python 3.8+
- [Docker](https://www.docker.com/) (optional)
- Hugging Face account for model hosting

### Setup

1. Clone the repository:
   ```
   git clone <repository-url>
   cd email_classifier_project
   ```

2. Install dependencies:
   ```
   pip install -r requirements.txt
   ```

3. Run the application:
   ```
   python main.py
   ```

### Using Docker

1. Build and run with Docker Compose:
   ```
   docker-compose up
   ```

## Uploading the Model to Hugging Face Hub

Before deploying the application to Hugging Face Spaces, you need to upload the model to the Hugging Face Model Hub:

1. Install the Hugging Face CLI if you haven't already:
   ```
   pip install huggingface_hub
   ```

2. Log in to Hugging Face:
   ```
   huggingface-cli login
   ```

3. Create a new model repository on Hugging Face:
   ```
   huggingface-cli repo create email-classifier-model
   ```

4. Upload the model using Python:
   ```python
   from transformers import XLMRobertaForSequenceClassification, XLMRobertaTokenizer
   
   # Load the local model
   model = XLMRobertaForSequenceClassification.from_pretrained("classification_model")
   tokenizer = XLMRobertaTokenizer.from_pretrained("classification_model")
   
   # Push to Hugging Face Hub
   model.push_to_hub("YourUsername/email-classifier-model")
   tokenizer.push_to_hub("YourUsername/email-classifier-model")
   ```

5. Update the `MODEL_PATH` environment variable in the Dockerfile with your Hugging Face model path:
   ```
   ENV MODEL_PATH="YourUsername/email-classifier-model"
   ```

## API Usage

The API exposes a single endpoint for email classification:

- **Endpoint**: `/classify`
- **Method**: POST
- **Input Format**:
  ```json
  {
    "input_email_body": "string containing the email"
  }
  ```
- **Output Format**:
  ```json
  {
    "input_email_body": "string containing the email",
    "list_of_masked_entities": [
      {
        "position": [start_index, end_index],
        "classification": "entity_type",
        "entity": "original_entity_value"
      }
    ],
    "masked_email": "string containing the masked email",
    "category_of_the_email": "string containing the class"
  }
  ```
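For reference, the response schema above can be expressed as plain Python structures. The service itself likely uses Pydantic models with FastAPI; these dataclasses are only an illustration of the shape:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MaskedEntity:
    position: List[int]   # [start_index, end_index] in the original email
    classification: str   # e.g. "email", "phone_number", "full_name"
    entity: str           # original (unmasked) value

@dataclass
class ClassificationResponse:
    input_email_body: str
    list_of_masked_entities: List[MaskedEntity]
    masked_email: str
    category_of_the_email: str  # "Incident", "Request", "Change", or "Problem"

# Build a sample response by hand to show the field layout.
resp = ClassificationResponse(
    input_email_body="Hi, I'm John Doe.",
    list_of_masked_entities=[MaskedEntity([8, 16], "full_name", "John Doe")],
    masked_email="Hi, I'm [full_name].",
    category_of_the_email="Request",
)
print(resp.category_of_the_email)  # Request
```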

## Example

```python
import requests

url = "https://sparkonix-email-classification-model.hf.space/classify"
data = {
    "input_email_body": "Hello, my name is John Doe, and I'm having issues with my account."
}

response = requests.post(url, json=data)
print(response.json())
```

## Deployment to Hugging Face Spaces

1. Create a new Space on Hugging Face:
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose a name for your Space
   - Select "Docker" as the Space SDK

2. Connect your GitHub repository to the Space:
   - In the Space settings, go to "Repository"
   - Enter your GitHub repository URL
   - Authenticate with GitHub if prompted

3. Ensure your Hugging Face Space has access to the model:
   - Go to your model on Hugging Face Hub
   - Go to "Settings" > "Collaborators"
   - Add your Space as a collaborator with "Read" access

4. Your API will be available at:
   ```
   https://username-space-name.hf.space/classify
   ```

## Technologies Used

- **FastAPI**: Web framework for building the API
- **SpaCy**: NLP library for PII detection and masking
- **Transformers**: Hugging Face library for the email classification model
- **PyTorch**: Deep learning framework
- **Docker**: Containerization for deployment