| | ---
|
| | license: apache-2.0
|
| | ---
|
| | # Email Classifier
|
| |
|
| | This project implements an email classifier that assigns each email to one of five categories. It embeds the text with SBERT (`all-MiniLM-L6-v2`) and passes the embedding to a small sequential neural network for the final classification.
|
| | ## Model Description
|
| | - **Architecture:** `SBERT (384‑d) → Dense(256, ReLU) → Dropout(0.4) → Dense(128, ReLU) → Dropout(0.4) → Softmax(5)`
|
| | - **Frameworks:** TensorFlow 2.17, sentence-transformers
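The architecture above could be expressed in Keras roughly as follows. This is a minimal sketch, not the project's actual training code: the names `build_classifier` and `count_dense_params` are hypothetical, and details this README does not state (optimizer, loss, training schedule) are left out. TensorFlow is imported lazily so the parameter arithmetic runs without it.

```python
def count_dense_params(layer_sizes):
    """Count weights + biases for a stack of fully connected layers.

    layer_sizes lists the width of every layer, input first,
    e.g. [384, 256, 128, 5]. Dropout layers add no parameters.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def build_classifier(input_dim=384, num_classes=5, dropout=0.4):
    """Build the dense head that sits on top of the SBERT embedding (hypothetical helper)."""
    import tensorflow as tf  # lazy import: only needed when the model is built

    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


print(count_dense_params([384, 256, 128, 5]))  # 98560 + 32896 + 645 = 132101
```

The head is small (about 132k trainable parameters) because the SBERT encoder does the heavy lifting and is used only to produce fixed embeddings.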
|
| |
|
| | ## Training Data & Preprocessing
|
| | - **Emails:** 4954 college emails, manually labeled into `[Academics, Clubs, Internships, Others, Talks]`
|
| | - **Split:** 80% train / 20% test
|
| | - **Embedding & Labeling:**
|
| | 1. Each email was embedded with `all‑MiniLM‑L6‑v2` (SBERT).
|
| | 2. We created a small “prototype” set of example sentences for each category.
|
| | 3. For every email, we computed cosine similarities between its SBERT embedding and each prototype embedding.
|
| | 4. The email was assigned to the category whose prototype had the **highest** cosine score, provided that score was at least 0.4.
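The labeling steps above can be sketched as follows. This is an illustration, not the authors' script: the function names, the `None` result for emails that fall below the threshold, and the toy 2-d vectors are all assumptions. The README does not say what happened to emails scoring under 0.4, nor how many prototypes each category had. In practice the vectors would come from `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_label(email_vec, prototypes, threshold=0.4):
    """Pick the category whose best prototype is most similar to the email.

    prototypes maps category name -> list of prototype vectors.
    Returns (label, score); label is None when the best score is below
    the threshold (an assumed convention, not taken from the README).
    """
    best_label, best_score = None, -1.0
    for label, vectors in prototypes.items():
        score = max(cosine_similarity(email_vec, v) for v in vectors)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy 2-d vectors standing in for 384-d SBERT embeddings
prototypes = {
    "Academics": [[1.0, 0.0]],
    "Clubs": [[0.0, 1.0]],
}
print(assign_label([0.9, 0.1], prototypes))    # closest to "Academics"
print(assign_label([-1.0, -1.0], prototypes))  # below threshold -> (None, ...)
```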
|
| |
|
| | ## Evaluation
|
| |
|
| | The model was tested on **991** college‑email samples. Below are the per‑class precision, recall, F1‑score and support:
|
| |
|
| | | Class | Label | Support | Precision | Recall | F1‑Score |
|
| | |:-----:|-------------|--------:|----------:|-------:|---------:|
|
| | | 0 | Academics | 200 | 0.92 | 0.97 | 0.94 |
|
| | | 1 | Clubs | 236 | 0.94 | 0.96 | 0.95 |
|
| | | 2 | Internships | 143 | 0.95 | 0.98 | 0.97 |
|
| | | 3 | Others | 200 | 0.95 | 0.83 | 0.89 |
|
| | | 4 | Talks | 212 | 0.93 | 0.94 | 0.93 |
|
| |
|
| | **Aggregate metrics**
|
| |
|
| | | Metric | Accuracy | Precision | Recall | F1‑Score |
|
| | |:-------------|---------:|----------:|-------:|---------:|
|
| | | Overall | 0.94 | — | — | — |
|
| | | Macro avg | — | 0.94 | 0.94 | 0.94 |
|
| | | Weighted avg | — | 0.94 | 0.94 | 0.93 |
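As a sanity check, the macro and weighted averages can be approximately recomputed from the per-class table. The sketch below uses only the rounded values printed above, so its results can differ from the reported figures in the last digit. The variable and function names are illustrative.

```python
# (support, precision, recall, f1) per class, copied from the table above
per_class = {
    "Academics":   (200, 0.92, 0.97, 0.94),
    "Clubs":       (236, 0.94, 0.96, 0.95),
    "Internships": (143, 0.95, 0.98, 0.97),
    "Others":      (200, 0.95, 0.83, 0.89),
    "Talks":       (212, 0.93, 0.94, 0.93),
}


def macro_average(rows, column):
    """Unweighted mean of one metric column across classes."""
    values = [row[column] for row in rows.values()]
    return sum(values) / len(values)


def weighted_average(rows, column):
    """Mean of one metric column, weighted by class support (column 0)."""
    total = sum(row[0] for row in rows.values())
    return sum(row[0] * row[column] for row in rows.values()) / total


total_support = sum(row[0] for row in per_class.values())
macro_f1 = macro_average(per_class, 3)
weighted_f1 = weighted_average(per_class, 3)
print(total_support)          # 991
print(round(macro_f1, 2))     # 0.94
print(round(weighted_f1, 2))  # 0.93
```

The macro average treats every class equally, while the weighted average lets larger classes such as Clubs count for more, which is why the two F1 figures can differ slightly.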
|
| |
|
| | ### Confusion Matrix
|
| |
|
| | 
|
| |
|
| | ## Usage
|
| |
|
| | ### 1. Install dependencies
|
| | ```bash
|
| | pip install tensorflow sentence-transformers huggingface_hub
|
| | ```
|
| | ### 2. Load the model & embedder
|
| | ```python
|
| | from sentence_transformers import SentenceTransformer
|
| | import tensorflow as tf
|
| | from huggingface_hub import hf_hub_download
|
| |
|
| | # 1) Load SBERT embedder
|
| | embedder = SentenceTransformer("all-MiniLM-L6-v2")
|
| |
|
| | # 2) Load your fine‑tuned classifier
|
| | model_file = hf_hub_download(
|
| |     repo_id="skgezhil2005/email_classifier",
|
| |     filename="model_v2.keras",  # replace with your model file
|
| | )
|
| | model = tf.keras.models.load_model(model_file)
|
| |
|
| | # 3) Define label names (in the same order used during training)
|
| | labels = ["Academics", "Clubs", "Internships", "Others", "Talks"]
|
| | ```
|
| | ### 3. Inference Helper
|
| |
|
| | ```python
|
| | import numpy as np
|
| |
|
| | def classify_email(text: str) -> str:
|
| |     """Return the predicted category name for a single email."""
|
| |     # Compute the 384-d SBERT embedding and reshape it to a 1×384 batch
|
| |     emb = embedder.encode(text, convert_to_tensor=False)
|
| |     emb = emb.reshape(1, -1)
|
| |     # Predict class probabilities and pick the highest-scoring class
|
| |     prediction = model.predict(emb)
|
| |     pred_idx = int(np.argmax(prediction[0]))
|
| |
|
| |     return labels[pred_idx]
|
| |
|
| | ``` |
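When the single top label is not enough, for example to inspect borderline emails, the probability row returned by `model.predict` can be ranked instead. The helper below is a hypothetical addition, not part of the original project. It works on any sequence of per-class probabilities, so it is shown with a made-up probability row rather than real model output.

```python
def top_k_labels(probs, labels, k=2):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, float(p)) for label, p in ranked[:k]]


labels = ["Academics", "Clubs", "Internships", "Others", "Talks"]
# Made-up probabilities for illustration; in practice pass model.predict(emb)[0]
example_probs = [0.05, 0.10, 0.02, 0.23, 0.60]
print(top_k_labels(example_probs, labels))  # [('Talks', 0.6), ('Others', 0.23)]
```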