---
language:
- en
license: gemma
base_model: google/gemma-3-1b-it
tags:
- text-classification
- email-classification
- fine-tuned
- unsloth
- lora
- qlora
- gemma
- causal-lm
datasets:
- response-classification-dataset
pipeline_tag: text-generation
library_name: transformers
---
# 📧 Customer Email Response Classifier
Fine-tuned **Gemma 3 1B IT** (`google/gemma-3-1b-it`) for classifying customer email responses into 5 categories. The model generates a structured JSON output and is optimized for low-latency deployment via **vLLM**.
## Model Summary
| Property | Value |
|---|---|
| **Base model** | `google/gemma-3-1b-it` |
| **Task** | Generative classification (Causal-LM) |
| **PEFT method** | QLoRA (4-bit) via Unsloth |
| **Training framework** | Unsloth `SFTTrainer` with completion-only masking |
| **Dataset size** | ~3,500 samples |
| **Output format** | `{"classification": "<label>"}` |
| **Deployment target** | vLLM (`/v1/chat/completions`) |
---
## Labels
The model classifies each email into exactly one of:
| Label | Description |
|---|---|
| `automated_reply` | Auto-generated messages such as out-of-office autoresponders or delivery receipts |
| `interested` | Recipient shows genuine interest or engagement |
| `not_interested` | Recipient explicitly declines or opts out |
| `out_of_office` | Human-written out-of-office message (distinct from automated replies) |
| `unrelated` | Reply does not relate to the original outreach |
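
Because the label arrives as generated text rather than as a class index, callers may want to validate it before acting on it. The following is a minimal sketch; `parse_label` is a hypothetical helper, not something shipped with this model, and it simply returns `None` for anything that is not one of the five labels.

```python
import json
from typing import Optional

LABELS = {"automated_reply", "interested", "not_interested", "out_of_office", "unrelated"}


def parse_label(raw: str) -> Optional[str]:
    """Return the predicted label, or None if the output is malformed or unknown."""
    try:
        payload = json.loads(raw.strip())
    except json.JSONDecodeError:
        return None  # the model produced something that is not valid JSON
    if not isinstance(payload, dict):
        return None  # valid JSON, but not an object
    label = payload.get("classification")
    # Accept only string values that belong to the known label set.
    if isinstance(label, str) and label in LABELS:
        return label
    return None
```

A caller could route `None` results to a fallback bucket or a manual review queue instead of raising.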
---
## Usage
### Transformers (local)
```python
import json

import torch
from transformers import pipeline

LABELS = ["automated_reply", "interested", "not_interested", "out_of_office", "unrelated"]

# Instructions are sent inside the user turn, matching the training data format.
SYSTEM_PROMPT = (
    "You are an email-response classifier. "
    f"Classify the email into exactly one of: {', '.join(LABELS)}. "
    'Reply ONLY with a JSON object in the format: {"classification": "<label>"}. '
    "Do not add any explanation."
)

gen = pipeline(
    "text-generation",
    model="OmarioVIC/customer-email-classifier",
    device=0 if torch.cuda.is_available() else -1,  # first GPU if present, else CPU
    do_sample=False,  # greedy decoding
)


def classify(email_text: str) -> str:
    """Return the predicted label for a single email body."""
    messages = [{"role": "user", "content": f"{SYSTEM_PROMPT}\n\nEmail text:\n{email_text}"}]
    prompt = gen.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    output = gen(prompt, max_new_tokens=20)
    # The pipeline returns prompt + completion; keep only the model's turn.
    generated = output[0]["generated_text"].split("<start_of_turn>model")[-1].strip()
    return json.loads(generated)["classification"]


print(classify("Yeah, Monday works — book a 15-min call."))
# → "interested"
```
### vLLM (recommended for production)
**Serve:**
```bash
pip install vllm
vllm serve OmarioVIC/customer-email-classifier \
  --dtype bfloat16 \
  --max-model-len 512
```
**Query:**
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "OmarioVIC/customer-email-classifier",
    "messages": [{
      "role": "user",
      "content": "Classify into one of: automated_reply, interested, not_interested, out_of_office, unrelated. Reply with JSON only: {\"classification\": \"<label>\"}.\n\nEmail text:\nyeah 15 mins call? free monday"
    }],
    "max_tokens": 20,
    "temperature": 0
  }'
```
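
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server started above is reachable at `http://localhost:8000`; `build_payload` and `classify_remote` are hypothetical helper names, and the response is read from `choices[0].message.content` as in the OpenAI-compatible chat completions schema.

```python
import json
import urllib.request

LABELS = ["automated_reply", "interested", "not_interested", "out_of_office", "unrelated"]
PROMPT = (
    "Classify into one of: " + ", ".join(LABELS) + ". "
    'Reply with JSON only: {"classification": "<label>"}.'
)


def build_payload(email_text: str) -> dict:
    """Build the chat-completions request body for one email."""
    return {
        "model": "OmarioVIC/customer-email-classifier",
        "messages": [{"role": "user", "content": f"{PROMPT}\n\nEmail text:\n{email_text}"}],
        "max_tokens": 20,
        "temperature": 0,  # greedy decoding
    }


def classify_remote(email_text: str, base_url: str = "http://localhost:8000") -> str:
    """POST one email to the vLLM server and return the predicted label."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(email_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    content = body["choices"][0]["message"]["content"]
    return json.loads(content)["classification"]
```

With the server running, `classify_remote("yeah 15 mins call? free monday")` sends the same prompt as the `curl` example.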
---
## Training Details
### Data Format
Each training example is a chat-template conversation:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "<system prompt>\n\nEmail text:\n<raw email body>"
    },
    {
      "role": "assistant",
      "content": "{\"classification\": \"interested\"}"
    }
  ]
}
```
Only the assistant turn is used for loss computation (completion-only masking via `train_on_responses_only`).
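
Conceptually, completion-only masking sets the label of every prompt token to `-100`, the index that PyTorch's cross-entropy loss ignores by default, so only the assistant tokens contribute to the gradient. The sketch below illustrates the idea on plain Python lists. The token IDs and the helper name `mask_prompt_labels` are made up for illustration, and this is not Unsloth's actual implementation.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids, response_marker):
    """Copy input_ids into labels, masking everything up to and including
    the last occurrence of response_marker (the start of the model's turn)."""
    n, m = len(input_ids), len(response_marker)
    start = None
    # Scan from the end so the last marker wins.
    for i in range(n - m, -1, -1):
        if input_ids[i:i + m] == response_marker:
            start = i + m
            break
    if start is None:
        # No assistant turn found: nothing in this example is trained on.
        return [IGNORE_INDEX] * n
    return [IGNORE_INDEX] * start + list(input_ids[start:])


# Hypothetical IDs: [20, 21] stands in for the tokenized "start of model turn" marker.
ids = [2, 10, 11, 12, 20, 21, 30, 31, 32]
print(mask_prompt_labels(ids, [20, 21]))
# → [-100, -100, -100, -100, -100, -100, 30, 31, 32]
```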
### Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 4 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup steps | 50 |
| Max sequence length | 320 |
| Precision | bfloat16 (Ampere+) / float16 |
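
These values imply an effective batch size of 4 × 4 = 16 sequences per optimizer step. The sketch below gives a rough estimate of the schedule, assuming a single GPU and that all ~3,500 samples were used for training (neither is stated above). The exact step count depends on how the trainer rounds the last partial batch.

```python
import math

samples = 3500          # approximate dataset size from the summary table
per_device_batch = 4
grad_accum = 4
epochs = 3
warmup_steps = 50
num_devices = 1         # assumption: single-GPU training

effective_batch = per_device_batch * grad_accum * num_devices
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(effective_batch, steps_per_epoch, total_steps, f"{warmup_fraction:.1%}")
# → 16 219 657 7.6%
```

Under these assumptions, warmup covers roughly the first 8% of training before the cosine decay takes over.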
### LoRA Config
| Parameter | Value |
|---|---|
| Rank (`r`) | 32 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | All linear layers |
| Gradient checkpointing | Unsloth optimised |
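
With `r` = 32 and alpha = 32, the standard LoRA scaling factor alpha / r is 1.0 (rank-stabilized variants scale differently), and each adapted linear layer with `d_in` inputs and `d_out` outputs gains `r × (d_in + d_out)` trainable parameters. A small sketch of that arithmetic follows; the layer shape in the example is illustrative and not taken from the Gemma 3 configuration.

```python
def lora_scaling(r: int, alpha: float) -> float:
    """Multiplier applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


print(lora_scaling(r=32, alpha=32))         # → 1.0
print(lora_param_count(1024, 4096, r=32))   # → 163840 (hypothetical layer shape)
```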
---
## Framework
Training was accelerated using [Unsloth](https://github.com/unslothai/unsloth), which provides:
- **~2× faster training** via hand-written Triton kernels (speedup as reported by Unsloth)
- **~60% less VRAM** via QLoRA 4-bit quantisation
The final model was merged to full 16-bit weights (`merged_16bit`) for straightforward vLLM deployment.
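
Merging folds each adapter into its base weight, so inference needs no PEFT runtime: for an adapted layer, the merged weight is W + (alpha / r) · B · A. The toy example below uses made-up 2×2 matrices in pure Python (it is not Unsloth's code) to check that the merged and unmerged forward passes agree.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Return w + scale * delta, element-wise."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]


W = [[1.0, 2.0], [3.0, 4.0]]  # base weight, d_out x d_in (made-up values)
B = [[1.0], [0.5]]            # LoRA B, d_out x r with r = 1 for brevity
A = [[2.0, -1.0]]             # LoRA A, r x d_in
scale = 32 / 32               # alpha / r from the LoRA config

merged = add_scaled(W, matmul(B, A), scale)

x = [[1.0], [1.0]]            # one input column vector
unmerged_out = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)
merged_out = matmul(merged, x)

print(merged_out == unmerged_out, merged_out)
# → True [[4.0], [7.5]]
```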
---
## Limitations
- Designed for **short email replies** (max 320 tokens including prompt).
- Trained on a specific business outreach dataset; may not generalise to all email domains.
- Decoding is always greedy (`do_sample=False`, `temperature=0`), so the output for a given input is deterministic.
---
## License
This model is derived from `google/gemma-3-1b-it` and is subject to the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).