OmarioVIC committed on
Commit 7cc36f0 · verified · 1 parent: 0d156d0

Update README.md

Files changed (1)
  1. README.md +176 -13
README.md CHANGED
@@ -1,21 +1,184 @@
  ---
- base_model: unsloth/gemma-3-1b-it-unsloth-bnb-4bit
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - gemma3_text
- license: apache-2.0
  language:
  - en
  ---

- # Uploaded finetuned model

- - **Developed by:** OmarioVIC
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/gemma-3-1b-it-unsloth-bnb-4bit

- This gemma3_text model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

---
language:
- en
license: gemma
base_model: google/gemma-3-1b-it
tags:
- text-classification
- email-classification
- fine-tuned
- unsloth
- lora
- qlora
- gemma
- causal-lm
datasets:
- response-classification-dataset
pipeline_tag: text-generation
library_name: vllm
---

# 📧 Customer Email Response Classifier

Fine-tuned **Gemma 3 1B IT** (`google/gemma-3-1b-it`) that classifies customer email responses (replies to outreach emails) into one of five categories. The model answers with a single JSON object containing the label and is intended for low-latency serving with **vLLM**.

## Model Summary

| Property | Value |
|---|---|
| **Base model** | `google/gemma-3-1b-it` |
| **Task** | Generative classification (causal LM) |
| **PEFT method** | QLoRA (4-bit) via Unsloth |
| **Training framework** | TRL `SFTTrainer` (run through Unsloth) with completion-only masking |
| **Dataset size** | ~3,500 samples |
| **Output format** | `{"classification": "<label>"}` |
| **Deployment target** | vLLM (`/v1/chat/completions`) |

---

## Labels

The model assigns each email exactly one of the following labels:

| Label | Description |
|---|---|
| `automated_reply` | Auto-generated message, such as an auto-responder out-of-office notice or a delivery receipt |
| `interested` | Recipient shows genuine interest or engagement |
| `not_interested` | Recipient explicitly declines or opts out |
| `out_of_office` | Out-of-office message written by a person (as opposed to an auto-responder) |
| `unrelated` | Reply does not relate to the original outreach |

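Because the label arrives as generated text, callers will usually want to parse and validate it before acting on it. The sketch below assumes the reply is the JSON object shown in the summary table, possibly wrapped in whitespace or a Markdown code fence. The helper name `parse_label` and the choice to raise `ValueError` on anything unexpected are illustrative, not part of this model card.

```python
import json
import re

LABELS = frozenset(
    {"automated_reply", "interested", "not_interested", "out_of_office", "unrelated"}
)

# First {...} span in the reply; tolerates a stray code fence or trailing text.
_JSON_OBJECT = re.compile(r"\{.*?\}", re.DOTALL)


def parse_label(reply: str) -> str:
    """Return the label from a reply such as '{"classification": "interested"}'."""
    match = _JSON_OBJECT.search(reply)
    if match is None:
        raise ValueError(f"no JSON object in model reply: {reply!r}")
    try:
        label = json.loads(match.group(0))["classification"]
    except (json.JSONDecodeError, KeyError) as exc:
        raise ValueError(f"malformed model reply: {reply!r}") from exc
    # Reject anything outside the five trained labels (including non-string values).
    if not isinstance(label, str) or label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return label


print(parse_label('{"classification": "interested"}'))  # interested
print(parse_label('```json\n{"classification": "out_of_office"}\n```'))  # out_of_office
```

Failing loudly on an unknown label, rather than mapping it to `unrelated`, keeps malformed generations visible in logs.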
---

## Usage

### Transformers (local)

```python
import json

import torch
from transformers import pipeline

LABELS = ["automated_reply", "interested", "not_interested", "out_of_office", "unrelated"]
SYSTEM_PROMPT = (
    "You are an email-response classifier. "
    f"Classify the email into exactly one of: {', '.join(LABELS)}. "
    'Reply ONLY with a JSON object in the format: {"classification": "<label>"}. '
    "Do not add any explanation."
)

gen = pipeline(
    "text-generation",
    model="OmarioVIC/customer-email-classifier",
    device=0 if torch.cuda.is_available() else -1,
)


def classify(email_text: str) -> str:
    # Same prompt layout as the training data: instructions, blank line, then the email.
    messages = [{"role": "user", "content": f"{SYSTEM_PROMPT}\n\nEmail text:\n{email_text}"}]
    # Passing the messages directly lets the pipeline apply Gemma's chat template once;
    # pre-rendering the template to a string can add a second <bos> token.
    output = gen(messages, max_new_tokens=20, do_sample=False)
    # For chat input, "generated_text" is the conversation with the model's reply appended.
    reply = output[0]["generated_text"][-1]["content"].strip()
    return json.loads(reply)["classification"]


print(classify("Yeah, Monday works. Book a 15-min call."))
# expected: interested
```

### vLLM (recommended for production)

**Serve:**
```bash
pip install vllm

vllm serve OmarioVIC/customer-email-classifier \
  --dtype bfloat16 \
  --max-model-len 512
```

**Query** (send the same instruction text as `SYSTEM_PROMPT` in the Transformers example, followed by the email):
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "OmarioVIC/customer-email-classifier",
    "messages": [{
      "role": "user",
      "content": "You are an email-response classifier. Classify the email into exactly one of: automated_reply, interested, not_interested, out_of_office, unrelated. Reply ONLY with a JSON object in the format: {\"classification\": \"<label>\"}. Do not add any explanation.\n\nEmail text:\nyeah 15 mins call? free monday"
    }],
    "max_tokens": 20,
    "temperature": 0
  }'
```

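From application code, the same request can be sent from Python. This is a minimal standard-library sketch. It assumes the server started above is listening on `http://localhost:8000` and reuses the prompt wording from the Transformers example. `build_request` and `classify_via_vllm` are hypothetical helper names. Only the documented OpenAI-compatible response fields (`choices[0].message.content`) are read.

```python
import json
import urllib.request

LABELS = ["automated_reply", "interested", "not_interested", "out_of_office", "unrelated"]
SYSTEM_PROMPT = (
    "You are an email-response classifier. "
    f"Classify the email into exactly one of: {', '.join(LABELS)}. "
    'Reply ONLY with a JSON object in the format: {"classification": "<label>"}. '
    "Do not add any explanation."
)
MODEL = "OmarioVIC/customer-email-classifier"
# Assumption: default host and port of `vllm serve`.
ENDPOINT = "http://localhost:8000/v1/chat/completions"


def build_request(email_text: str) -> dict:
    """Chat-completions body equivalent to the curl example (greedy, short output)."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "user", "content": f"{SYSTEM_PROMPT}\n\nEmail text:\n{email_text}"}
        ],
        "max_tokens": 20,
        "temperature": 0,
    }


def classify_via_vllm(email_text: str, timeout: float = 10.0) -> str:
    """POST the request to the vLLM server and return the predicted label."""
    body = json.dumps(build_request(email_text)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-compatible schema: the generated text is in choices[0].message.content.
    reply = payload["choices"][0]["message"]["content"].strip()
    return json.loads(reply)["classification"]


# Usage (requires the server to be running):
#   classify_via_vllm("yeah 15 mins call? free monday")
```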
---

## Training Details

### Data Format

Each training example is a single-turn conversation in chat-message format:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "<system prompt>\n\nEmail text:\n<raw email body>"
    },
    {
      "role": "assistant",
      "content": "{\"classification\": \"interested\"}"
    }
  ]
}
```

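A record in this format can be produced from a labelled `(email, label)` pair along the following lines. This is a sketch, not the author's preprocessing script. `make_record` is a hypothetical helper, and the instruction text is assumed to be the `SYSTEM_PROMPT` from the Transformers example. Serialising the assistant turn with `json.dumps` guarantees the target string is valid JSON.

```python
import json

LABELS = ("automated_reply", "interested", "not_interested", "out_of_office", "unrelated")
# Assumption: the same instruction text as SYSTEM_PROMPT in the usage example.
SYSTEM_PROMPT = (
    "You are an email-response classifier. "
    f"Classify the email into exactly one of: {', '.join(LABELS)}. "
    'Reply ONLY with a JSON object in the format: {"classification": "<label>"}. '
    "Do not add any explanation."
)


def make_record(email_body: str, label: str) -> dict:
    """Build one chat-format training example (hypothetical helper)."""
    if label not in LABELS:
        raise ValueError(f"unknown label {label!r}")
    return {
        "messages": [
            {"role": "user", "content": f"{SYSTEM_PROMPT}\n\nEmail text:\n{email_body}"},
            # json.dumps yields exactly '{"classification": "<label>"}'.
            {"role": "assistant", "content": json.dumps({"classification": label})},
        ]
    }


# One JSON object per line (JSONL) is a common on-disk layout for such datasets.
print(json.dumps(make_record("Please remove me from your list.", "not_interested")))
```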
Loss is computed on the assistant turn only: prompt tokens are excluded from the loss with Unsloth's `train_on_responses_only` (completion-only masking).
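Conceptually, completion-only masking copies the input IDs into the labels and replaces every position up to the start of the assistant's reply with `-100`. That is the default `ignore_index` of PyTorch's cross-entropy loss, so those positions contribute nothing to the gradient. The sketch below illustrates the idea on plain token-ID lists. It is not Unsloth's implementation, which locates the response by matching the chat template's turn markers as text, and the token IDs are made up.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt(input_ids: list[int], response_marker: list[int]) -> list[int]:
    """Labels in which everything up to and including the response marker is ignored."""
    n = len(response_marker)
    # Search backwards for the last occurrence of the marker (start of the assistant turn).
    for start in range(len(input_ids) - n, -1, -1):
        if input_ids[start : start + n] == response_marker:
            cut = start + n
            return [IGNORE_INDEX] * cut + input_ids[cut:]
    # No assistant turn found: ignore the whole sequence rather than train on the prompt.
    return [IGNORE_INDEX] * len(input_ids)


# Hypothetical IDs: [7, 8] stands in for the tokens of "<start_of_turn>model\n".
ids = [2, 7, 9, 11, 12, 13, 5, 7, 8, 40, 41, 42, 6]
print(mask_prompt(ids, response_marker=[7, 8]))
# [-100, -100, -100, -100, -100, -100, -100, -100, -100, 40, 41, 42, 6]
```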

### Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 4 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup steps | 50 |
| Max sequence length | 320 tokens |
| Precision | bfloat16 on Ampere or newer GPUs, otherwise float16 |

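With these settings each optimizer step sees 4 × 4 = 16 examples. The following assumes a single GPU and that all ~3,500 samples were used for training; the card does not state a held-out split. Under those assumptions, training takes roughly 219 steps per epoch and about 657 steps in total, so the 50 warmup steps cover under 8% of the run. The exact count depends on how the trainer rounds partial accumulation windows. The sketch computes these figures and evaluates a linear-warmup, cosine-decay schedule of the shape produced by `transformers.get_cosine_schedule_with_warmup` with default settings (decay to zero).

```python
import math

# Values from the hyperparameter table; the sample count is the approximate figure above.
NUM_SAMPLES = 3_500  # assumption: all samples used for training, single device
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
EPOCHS = 3
PEAK_LR = 2e-4
WARMUP_STEPS = 50

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM             # examples per optimizer step
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)  # rounded up
total_steps = steps_per_epoch * EPOCHS


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0 at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch, steps_per_epoch, total_steps)  # 16 219 657
for step in (0, 25, WARMUP_STEPS, total_steps):
    print(step, f"{lr_at(step):.2e}")
```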
### LoRA Config

| Parameter | Value |
|---|---|
| Rank (`r`) | 32 |
| Alpha (`lora_alpha`) | 32 |
| Dropout | 0.05 |
| Target modules | All linear layers |
| Gradient checkpointing | Unsloth optimised |

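In standard LoRA, each adapted weight matrix W (shape d_out × d_in) stays frozen. The update is learned as a product B·A of two small matrices, A of shape r × d_in and B of shape d_out × r, scaled by alpha / r. With r = alpha = 32 the scaling factor is 1.0, and each adapted layer trains r · (d_in + d_out) parameters instead of d_in · d_out. The layer size below is a made-up round number for illustration, not one of Gemma 3 1B's actual dimensions.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA multiplier on the B @ A update (rsLoRA would use alpha / sqrt(r))."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A is r x d_in and B is d_out x r, so the adapter adds r * (d_in + d_out) weights."""
    return r * (d_in + d_out)


R, ALPHA = 32, 32
print(lora_scaling(ALPHA, R))  # 1.0

# Hypothetical 1024 -> 4096 projection, purely for illustration.
d_in, d_out = 1024, 4096
full = d_in * d_out
adapter = lora_trainable_params(d_in, d_out, R)
print(full, adapter, f"{adapter / full:.1%}")  # 4194304 163840 3.9%
```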
---

## Framework

Training was accelerated with [Unsloth](https://github.com/unslothai/unsloth), which advertises:
- **~2× faster training** through custom Triton kernels
- **~60% less VRAM**, largely from loading the base model in 4-bit for QLoRA

After training, the LoRA adapters were merged into the base weights and saved in 16-bit precision (`merged_16bit`), so vLLM can serve the model directly without loading a separate adapter.

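Merging is possible because the LoRA update is linear. Replacing W with W + (alpha / r)·B·A gives the same outputs as running the frozen layer and the adapter side by side, while removing the extra matrix products at inference time. The pure-Python sketch below checks that equivalence on tiny hand-picked matrices. It illustrates the arithmetic only and is not Unsloth's merge code, which for a 4-bit base must first dequantise the base weights.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    """Element-wise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Matrix, scale: float) -> Matrix:
    """Unmerged: frozen W x plus the scaled low-rank path B (A x)."""
    return add(matmul(W, x), matmul(B, matmul(A, x)), scale)


def merge(W: Matrix, A: Matrix, B: Matrix, scale: float) -> Matrix:
    """Merged weight W' = W + scale * (B @ A)."""
    return add(W, matmul(B, A), scale)


# Toy shapes: d_out = 2, d_in = 3, rank r = 1; x is a single column vector.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, 1.0, 0.0]]      # r x d_in
B = [[2.0], [-1.0]]        # d_out x r
x = [[1.0], [2.0], [3.0]]  # d_in x 1
scale = 32 / 32            # alpha / r from the LoRA config

print(lora_forward(W, A, B, x, scale))   # [[12.0], [-3.5]]
print(matmul(merge(W, A, B, scale), x))  # [[12.0], [-3.5]]
```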
---

## Limitations

- Designed for **short email replies**: training sequences were capped at 320 tokens, prompt included.
- Trained on a specific business-outreach dataset; it may not generalise to other email domains.
- The usage examples decode greedily (`do_sample=False` / `temperature=0`), so repeated calls on the same input return the same label.

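Because the model only saw sequences of up to 320 tokens, it may help to trim long emails (quoted threads, signatures) before classifying them. This sketch keeps the longest prefix of the email, in whole words, for which the full prompt fits a token budget. `count_tokens` is a caller-supplied function; in practice it would count tokens with the model's tokenizer, ideally on the chat-templated prompt. The 300-token default, chosen to leave headroom for the short JSON answer, is an assumption rather than a documented setting.

```python
from typing import Callable


def fit_email(
    prompt_prefix: str,
    email_text: str,
    count_tokens: Callable[[str], int],
    budget: int = 300,  # assumption: stay under the 320-token training length
) -> str:
    """Trim trailing words from email_text until prefix + email fits in `budget` tokens."""
    if count_tokens(prompt_prefix + email_text) <= budget:
        return email_text  # already fits; keep original formatting
    # Note: trimming re-joins words with single spaces, so newlines are not preserved.
    words = email_text.split()
    lo, hi = 0, len(words)
    # Binary search for the largest word count whose prompt still fits.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_tokens(prompt_prefix + " ".join(words[:mid])) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return " ".join(words[:lo])


def whitespace_tokens(s: str) -> int:
    """Stand-in token counter for the example; a real tokenizer counts differently."""
    return len(s.split())


prefix = "Classify this email.\n\nEmail text:\n"
print(fit_email(prefix, "one two three four five six", whitespace_tokens, budget=8))
# one two three
```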
---

## License

This model is derived from `google/gemma-3-1b-it` and is subject to the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).