---
language:
- en
license: gemma
base_model: google/gemma-3-1b-it
tags:
- text-classification
- email-classification
- fine-tuned
- unsloth
- lora
- qlora
- gemma
- causal-lm
datasets:
- response-classification-dataset
pipeline_tag: text-generation
library_name: transformers
---

# 📧 Customer Email Response Classifier

A fine-tune of **Gemma 3 1B IT** (`google/gemma-3-1b-it`) that classifies customer email responses into five categories. The model emits a single JSON object and is intended for low-latency serving with **vLLM**.

## Model Summary

| Property | Value |
|---|---|
| **Base model** | `google/gemma-3-1b-it` |
| **Task** | Generative classification (causal LM) |
| **PEFT method** | QLoRA (4-bit) via Unsloth |
| **Training framework** | Unsloth with TRL's `SFTTrainer` and completion-only masking |
| **Dataset size** | ~3,500 samples |
| **Output format** | `{"classification": "<label>"}` |
| **Deployment target** | vLLM (`/v1/chat/completions`) |

---

## Labels

The model assigns each email exactly one of the following labels:

| Label | Description |
|---|---|
| `automated_reply` | Auto-generated message, such as an autoresponder or delivery receipt |
| `interested` | Recipient shows genuine interest or engagement |
| `not_interested` | Recipient explicitly declines or opts out |
| `out_of_office` | Human-written out-of-office message (distinct from automated replies) |
| `unrelated` | Reply does not relate to the original outreach |
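
Because the label arrives as generated text, downstream code may want to check it before trusting it. The following is a minimal sketch of such a check, assuming the reply is exactly the JSON object described in this card; `parse_label` is a hypothetical helper name, not part of the model or any library.

```python
import json

# The five labels listed in the table above.
LABELS = {"automated_reply", "interested", "not_interested", "out_of_office", "unrelated"}


def parse_label(raw: str) -> str:
    """Return the label from a model reply, or raise ValueError if it is malformed."""
    try:
        payload = json.loads(raw.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {raw!r}") from exc
    # Anything other than {"classification": "<known label>"} is rejected.
    label = payload.get("classification") if isinstance(payload, dict) else None
    if not isinstance(label, str) or label not in LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label
```

Raising on an unknown label, rather than mapping it to a default, keeps malformed generations visible to the caller.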

---

## Usage

### Transformers (local)

```python
import json

import torch
from transformers import pipeline

LABELS = ["automated_reply", "interested", "not_interested", "out_of_office", "unrelated"]

SYSTEM_PROMPT = (
    "You are an email-response classifier. "
    f"Classify the email into exactly one of: {', '.join(LABELS)}. "
    'Reply ONLY with a JSON object in the format: {"classification": "<label>"}. '
    "Do not add any explanation."
)

# Greedy decoding (do_sample=False) keeps the classification deterministic.
gen = pipeline(
    "text-generation",
    model="OmarioVIC/customer-email-classifier",
    device=0 if torch.cuda.is_available() else -1,
    do_sample=False,
)


def classify(email_text: str) -> str:
    # The instructions go in the user turn, matching the training data format.
    messages = [{"role": "user", "content": f"{SYSTEM_PROMPT}\n\nEmail text:\n{email_text}"}]
    prompt = gen.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    output = gen(prompt, max_new_tokens=20)
    # generated_text echoes the prompt, so keep only what follows the last model-turn marker.
    generated = output[0]["generated_text"].split("<start_of_turn>model")[-1].strip()
    return json.loads(generated)["classification"]


print(classify("Yeah, Monday works — book a 15-min call."))
# → "interested"
```

### vLLM (recommended for production)

**Serve:**

```bash
pip install vllm
vllm serve OmarioVIC/customer-email-classifier \
  --dtype bfloat16 \
  --max-model-len 512
```

**Query:**

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "OmarioVIC/customer-email-classifier",
    "messages": [{
      "role": "user",
      "content": "Classify into one of: automated_reply, interested, not_interested, out_of_office, unrelated. Reply with JSON only: {\"classification\": \"<label>\"}.\n\nEmail text:\nyeah 15 mins call? free monday"
    }],
    "max_tokens": 20,
    "temperature": 0
  }'
```
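
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server started above is listening on `http://localhost:8000`; `build_payload` and `classify_remote` are illustrative names, and error handling is left out.

```python
import json
import urllib.request

MODEL_ID = "OmarioVIC/customer-email-classifier"
LABELS = ["automated_reply", "interested", "not_interested", "out_of_office", "unrelated"]


def build_payload(email_text: str) -> dict:
    """Build the chat-completions request body used in the curl example."""
    instruction = (
        f"Classify into one of: {', '.join(LABELS)}. "
        'Reply with JSON only: {"classification": "<label>"}.'
    )
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": f"{instruction}\n\nEmail text:\n{email_text}"}],
        "max_tokens": 20,
        "temperature": 0,
    }


def classify_remote(email_text: str, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to the server and return the predicted label."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(email_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return json.loads(body["choices"][0]["message"]["content"])["classification"]
```

With the server running, `classify_remote("yeah 15 mins call? free monday")` sends the same payload as the curl command.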

---

## Training Details

### Data Format

Each training example is a chat-template conversation:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "<system prompt>\n\nEmail text:\n<raw email body>"
    },
    {
      "role": "assistant",
      "content": "{\"classification\": \"interested\"}"
    }
  ]
}
```

Only the assistant turn contributes to the loss (completion-only masking via `train_on_responses_only`); tokens in the user turn are masked out.
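
To make the format and the masking concrete, here is a small sketch. It illustrates the idea only and is not Unsloth's implementation of `train_on_responses_only`; `make_example` and `mask_prompt` are hypothetical helpers, and the token IDs in the usage note are made up.

```python
import json

# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def make_example(system_prompt: str, email_text: str, label: str) -> dict:
    """Build one conversation in the format shown above."""
    return {
        "messages": [
            {"role": "user", "content": f"{system_prompt}\n\nEmail text:\n{email_text}"},
            {"role": "assistant", "content": json.dumps({"classification": label})},
        ]
    }


def mask_prompt(input_ids: list[int], response_start: int) -> list[int]:
    """Copy input_ids into labels, hiding every token before the assistant reply."""
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]
```

For a tokenised conversation `[5, 6, 7, 8]` whose assistant reply starts at index 2, `mask_prompt` returns `[-100, -100, 7, 8]`, so only the last two tokens are scored.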

### Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 4 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup steps | 50 |
| Max sequence length | 320 |
| Precision | bfloat16 (Ampere+) / float16 |
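
These values imply an approximate training schedule. The sketch below works through the arithmetic under two assumptions that the card does not state: training ran on a single device, and the dataset held exactly 3,500 samples. The step counts are therefore rough, and a trainer may round them slightly differently.

```python
import math

DATASET_SIZE = 3500   # approximate, from the model summary
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
EPOCHS = 3
WARMUP_STEPS = 50
PEAK_LR = 2e-4
NUM_DEVICES = 1       # assumption: the card does not say how many GPUs were used

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES  # 16 samples per optimizer step
steps_per_epoch = math.ceil(DATASET_SIZE / effective_batch)    # 219
total_steps = steps_per_epoch * EPOCHS                         # 657


def lr_at(step: int) -> float:
    """Learning rate with linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the 50 warmup steps cover a little under 8% of training, after which the rate decays from 2e-4 towards zero.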

### LoRA Config

| Parameter | Value |
|---|---|
| Rank (`r`) | 32 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | All linear layers |
| Gradient checkpointing | Unsloth optimised |
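
With rank and alpha both set to 32, the standard LoRA scaling factor `alpha / r` is 1, so the adapter update is added to the frozen weights unscaled. The sketch below shows that arithmetic and how many trainable parameters a rank-32 adapter adds to one linear layer; the 1024 × 4096 layer shape is hypothetical and is not taken from Gemma's actual architecture.

```python
RANK = 32
ALPHA = 32

# Standard LoRA scaling: the low-rank update B @ A is multiplied by alpha / r.
scaling = ALPHA / RANK  # 1.0


def lora_param_count(in_features: int, out_features: int, rank: int = RANK) -> int:
    """Trainable parameters for one adapted linear layer.

    A has shape (rank, in_features) and B has shape (out_features, rank).
    """
    return rank * (in_features + out_features)


# Hypothetical layer shape, chosen only to make the numbers concrete.
IN_FEATURES, OUT_FEATURES = 1024, 4096
adapter_params = lora_param_count(IN_FEATURES, OUT_FEATURES)  # 163,840
frozen_params = IN_FEATURES * OUT_FEATURES                    # 4,194,304
trainable_fraction = adapter_params / frozen_params           # about 3.9%
```

For this illustrative layer, the adapter trains about 3.9% as many parameters as the frozen weight matrix contains.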

---

## Framework

Training was accelerated using [Unsloth](https://github.com/unslothai/unsloth), which according to the project provides:

- **2× faster training** via custom Triton kernels
- **~60% less VRAM** via QLoRA 4-bit quantisation

After training, the LoRA adapter was merged into the base model and saved as full 16-bit weights (`merged_16bit`), so vLLM can load the checkpoint directly without a separate adapter.
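
Merging folds the low-rank update into the base weights, so inference needs no adapter code. The toy sketch below shows the underlying arithmetic, `W_merged = W + (alpha / r) * B @ A`, on tiny hand-written matrices. It is an illustration of the idea, not Unsloth's export routine, and `merge_lora` is a hypothetical helper.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, rank: int) -> Matrix:
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter: A is (1 x 2), B is (2 x 1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]
B = [[3.0], [4.0]]
merged = merge_lora(W, A, B, alpha=1, rank=1)
```

Here `merged` equals `[[4.0, 6.0], [4.0, 9.0]]`, and multiplying an input by it gives the same result as applying the base layer and the adapter separately and summing.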

---

## Limitations

- Designed for **short email replies**: training used a maximum sequence length of 320 tokens, prompt included.
- Trained on a specific business outreach dataset; may not generalise to other email domains.
- The usage examples decode greedily (`do_sample=False`, `temperature=0`), so the output is deterministic.

---

## License

This model is derived from `google/gemma-3-1b-it` and is subject to the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).