YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
GrandgemMa β Gemma 4 Scam Detection Eval & Fine-Tune Kit
Goal: Test
google/gemma-4-E2B-it(2B params) on real scam-call transcripts.
If accuracy < 90 % or F1(SCAM) < 85 % β fine-tune with Unsloth 4-bit LoRA, then convert to LiteRT for phone.
Model Size Reference
| Model | Params | FP32 RAM | 4-bit LiteRT RAM | Phone? |
|---|---|---|---|---|
gemma-4-31B-it |
31B | ~124 GB | ~16 GB | β No |
gemma-4-26B-A4B-it |
26B | ~104 GB | ~13 GB | β No |
gemma-4-E4B-it |
4B | ~16 GB | ~2 GB | β οΈ Flagship only |
gemma-4-E2B-it |
2B | ~8 GB | ~1.5 GB | β Mid-tier + budget |
We use gemma-4-E2B-it (2B) β smallest Gemma 4, fits on most phones after LiteRT quantization.
Datasets
- Primary:
BothBosu/scam-dialogueβ 800+ labeled transcripts (1=SCAM, 0=LEGIT). - Secondary:
BothBosu/Scammer-Conversationβ extra mixed conversations.
Quick Start
Step 1: Zero-shot eval (CPU, no GPU needed)
# Quick test β 20 samples, ~2-3 min on laptop CPU
python eval_zero_shot_cpu.py --limit 20
# Full test split β ~400 samples, ~30-45 min on CPU
python eval_zero_shot_cpu.py --limit -1
# If you have plenty of RAM, use fp16 to halve memory (~4 GB)
python eval_zero_shot_cpu.py --limit 20 --dtype fp16
Output: results_zero_shot_cpu.json + console report.
Step 2: Read the verdict
| Accuracy | F1(SCAM) | Verdict | Action |
|---|---|---|---|
| β₯ 90 % | β₯ 85 % | β PASS | Base model good. Go straight to LiteRT conversion. |
| 75β89 % | 70β84 % | β οΈ MARGINAL | Fine-tune, then LiteRT convert. |
| < 75 % | < 70 % | β FAIL | Fine-tune REQUIRED before phone deployment. |
Step 3: Fine-tune (if needed)
# Install
pip install unsloth transformers datasets trl peft accelerate
# Train on GPU (Kaggle T4Γ2 free, or Colab, or local GPU)
python train_sft_unsloth.py --push_to_hub s23deepak/grandgemma-scam-sft
# Then re-eval the fine-tuned model
python eval_zero_shot_cpu.py \
--model s23deepak/grandgemma-scam-sft \
--limit -1
Step 4: Convert to LiteRT for Android
After fine-tuning (or if base passes), convert the 2B model to .litertlm:
# Use litert-community tools
pip install litert
litert-convert \
--model s23deepak/grandgemma-scam-sft \
--output grandgemma-scam.litertlm \
--quantization int4
Target RAM on phone: ~1.5 GB for the 2B 4-bit model.
Files in This Repo
| File | Purpose |
|---|---|
eval_zero_shot_cpu.py |
CPU-only zero-shot eval (default, no GPU) |
eval_zero_shot.py |
GPU version (faster, same logic) |
train_sft_unsloth.py |
Unsloth 4-bit LoRA fine-tune |
format_dataset.py |
Convert dataset β ChatML JSONL |
Phone Deployment Checklist
- Zero-shot eval passes (β₯90% acc, β₯85% F1)
- OR fine-tuned model passes same threshold
- Convert to
.litertlm(int4 quantization) - Benchmark on target phone tier (mid-tier / budget)
- Measure cold-start load time (<2s target)
- Measure inference latency (<500ms per classification)
Inference Providers NEW
This model isn't deployed by any Inference Provider. π Ask for provider support