# GrandgemMa: Gemma 4 Scam Detection Eval & Fine-Tune Kit

> **Goal:** Test `google/gemma-4-E2B-it` (2B params) on real scam-call transcripts.
> If accuracy < 90 % or F1(SCAM) < 85 % → fine-tune with Unsloth 4-bit LoRA, then convert to LiteRT for on-phone use.

## Model Size Reference

| Model | Params | FP32 RAM | 4-bit LiteRT RAM | Phone? |
|---|---|---|---|---|
| `gemma-4-31B-it` | 31B | ~124 GB | ~16 GB | ❌ No |
| `gemma-4-26B-A4B-it` | 26B | ~104 GB | ~13 GB | ❌ No |
| `gemma-4-E4B-it` | 4B | ~16 GB | ~2 GB | ⚠️ Flagship only |
| **`gemma-4-E2B-it`** | **2B** | **~8 GB** | **~1.5 GB** | ✅ **Mid-tier + budget** |

**We use `gemma-4-E2B-it` (2B)**, the smallest Gemma 4 model, which fits on most phones after LiteRT quantization.

## Datasets

- **Primary:** [`BothBosu/scam-dialogue`](https://huggingface.co/datasets/BothBosu/scam-dialogue): 800+ labeled transcripts (1=SCAM, 0=LEGIT).
- **Secondary:** [`BothBosu/Scammer-Conversation`](https://huggingface.co/datasets/BothBosu/Scammer-Conversation): extra mixed conversations.

## Quick Start

### Step 1: Zero-shot eval (CPU, no GPU needed)

```bash
# Quick test: 20 samples, ~2-3 min on laptop CPU
python eval_zero_shot_cpu.py --limit 20

# Full test split: ~400 samples, ~30-45 min on CPU
python eval_zero_shot_cpu.py --limit -1

# To halve memory use (~4 GB), load the model in fp16
python eval_zero_shot_cpu.py --limit 20 --dtype fp16
```

**Output:** `results_zero_shot_cpu.json` + console report.

### Step 2: Read the verdict

| Accuracy | F1(SCAM) | Verdict | Action |
|---|---|---|---|
| ≥ 90 % | ≥ 85 % | ✅ PASS | Base model is good enough. Go straight to LiteRT conversion. |
| 75–89 % | 70–84 % | ⚠️ MARGINAL | Fine-tune, then convert to LiteRT. |
| < 75 % | < 70 % | ❌ FAIL | Fine-tuning is REQUIRED before phone deployment. |

### Step 3: Fine-tune (if needed)

```bash
# Install
pip install unsloth transformers datasets trl peft accelerate

# Train on GPU (Kaggle T4×2 free, or Colab, or local GPU)
python train_sft_unsloth.py --push_to_hub s23deepak/grandgemma-scam-sft

# Then re-eval the fine-tuned model
python eval_zero_shot_cpu.py \
    --model s23deepak/grandgemma-scam-sft \
    --limit -1
```

### Step 4: Convert to LiteRT for Android

After fine-tuning (or if the base model passes), convert the 2B model to `.litertlm`:

```bash
# Use litert-community tools
pip install litert
litert-convert \
    --model s23deepak/grandgemma-scam-sft \
    --output grandgemma-scam.litertlm \
    --quantization int4
```

Target RAM on phone: **~1.5 GB** for the 2B 4-bit model.

## Files in This Repo

| File | Purpose |
|---|---|
| `eval_zero_shot_cpu.py` | **CPU-only** zero-shot eval (default, no GPU) |
| `eval_zero_shot.py` | GPU version (faster, same logic) |
| `train_sft_unsloth.py` | Unsloth 4-bit LoRA fine-tune |
| `format_dataset.py` | Convert dataset → ChatML JSONL |

## Phone Deployment Checklist

- [ ] Zero-shot eval passes (≥ 90 % accuracy, ≥ 85 % F1)
- [ ] OR fine-tuned model passes the same thresholds
- [ ] Convert to `.litertlm` (int4 quantization)
- [ ] Benchmark on target phone tier (mid-tier / budget)
- [ ] Measure cold-start load time (< 2 s target)
- [ ] Measure inference latency (< 500 ms per classification)
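
## Appendix: Verdict Thresholds as Code

The pass/fail thresholds from Step 2 (and the first two checklist items) can be sketched in a few lines of Python. This is an illustrative sketch, not the logic inside `eval_zero_shot_cpu.py`: the function names `scam_metrics` and `verdict` are hypothetical, and the rule that a run is graded by its weaker metric when accuracy and F1 land in different rows is an assumption (the table does not say), chosen to match the goal statement that either metric falling short triggers fine-tuning.

```python
def scam_metrics(y_true, y_pred):
    """Return (accuracy, F1 for the SCAM class) as fractions in [0, 1].

    Labels follow the dataset convention: 1 = SCAM, 0 = LEGIT.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # scams caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # scams missed
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1_scam = (2 * tp / denom) if denom else 0.0
    return accuracy, f1_scam


def verdict(accuracy, f1_scam):
    """Map metrics to PASS / MARGINAL / FAIL using the Step 2 thresholds.

    Assumption: both metrics must clear a row's bar to earn that verdict,
    so the weaker metric decides.
    """
    if accuracy >= 0.90 and f1_scam >= 0.85:
        return "PASS"
    if accuracy >= 0.75 and f1_scam >= 0.70:
        return "MARGINAL"
    return "FAIL"


if __name__ == "__main__":
    # Toy labels for illustration only, not real eval results.
    acc, f1 = scam_metrics([1, 1, 0, 0], [1, 0, 0, 0])
    print(f"accuracy={acc:.2f} f1_scam={f1:.2f} verdict={verdict(acc, f1)}")
```

Grading by the weaker metric keeps a model with high accuracy but poor scam recall from slipping through, which matters when the classes are imbalanced.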