| # GrandgemMa β Gemma 4 Scam Detection Eval & Fine-Tune Kit |
|
|
| > **Goal:** Test `google/gemma-4-E2B-it` (2B params) on real scam-call transcripts. |
| > If accuracy < 90 % or F1(SCAM) < 85 % β fine-tune with Unsloth 4-bit LoRA, then convert to LiteRT for phone. |
|
|
| ## Model Size Reference |
|
|
| | Model | Params | FP32 RAM | 4-bit LiteRT RAM | Phone? | |
| |---|---|---|---|---| |
| | `gemma-4-31B-it` | 31B | ~124 GB | ~16 GB | β No | |
| | `gemma-4-26B-A4B-it` | 26B | ~104 GB | ~13 GB | β No | |
| | `gemma-4-E4B-it` | 4B | ~16 GB | ~2 GB | β οΈ Flagship only | |
| | **`gemma-4-E2B-it`** | **2B** | **~8 GB** | **~1.5 GB** | β
**Mid-tier + budget** | |
|
|
| **We use `gemma-4-E2B-it` (2B)** β smallest Gemma 4, fits on most phones after LiteRT quantization. |
|
|
| ## Datasets |
|
|
| - **Primary:** [`BothBosu/scam-dialogue`](https://huggingface.co/datasets/BothBosu/scam-dialogue) β 800+ labeled transcripts (1=SCAM, 0=LEGIT). |
| - **Secondary:** [`BothBosu/Scammer-Conversation`](https://huggingface.co/datasets/BothBosu/Scammer-Conversation) β extra mixed conversations. |
|
|
| ## Quick Start |
|
|
| ### Step 1: Zero-shot eval (CPU, no GPU needed) |
|
|
| ```bash |
| # Quick test β 20 samples, ~2-3 min on laptop CPU |
| python eval_zero_shot_cpu.py --limit 20 |
| |
| # Full test split β ~400 samples, ~30-45 min on CPU |
| python eval_zero_shot_cpu.py --limit -1 |
| |
| # If you have plenty of RAM, use fp16 to halve memory (~4 GB) |
| python eval_zero_shot_cpu.py --limit 20 --dtype fp16 |
| ``` |
|
|
| **Output:** `results_zero_shot_cpu.json` + console report. |
|
|
| ### Step 2: Read the verdict |
|
|
| | Accuracy | F1(SCAM) | Verdict | Action | |
| |---|---|---|---| |
| | β₯ 90 % | β₯ 85 % | β
PASS | Base model good. Go straight to LiteRT conversion. | |
| | 75β89 % | 70β84 % | β οΈ MARGINAL | Fine-tune, then LiteRT convert. | |
| | < 75 % | < 70 % | β FAIL | Fine-tune REQUIRED before phone deployment. | |
|
|
| ### Step 3: Fine-tune (if needed) |
|
|
| ```bash |
| # Install |
| pip install unsloth transformers datasets trl peft accelerate |
| |
| # Train on GPU (Kaggle T4Γ2 free, or Colab, or local GPU) |
| python train_sft_unsloth.py --push_to_hub s23deepak/grandgemma-scam-sft |
| |
| # Then re-eval the fine-tuned model |
| python eval_zero_shot_cpu.py \ |
| --model s23deepak/grandgemma-scam-sft \ |
| --limit -1 |
| ``` |
|
|
| ### Step 4: Convert to LiteRT for Android |
|
|
| After fine-tuning (or if base passes), convert the 2B model to `.litertlm`: |
|
|
| ```bash |
| # Use litert-community tools |
| pip install litert |
| |
| litert-convert \ |
| --model s23deepak/grandgemma-scam-sft \ |
| --output grandgemma-scam.litertlm \ |
| --quantization int4 |
| ``` |
|
|
| Target RAM on phone: **~1.5 GB** for the 2B 4-bit model. |
|
|
| ## Files in This Repo |
|
|
| | File | Purpose | |
| |---|---| |
| | `eval_zero_shot_cpu.py` | **CPU-only** zero-shot eval (default, no GPU) | |
| | `eval_zero_shot.py` | GPU version (faster, same logic) | |
| | `train_sft_unsloth.py` | Unsloth 4-bit LoRA fine-tune | |
| | `format_dataset.py` | Convert dataset β ChatML JSONL | |
|
|
| ## Phone Deployment Checklist |
|
|
| - [ ] Zero-shot eval passes (β₯90% acc, β₯85% F1) |
| - [ ] OR fine-tuned model passes same threshold |
| - [ ] Convert to `.litertlm` (int4 quantization) |
| - [ ] Benchmark on target phone tier (mid-tier / budget) |
| - [ ] Measure cold-start load time (<2s target) |
| - [ ] Measure inference latency (<500ms per classification) |
|
|