# GrandgemMa — Gemma 4 Scam Detection Eval & Fine-Tune Kit

> **Goal:** Test `google/gemma-4-E2B-it` (2B params) on real scam-call transcripts.  
> If accuracy < 90 % or F1(SCAM) < 85 % → fine-tune with Unsloth 4-bit LoRA, then convert to LiteRT for phone.

## Model Size Reference

| Model | Params | FP32 RAM | 4-bit LiteRT RAM | Phone? |
|---|---|---|---|---|
| `gemma-4-31B-it` | 31B | ~124 GB | ~16 GB | ❌ No |
| `gemma-4-26B-A4B-it` | 26B | ~104 GB | ~13 GB | ❌ No |
| `gemma-4-E4B-it` | 4B | ~16 GB | ~2 GB | ⚠️ Flagship only |
| **`gemma-4-E2B-it`** | **2B** | **~8 GB** | **~1.5 GB** | ✅ **Mid-tier + budget** |

**We use `gemma-4-E2B-it` (2B)** — smallest Gemma 4, fits on most phones after LiteRT quantization.

## Datasets

- **Primary:** [`BothBosu/scam-dialogue`](https://huggingface.co/datasets/BothBosu/scam-dialogue) — 800+ labeled transcripts (1=SCAM, 0=LEGIT).
- **Secondary:** [`BothBosu/Scammer-Conversation`](https://huggingface.co/datasets/BothBosu/Scammer-Conversation) — extra mixed conversations.

## Quick Start

### Step 1: Zero-shot eval (CPU, no GPU needed)

```bash
# Quick test — 20 samples, ~2-3 min on laptop CPU
python eval_zero_shot_cpu.py --limit 20

# Full test split — ~400 samples, ~30-45 min on CPU
python eval_zero_shot_cpu.py --limit -1

# If you have plenty of RAM, use fp16 to halve memory (~4 GB)
python eval_zero_shot_cpu.py --limit 20 --dtype fp16
```

**Output:** `results_zero_shot_cpu.json` + console report.

### Step 2: Read the verdict

| Accuracy | F1(SCAM) | Verdict | Action |
|---|---|---|---|
| ≥ 90 % | ≥ 85 % | ✅ PASS | Base model good. Go straight to LiteRT conversion. |
| 75–89 % | 70–84 % | ⚠️ MARGINAL | Fine-tune, then LiteRT convert. |
| < 75 % | < 70 % | ❌ FAIL | Fine-tune REQUIRED before phone deployment. |

### Step 3: Fine-tune (if needed)

```bash
# Install
pip install unsloth transformers datasets trl peft accelerate

# Train on GPU (Kaggle T4×2 free, or Colab, or local GPU)
python train_sft_unsloth.py --push_to_hub s23deepak/grandgemma-scam-sft

# Then re-eval the fine-tuned model
python eval_zero_shot_cpu.py \
  --model s23deepak/grandgemma-scam-sft \
  --limit -1
```

### Step 4: Convert to LiteRT for Android

After fine-tuning (or if base passes), convert the 2B model to `.litertlm`:

```bash
# Use litert-community tools
pip install litert

litert-convert \
  --model s23deepak/grandgemma-scam-sft \
  --output grandgemma-scam.litertlm \
  --quantization int4
```

Target RAM on phone: **~1.5 GB** for the 2B 4-bit model.

## Files in This Repo

| File | Purpose |
|---|---|
| `eval_zero_shot_cpu.py` | **CPU-only** zero-shot eval (default, no GPU) |
| `eval_zero_shot.py` | GPU version (faster, same logic) |
| `train_sft_unsloth.py` | Unsloth 4-bit LoRA fine-tune |
| `format_dataset.py` | Convert dataset → ChatML JSONL |

## Phone Deployment Checklist

- [ ] Zero-shot eval passes (≥90% acc, ≥85% F1)
- [ ] OR fine-tuned model passes same threshold
- [ ] Convert to `.litertlm` (int4 quantization)
- [ ] Benchmark on target phone tier (mid-tier / budget)
- [ ] Measure cold-start load time (<2s target)
- [ ] Measure inference latency (<500ms per classification)