GrandgemMa — Gemma 4 Scam Detection Eval & Fine-Tune Kit

Goal: Test google/gemma-4-E2B-it (2B params) on real scam-call transcripts.
If accuracy < 90 % or F1(SCAM) < 85 % → fine-tune with Unsloth 4-bit LoRA, then convert to LiteRT for phone.

Model Size Reference

Model	Params	FP32 RAM	4-bit LiteRT RAM	Phone?
`gemma-4-31B-it`	31B	~124 GB	~16 GB	❌ No
`gemma-4-26B-A4B-it`	26B	~104 GB	~13 GB	❌ No
`gemma-4-E4B-it`	4B	~16 GB	~2 GB	⚠️ Flagship only
`gemma-4-E2B-it`	2B	~8 GB	~1.5 GB	✅ Mid-tier + budget

We use gemma-4-E2B-it (2B) — smallest Gemma 4, fits on most phones after LiteRT quantization.

Datasets

Primary: BothBosu/scam-dialogue — 800+ labeled transcripts (1=SCAM, 0=LEGIT).
Secondary: BothBosu/Scammer-Conversation — extra mixed conversations.

Quick Start

Step 1: Zero-shot eval (CPU, no GPU needed)

# Quick test — 20 samples, ~2-3 min on laptop CPU
python eval_zero_shot_cpu.py --limit 20

# Full test split — ~400 samples, ~30-45 min on CPU
python eval_zero_shot_cpu.py --limit -1

# If you have plenty of RAM, use fp16 to halve memory (~4 GB)
python eval_zero_shot_cpu.py --limit 20 --dtype fp16

Output: results_zero_shot_cpu.json + console report.

Step 2: Read the verdict

Accuracy	F1(SCAM)	Verdict	Action
≥ 90 %	≥ 85 %	✅ PASS	Base model good. Go straight to LiteRT conversion.
75–89 %	70–84 %	⚠️ MARGINAL	Fine-tune, then LiteRT convert.
< 75 %	< 70 %	❌ FAIL	Fine-tune REQUIRED before phone deployment.

Step 3: Fine-tune (if needed)

# Install
pip install unsloth transformers datasets trl peft accelerate

# Train on GPU (Kaggle T4×2 free, or Colab, or local GPU)
python train_sft_unsloth.py --push_to_hub s23deepak/grandgemma-scam-sft

# Then re-eval the fine-tuned model
python eval_zero_shot_cpu.py \
  --model s23deepak/grandgemma-scam-sft \
  --limit -1

Step 4: Convert to LiteRT for Android

After fine-tuning (or if base passes), convert the 2B model to .litertlm:

# Use litert-community tools
pip install litert

litert-convert \
  --model s23deepak/grandgemma-scam-sft \
  --output grandgemma-scam.litertlm \
  --quantization int4

Target RAM on phone: ~1.5 GB for the 2B 4-bit model.

Files in This Repo

File	Purpose
`eval_zero_shot_cpu.py`	CPU-only zero-shot eval (default, no GPU)
`eval_zero_shot.py`	GPU version (faster, same logic)
`train_sft_unsloth.py`	Unsloth 4-bit LoRA fine-tune
`format_dataset.py`	Convert dataset → ChatML JSONL

Phone Deployment Checklist

Zero-shot eval passes (≥90% acc, ≥85% F1)
OR fine-tuned model passes same threshold
Convert to .litertlm (int4 quantization)
Benchmark on target phone tier (mid-tier / budget)
Measure cold-start load time (<2s target)
Measure inference latency (<500ms per classification)

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support