YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

GrandgemMa β€” Gemma 4 Scam Detection Eval & Fine-Tune Kit

Goal: Test google/gemma-4-E2B-it (2B params) on real scam-call transcripts.
If accuracy < 90 % or F1(SCAM) < 85 % β†’ fine-tune with Unsloth 4-bit LoRA, then convert to LiteRT for phone.

Model Size Reference

Model Params FP32 RAM 4-bit LiteRT RAM Phone?
gemma-4-31B-it 31B ~124 GB ~16 GB ❌ No
gemma-4-26B-A4B-it 26B ~104 GB ~13 GB ❌ No
gemma-4-E4B-it 4B ~16 GB ~2 GB ⚠️ Flagship only
gemma-4-E2B-it 2B ~8 GB ~1.5 GB βœ… Mid-tier + budget

We use gemma-4-E2B-it (2B) β€” smallest Gemma 4, fits on most phones after LiteRT quantization.

Datasets

Quick Start

Step 1: Zero-shot eval (CPU, no GPU needed)

# Quick test β€” 20 samples, ~2-3 min on laptop CPU
python eval_zero_shot_cpu.py --limit 20

# Full test split β€” ~400 samples, ~30-45 min on CPU
python eval_zero_shot_cpu.py --limit -1

# If you have plenty of RAM, use fp16 to halve memory (~4 GB)
python eval_zero_shot_cpu.py --limit 20 --dtype fp16

Output: results_zero_shot_cpu.json + console report.

Step 2: Read the verdict

Accuracy F1(SCAM) Verdict Action
β‰₯ 90 % β‰₯ 85 % βœ… PASS Base model good. Go straight to LiteRT conversion.
75–89 % 70–84 % ⚠️ MARGINAL Fine-tune, then LiteRT convert.
< 75 % < 70 % ❌ FAIL Fine-tune REQUIRED before phone deployment.

Step 3: Fine-tune (if needed)

# Install
pip install unsloth transformers datasets trl peft accelerate

# Train on GPU (Kaggle T4Γ—2 free, or Colab, or local GPU)
python train_sft_unsloth.py --push_to_hub s23deepak/grandgemma-scam-sft

# Then re-eval the fine-tuned model
python eval_zero_shot_cpu.py \
  --model s23deepak/grandgemma-scam-sft \
  --limit -1

Step 4: Convert to LiteRT for Android

After fine-tuning (or if base passes), convert the 2B model to .litertlm:

# Use litert-community tools
pip install litert

litert-convert \
  --model s23deepak/grandgemma-scam-sft \
  --output grandgemma-scam.litertlm \
  --quantization int4

Target RAM on phone: ~1.5 GB for the 2B 4-bit model.

Files in This Repo

File Purpose
eval_zero_shot_cpu.py CPU-only zero-shot eval (default, no GPU)
eval_zero_shot.py GPU version (faster, same logic)
train_sft_unsloth.py Unsloth 4-bit LoRA fine-tune
format_dataset.py Convert dataset β†’ ChatML JSONL

Phone Deployment Checklist

  • Zero-shot eval passes (β‰₯90% acc, β‰₯85% F1)
  • OR fine-tuned model passes same threshold
  • Convert to .litertlm (int4 quantization)
  • Benchmark on target phone tier (mid-tier / budget)
  • Measure cold-start load time (<2s target)
  • Measure inference latency (<500ms per classification)
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support