Privacy Masker - Gemma-4-E4B LoRA Adapter

LoRA adapter for PII (Personally Identifiable Information) detection on mobile UI screenshots. Stage 2 of a two-stage pipeline (PaddleOCR -> Gemma classifier).

Team: Aurimas Bžėskis · Tomas Stankevicius · Aida Katkauskaitė
Course: VU Deep Learning, 2026

Intended use

Given a list of OCR text regions extracted from a mobile screenshot, classify each region as one of 9 PII classes or null. The model expects a layout-aware prompt of the form [index@x,y] "text" with coordinates normalized to a 1000x1000 grid.
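The prompt format above can be sketched as follows. This is a minimal illustration, not the project's actual prompt builder: the `Region` tuple, the function names, the use of the bounding-box center as the `x,y` point, 0-based indexing, and one region per line are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical region type: (text, x_min, y_min, x_max, y_max) in pixels.
Region = Tuple[str, float, float, float, float]


def to_grid(value: float, size: float, grid: int = 1000) -> int:
    """Scale a pixel coordinate to the 0..grid range and clamp it."""
    scaled = round(value / size * grid)
    return max(0, min(grid, scaled))


def build_prompt_lines(regions: List[Region], width: float, height: float) -> str:
    """Render OCR regions as [index@x,y] "text" lines (assumed layout)."""
    lines = []
    for index, (text, x0, y0, x1, y1) in enumerate(regions):
        # Assumption: the anchor point is the center of the bounding box.
        x = to_grid((x0 + x1) / 2, width)
        y = to_grid((y0 + y1) / 2, height)
        escaped = text.replace('"', '\\"')  # keep the quoted text unambiguous
        lines.append(f'[{index}@{x},{y}] "{escaped}"')
    return "\n".join(lines)


example = build_prompt_lines(
    [("john@example.com", 100, 200, 300, 240)], width=1080, height=1920
)
print(example)  # [0@185,115] "john@example.com"
```

Normalizing to a fixed grid keeps the coordinates independent of the screenshot resolution, so the same prompt layout applies to any device size.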

Training

Base model: google/gemma-4-E4B (4-bit NF4)
LoRA: r=32, α=64, dropout=0.05, language-model projections only
Trainable parameters: 69.8M (0.87%)
Dataset: pii_v5 (9,989 mobile UI screenshots from RICO-ScreenQA)
Optimizer: AdamW, cosine LR schedule, bf16
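
As a quick sanity check on the figures above, the arithmetic below derives two quantities the card does not state directly: the total parameter count implied by "69.8M (0.87%)", and the LoRA scaling factor, assuming the standard (non-rank-stabilized) definition alpha / r. The variable names are illustrative only.

```python
# Figures taken from the training summary in this card.
trainable_params = 69.8e6
trainable_fraction = 0.87 / 100
lora_r, lora_alpha = 32, 64

# Total parameter count implied by the reported percentage.
implied_total = trainable_params / trainable_fraction

# Standard LoRA multiplies the low-rank update by alpha / r.
scaling = lora_alpha / lora_r

print(f"implied total parameters: {implied_total / 1e9:.2f}B")
print(f"LoRA scaling factor: {scaling}")
```

The implied total is roughly 8.0B parameters. That reflects whatever count the training script reported, which for a 4-bit quantized model may differ from the nominal model size.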

Results (pii_v5 test split, 997 screens)

Metric	Value
Micro F1	0.586
Macro F1	0.536
JSON validity	100%

Per-class F1: email_address 0.84 · phone_number 0.59 · address 0.56 · full_name 0.56 · account_balance 0.56 · transaction_amount 0.55 · username 0.47 · other_sensitive 0.42 · date_of_birth 0.28.
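
Macro F1 is the unweighted mean of the per-class F1 scores, so the reported value can be checked against the per-class numbers above. The snippet is a small verification sketch using only figures from this card.

```python
# Per-class F1 scores as listed in this card (rounded to two decimals).
per_class_f1 = {
    "email_address": 0.84,
    "phone_number": 0.59,
    "address": 0.56,
    "full_name": 0.56,
    "account_balance": 0.56,
    "transaction_amount": 0.55,
    "username": 0.47,
    "other_sensitive": 0.42,
    "date_of_birth": 0.28,
}

# Macro F1: every class counts equally, regardless of its frequency.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"macro F1 from rounded per-class scores: {macro_f1:.3f}")
```

The result agrees with the reported 0.536 to within the rounding of the per-class scores. Micro F1 (0.586) is higher because it pools all regions, which weights frequent classes more heavily than rare ones such as date_of_birth.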

How to use

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Quantize the base model to 4-bit NF4 with bf16 compute, matching the training setup.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                        bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-E4B",
                                            quantization_config=bnb,
                                            device_map="auto")
# Attach the LoRA adapter weights on top of the quantized base model.
model = PeftModel.from_pretrained(base, "tomasstankevicius/privacy-masker-gemma4-lora")
model.eval()  # inference mode: disables LoRA dropout
# Load the tokenizer from the adapter repo.
tok = AutoTokenizer.from_pretrained("tomasstankevicius/privacy-masker-gemma4-lora")

See the project repo for the full inference pipeline (PaddleOCR + prompt construction + JSON parsing).
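
The actual parser lives in the project repo. As a rough illustration of the JSON parsing step only, the sketch below validates a model reply against the nine class names from this card. The reply schema (a JSON object mapping region index to a class name or null) and the `parse_labels` helper are assumptions made for this example, not the documented output format.

```python
import json
from typing import Dict, Optional

# The nine class names come from the per-class results in this card.
PII_CLASSES = {
    "email_address", "phone_number", "address", "full_name",
    "account_balance", "transaction_amount", "username",
    "other_sensitive", "date_of_birth",
}


def parse_labels(raw: str, num_regions: int) -> Dict[int, Optional[str]]:
    """Parse a reply under the ASSUMED schema {"<index>": "<class>" or null}."""
    labels: Dict[int, Optional[str]] = {i: None for i in range(num_regions)}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return labels  # invalid JSON: every region stays unlabeled
    if not isinstance(data, dict):
        return labels
    for key, value in data.items():
        if not str(key).isdigit():
            continue  # skip keys that are not region indices
        index = int(key)
        # Keep only known indices and known class names.
        if index in labels and isinstance(value, str) and value in PII_CLASSES:
            labels[index] = value
    return labels


reply = '{"0": "email_address", "1": null, "2": "bogus", "9": "username"}'
print(parse_labels(reply, num_regions=3))
```

Unknown labels and out-of-range indices fall back to null here. A masking pipeline might prefer the opposite default (mask on any parse failure), so treat the fallback as a design choice rather than a recommendation.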

Limitations

OCR ceiling: bounded above by PaddleOCR recall (misses low-contrast text, icons).
date_of_birth regresses on pii_v5 due to label noise (mixes date strings with age integers).
other_sensitive is structurally weak (mixes biometrics, credentials, demographics).
Trained on English UIs only.

License

Inherits the Gemma license from the base model.
