Privacy Masker - Gemma-4-E4B LoRA Adapter
LoRA adapter for PII (Personally Identifiable Information) detection on mobile UI screenshots. Stage 2 of a two-stage pipeline (PaddleOCR -> Gemma classifier).
Team: Aurimas Bžėskis · Tomas Stankevicius · Aida Katkauskaitė
Course: VU Deep Learning, 2026
Intended use
Given a list of OCR text regions extracted from a mobile screenshot, classify each
region as one of 9 PII classes or null. The model expects a layout-aware prompt
of the form [index@x,y] "text" with coordinates normalized to a 1000x1000 grid.
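A minimal sketch of how such a prompt body could be assembled from OCR output. The `OcrRegion` type and the helper names are hypothetical, and several details are assumptions not stated in this card: that `x,y` is the top-left corner of the region, that indices start at 0, and that coordinates are rounded to integers. The project repo holds the actual prompt construction.

```python
from dataclasses import dataclass

GRID = 1000  # coordinates are normalized to a 1000x1000 grid


@dataclass
class OcrRegion:
    """Hypothetical container for one OCR text region."""
    text: str
    x: float  # pixel x of the region anchor (assumed: top-left corner)
    y: float  # pixel y of the region anchor (assumed: top-left corner)


def normalize(value: float, extent: float) -> int:
    """Scale a pixel coordinate to the grid and clamp it to [0, GRID]."""
    scaled = round(value * GRID / extent)
    return max(0, min(GRID, scaled))


def format_region(index: int, region: OcrRegion, width: int, height: int) -> str:
    """Render one region as [index@x,y] "text"."""
    x = normalize(region.x, width)
    y = normalize(region.y, height)
    return f'[{index}@{x},{y}] "{region.text}"'


def build_prompt_body(regions: list[OcrRegion], width: int, height: int) -> str:
    """Join all regions, one per line, with assumed zero-based indices."""
    return "\n".join(
        format_region(i, r, width, height) for i, r in enumerate(regions)
    )


regions = [
    OcrRegion("john@example.com", 540, 960),
    OcrRegion("Balance: $120.50", 108, 1440),
]
prompt_body = build_prompt_body(regions, width=1080, height=1920)
print(prompt_body)
```

For a 1080x1920 screenshot, the two example regions become `[0@500,500] "john@example.com"` and `[1@100,750] "Balance: $120.50"`.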
Training
- Base model: google/gemma-4-E4B (4-bit NF4)
- LoRA: r=32, α=64, dropout=0.05, language-model projections only
- Trainable parameters: 69.8M (0.87%)
- Dataset: pii_v5 (9,989 mobile UI screenshots from RICO-ScreenQA)
- Optimizer: AdamW, cosine LR schedule, bf16
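The LoRA figures above can be sanity-checked with a little arithmetic. The sketch below uses only the reported values, except for the layer dimensions passed to `lora_params`, which are hypothetical and not taken from the base model. It relies on two standard LoRA facts: the low-rank update is scaled by α/r, and an adapter on a `d_in -> d_out` linear layer adds `r * (d_in + d_out)` parameters.

```python
# Hyperparameters and counts as reported in the Training list.
r, alpha = 32, 64
trainable_params = 69.8e6
trainable_fraction = 0.0087  # 0.87%

# Standard LoRA scales the low-rank update B @ A by alpha / r.
scaling = alpha / r

# Total parameter count implied by the reported fraction. This is only
# approximate because both reported figures are rounded.
implied_total = trainable_params / trainable_fraction


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


print(f"scaling = {scaling}")
print(f"implied total = {implied_total / 1e9:.2f}B parameters")
# Hypothetical 2560 -> 2560 projection, for illustration only.
print(f"adapter size for one layer = {lora_params(2560, 2560, r)}")
```

The scaling factor comes out at 2.0, and the reported 0.87% implies a total of roughly 8.0B parameters counted by the training setup.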
Results (pii_v5 test split, 997 screens)
| Metric | Value |
|---|---|
| Micro F1 | 0.586 |
| Macro F1 | 0.536 |
| JSON validity | 100% |
Per-class F1: email_address 0.84 · phone_number 0.59 · address 0.56 ·
full_name 0.56 · account_balance 0.56 · transaction_amount 0.55 ·
username 0.47 · other_sensitive 0.42 · date_of_birth 0.28.
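As a quick consistency check, macro F1 is the unweighted mean of the per-class F1 scores. The sketch below recomputes it from the rounded values listed above, assuming the average runs over the 9 PII classes and excludes null. Micro F1 pools true and false positives across all regions, so it cannot be recomputed without per-class counts, which this card does not report.

```python
# Per-class F1 scores as reported above (pii_v5 test split).
per_class_f1 = {
    "email_address": 0.84,
    "phone_number": 0.59,
    "address": 0.56,
    "full_name": 0.56,
    "account_balance": 0.56,
    "transaction_amount": 0.55,
    "username": 0.47,
    "other_sensitive": 0.42,
    "date_of_birth": 0.28,
}

# Macro F1: every class counts equally, regardless of how often it occurs.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"macro F1 from rounded per-class scores: {macro_f1:.3f}")
```

The result is about 0.537, which agrees with the reported 0.536 up to the rounding of the per-class values.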
How to use
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit NF4, matching the training setup.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-E4B",
                                            quantization_config=bnb,
                                            device_map="auto")

# Attach the LoRA adapter and load the tokenizer from the adapter repo.
model = PeftModel.from_pretrained(base, "tomasstankevicius/privacy-masker-gemma4-lora")
tok = AutoTokenizer.from_pretrained("tomasstankevicius/privacy-masker-gemma4-lora")
```
See the project repo for the full inference pipeline (PaddleOCR + prompt construction + JSON parsing).
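For orientation, the JSON parsing step might look roughly like the sketch below. The output schema is an assumption, not something this card specifies: the model is assumed to return a JSON object that maps each region index (as a string) to a class name or null. `parse_predictions` is a hypothetical helper, and only the class names are taken from the per-class results.

```python
import json

# Class names as listed in the per-class results; the schema is assumed.
PII_CLASSES = {
    "email_address", "phone_number", "address", "full_name",
    "account_balance", "transaction_amount", "username",
    "other_sensitive", "date_of_birth",
}


def parse_predictions(raw: str, n_regions: int) -> list:
    """Return one label (or None) per OCR region.

    Invalid JSON, a non-object payload, missing indices, and unknown
    labels all fall back to None, i.e. "not PII".
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return [None] * n_regions
    if not isinstance(data, dict):
        return [None] * n_regions

    labels = []
    for i in range(n_regions):
        label = data.get(str(i))
        is_known = isinstance(label, str) and label in PII_CLASSES
        labels.append(label if is_known else None)
    return labels


raw_output = '{"0": "email_address", "1": null, "2": "bogus"}'
print(parse_predictions(raw_output, 3))
```

Falling back to None on any malformed entry keeps the downstream masking step from failing on a single bad response, at the cost of silently missing PII in that case.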
Limitations
- OCR ceiling: bounded above by PaddleOCR recall (misses low-contrast text, icons).
- date_of_birth regresses on pii_v5 due to label noise (mixes date strings with age integers).
- other_sensitive is structurally weak (mixes biometrics, credentials, demographics).
- Trained on English UIs only.
License
Inherits the Gemma license from the base model.