SafeLog-LM
A fine-tuned Gemma 3 270M model specialized in redacting Personally Identifiable Information (PII) from Android system logs (logcat, bugreports, dmesg).
Model Details
- Base Model: google/gemma-3-270m-it
- Task: PII Redaction
- Domain: Android system logs
- Parameters: 270M
- Training: QLoRA fine-tuning with completion-only loss masking
Supported PII Types
| Category | Tags |
|---|---|
| Device IDs | [IMEI], [SERIAL_NUMBER], [ANDROID_ID], [ADVERTISING_ID], [MAC_ADDRESS], [ICCID], [IMSI] |
| Network/Location | [IP_ADDRESS], [GPS_COORDINATES], [WIFI_SSID], [BLUETOOTH_NAME] |
| Personal Info | [PERSON_NAME], [EMAIL], [PHONE_NUMBER], [ACCOUNT_NAME] |
| Paths | [PATH_USERNAME] |
| Secrets | [ACCESS_TOKEN], [CERTIFICATE_FINGERPRINT] |
| URLs | [URL_WITH_PII] |
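
The tag vocabulary above is small and fixed, so model output can be sanity-checked against it. The following is a minimal sketch, not part of the released model: `SUPPORTED_TAGS`, `TAG_PATTERN`, and `unknown_tags` are hypothetical names, and the tag lists are copied from the table.

```python
import re

# Tag names copied from the table above, grouped by category.
SUPPORTED_TAGS = {
    "Device IDs": ["IMEI", "SERIAL_NUMBER", "ANDROID_ID", "ADVERTISING_ID",
                   "MAC_ADDRESS", "ICCID", "IMSI"],
    "Network/Location": ["IP_ADDRESS", "GPS_COORDINATES", "WIFI_SSID", "BLUETOOTH_NAME"],
    "Personal Info": ["PERSON_NAME", "EMAIL", "PHONE_NUMBER", "ACCOUNT_NAME"],
    "Paths": ["PATH_USERNAME"],
    "Secrets": ["ACCESS_TOKEN", "CERTIFICATE_FINGERPRINT"],
    "URLs": ["URL_WITH_PII"],
}
ALL_TAGS = {tag for tags in SUPPORTED_TAGS.values() for tag in tags}

# Hypothetical pattern: anything shaped like [UPPER_CASE_TAG].
TAG_PATTERN = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")

def unknown_tags(redacted_text):
    """Return bracketed tag-like tokens that are not in the supported set."""
    found = TAG_PATTERN.findall(redacted_text)
    return sorted({tag for tag in found if tag not in ALL_TAGS})

print(unknown_tags("I/SyncService: Syncing user [EMAIL] from IP [IP_ADDRESS]"))  # []
```

Logs can legitimately contain bracketed upper-case tokens of their own (for example a severity marker), so a non-empty result is a prompt to inspect the line, not proof of a model error.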
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "logcat-ai/safelog-lm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# Instruction sent with every input; it lists all supported tags.
SYSTEM_PROMPT = (
    "You are a PII redaction tool. Replace all personally identifiable information "
    "in the input text with the appropriate tag. Preserve all other text exactly. "
    "Tags: [IMEI], [SERIAL_NUMBER], [ANDROID_ID], [ADVERTISING_ID], [MAC_ADDRESS], "
    "[ICCID], [IMSI], [IP_ADDRESS], [GPS_COORDINATES], [WIFI_SSID], [BLUETOOTH_NAME], "
    "[PERSON_NAME], [EMAIL], [PHONE_NUMBER], [ACCOUNT_NAME], [PATH_USERNAME], "
    "[ACCESS_TOKEN], [CERTIFICATE_FINGERPRINT], [URL_WITH_PII]"
)

def redact(text):
    # Gemma chat format: one user turn followed by an open model turn.
    prompt = f"""<start_of_turn>user
{SYSTEM_PROMPT}
Input: {text}<end_of_turn>
<start_of_turn>model
"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding keeps the redaction deterministic.
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example
log_line = "I/TelephonyManager: getDeviceId() returning 358673091234567"
print(redact(log_line))
# Output: I/TelephonyManager: getDeviceId() returning [IMEI]
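
The example above redacts a single line. For a multi-line logcat dump, one possible approach is to call the redaction function once per line. This is a sketch under assumptions: the card does not say whether the model was trained on single lines or multi-line blocks, `redact_lines` is a hypothetical helper, and `fake_redact` is a stand-in so the snippet runs without loading the model.

```python
def redact_lines(log_text, redact_fn):
    """Redact a multi-line log one line at a time, passing blank lines through."""
    out = []
    for line in log_text.splitlines():
        # Skip the model call for blank lines; they cannot contain PII.
        out.append(redact_fn(line) if line.strip() else line)
    return "\n".join(out)

# Stand-in for the model-backed redact(), used only to make this sketch runnable.
def fake_redact(line):
    return line.replace("358673091234567", "[IMEI]")

dump = (
    "I/TelephonyManager: getDeviceId() returning 358673091234567\n"
    "\n"
    "D/ActivityManager: Process com.app started"
)
print(redact_lines(dump, fake_redact))
```

In real use, pass the `redact` function defined above in place of `fake_redact`.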
Examples
| Input | Output |
|---|---|
| D/Telephony: Sending SMS to +917514605529 | D/Telephony: Sending SMS to [PHONE_NUMBER] |
| D/WifiService: Connected to SSID "Varun's Home 5G" with MAC aa:bb:cc:dd:ee:ff | D/WifiService: Connected to SSID "[WIFI_SSID]" with MAC [MAC_ADDRESS] |
| W/Auth: Bearer token=eyJhbGciOiJIUzI1... | W/Auth: Bearer token=[ACCESS_TOKEN] |
| I/SyncService: Syncing user priya@company.com from IP 192.168.1.105 | I/SyncService: Syncing user [EMAIL] from IP [IP_ADDRESS] |
| D/ActivityManager: Process com.app started | D/ActivityManager: Process com.app started (no PII) |
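
The system prompt asks the model to preserve all non-PII text exactly, and the rows above follow that pattern. A rough way to check this on a given input/output pair is sketched below. `preserves_non_pii` is a hypothetical helper, not something shipped with the model: it turns the redacted string into a regular expression where each tag matches one or more arbitrary characters and tests it against the original.

```python
import re

# Hypothetical pattern: anything shaped like [UPPER_CASE_TAG].
TAG_PATTERN = re.compile(r"\[[A-Z][A-Z0-9_]*\]")

def preserves_non_pii(original, redacted):
    """True if `redacted` equals `original` with some spans replaced by tags."""
    # Literal text between tags must appear unchanged and in the same order.
    literal_parts = TAG_PATTERN.split(redacted)
    pattern = ".+?".join(re.escape(part) for part in literal_parts)
    return re.fullmatch(pattern, original, flags=re.DOTALL) is not None

print(preserves_non_pii(
    "D/Telephony: Sending SMS to +917514605529",
    "D/Telephony: Sending SMS to [PHONE_NUMBER]",
))  # True
```

The check only confirms that the surrounding text survived; it says nothing about whether every piece of PII was actually found.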
Training Details
- Dataset: ~1,770 examples (real Android logs + synthetic data)
- Epochs: 5
- Method: QLoRA (r=32, alpha=64)
- Loss: Completion-only (prompt masked)
- Final eval accuracy: 99.7%
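
Completion-only loss means the prompt tokens are excluded from the loss, so the model is trained only on the redacted output. The actual training code is not published here; the following is a minimal illustration of the general idea, using the common convention of setting masked labels to -100. `build_labels` and the token IDs are made up for the example.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss

def build_labels(prompt_ids, completion_ids):
    """Concatenate prompt and completion; mask prompt positions out of the loss."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels

# Toy token IDs, invented for illustration.
input_ids, labels = build_labels([11, 12, 13], [21, 22])
print(labels)  # [-100, -100, -100, 21, 22]
```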
Limitations
- Optimized for Android log formats; may not generalize perfectly to other log types
- Very long log lines (>512 tokens) may be truncated
- Some PII categories have fewer training examples than others
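
Because of the 512-token limit noted above, it can help to flag overly long lines before sending them to the model. The sketch below is one way to do that; `split_by_budget` is a hypothetical helper, and the token counter is passed in as a function so the snippet runs without the tokenizer.

```python
def split_by_budget(lines, count_tokens, budget=512):
    """Partition lines into (within_budget, over_budget) by token count."""
    within, over = [], []
    for line in lines:
        (within if count_tokens(line) <= budget else over).append(line)
    return within, over

# Whitespace word count as a crude stand-in for a real tokenizer.
within, over = split_by_budget(
    ["a b", "a b c d"],
    count_tokens=lambda s: len(s.split()),
    budget=3,
)
print(within, over)  # ['a b'] ['a b c d']
```

With the tokenizer from the Usage section, a counter such as `lambda s: len(tokenizer(s)["input_ids"])` could be passed instead. How to handle the over-budget lines (splitting, dropping, or manual review) is left open, since splitting a line could cut a PII value in half.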
Intended Use
- Pre-processing Android logs before sharing or analysis
- Privacy-preserving log collection in device management systems
- Client-side PII redaction in observability pipelines
Author
Built by logcat.ai: AI-powered observability for Android and Linux system-level intelligence.
License
This model inherits the Gemma license.