SafeLog-LM

A fine-tuned Gemma 3 270M model specialized in redacting Personally Identifiable Information (PII) from Android system logs (logcat, bugreports, dmesg).

Model Details

  • Base Model: google/gemma-3-270m-it
  • Task: PII Redaction
  • Domain: Android system logs
  • Parameters: 270M
  • Training: QLoRA fine-tuning with completion-only loss masking

Supported PII Types

| Category | Tags |
|---|---|
| Device IDs | [IMEI], [SERIAL_NUMBER], [ANDROID_ID], [ADVERTISING_ID], [MAC_ADDRESS], [ICCID], [IMSI] |
| Network/Location | [IP_ADDRESS], [GPS_COORDINATES], [WIFI_SSID], [BLUETOOTH_NAME] |
| Personal Info | [PERSON_NAME], [EMAIL], [PHONE_NUMBER], [ACCOUNT_NAME] |
| Paths | [PATH_USERNAME] |
| Secrets | [ACCESS_TOKEN], [CERTIFICATE_FINGERPRINT] |
| URLs | [URL_WITH_PII] |
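
Because the tag set is closed, one cheap post-check is to scan redacted output for bracketed tags outside the supported list. The following is a minimal sketch, not part of the model's tooling; the helper name `unknown_tags` and the tag regex are assumptions made for illustration. Android logs can legitimately contain bracketed upper-case tokens, so anything it flags is a candidate for review rather than a definite error.

```python
import re

# Tags copied from the supported-types list above.
SUPPORTED_TAGS = {
    "IMEI", "SERIAL_NUMBER", "ANDROID_ID", "ADVERTISING_ID", "MAC_ADDRESS",
    "ICCID", "IMSI", "IP_ADDRESS", "GPS_COORDINATES", "WIFI_SSID",
    "BLUETOOTH_NAME", "PERSON_NAME", "EMAIL", "PHONE_NUMBER", "ACCOUNT_NAME",
    "PATH_USERNAME", "ACCESS_TOKEN", "CERTIFICATE_FINGERPRINT", "URL_WITH_PII",
}

# Assumed tag shape: upper-case letters and underscores inside square brackets.
TAG_PATTERN = re.compile(r"\[([A-Z][A-Z_]*)\]")


def unknown_tags(redacted: str) -> list[str]:
    """Return bracketed upper-case tokens that are not in SUPPORTED_TAGS."""
    return [tag for tag in TAG_PATTERN.findall(redacted) if tag not in SUPPORTED_TAGS]


print(unknown_tags("I/SyncService: Syncing user [EMAIL] from IP [IP_ADDRESS]"))  # → []
print(unknown_tags("owner=[USERNAME] imei=[IMEI]"))  # → ['USERNAME']
```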

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "logcat-ai/safelog-lm"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)

SYSTEM_PROMPT = (
    "You are a PII redaction tool. Replace all personally identifiable information "
    "in the input text with the appropriate tag. Preserve all other text exactly. "
    "Tags: [IMEI], [SERIAL_NUMBER], [ANDROID_ID], [ADVERTISING_ID], [MAC_ADDRESS], "
    "[ICCID], [IMSI], [IP_ADDRESS], [GPS_COORDINATES], [WIFI_SSID], [BLUETOOTH_NAME], "
    "[PERSON_NAME], [EMAIL], [PHONE_NUMBER], [ACCOUNT_NAME], [PATH_USERNAME], "
    "[ACCESS_TOKEN], [CERTIFICATE_FINGERPRINT], [URL_WITH_PII]"
)

def redact(text):
    prompt = f"""<start_of_turn>user
{SYSTEM_PROMPT}

Input: {text}<end_of_turn>
<start_of_turn>model
"""
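    # Move inputs to the model's device rather than hard-coding "cuda".
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding (do_sample=False) keeps the redaction deterministic.
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()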

# Example
log_line = "I/TelephonyManager: getDeviceId() returning 358673091234567"
print(redact(log_line))
# Output: I/TelephonyManager: getDeviceId() returning [IMEI]

Examples

| Input | Output |
|---|---|
| D/Telephony: Sending SMS to +917514605529 | D/Telephony: Sending SMS to [PHONE_NUMBER] |
| D/WifiService: Connected to SSID "Varun's Home 5G" with MAC aa:bb:cc:dd:ee:ff | D/WifiService: Connected to SSID "[WIFI_SSID]" with MAC [MAC_ADDRESS] |
| W/Auth: Bearer token=eyJhbGciOiJIUzI1... | W/Auth: Bearer token=[ACCESS_TOKEN] |
| I/SyncService: Syncing user priya@company.com from IP 192.168.1.105 | I/SyncService: Syncing user [EMAIL] from IP [IP_ADDRESS] |
| D/ActivityManager: Process com.app started | D/ActivityManager: Process com.app started (no PII) |
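
The pairs above all share one property: everything outside the tags is copied through unchanged. A small generative model can occasionally drift from that, so it may be worth checking mechanically. The sketch below turns the redacted line into a regular expression in which each tag matches any non-empty span, then tests it against the original. The function name `preserves_non_pii` and the tag regex are assumptions for illustration. The check confirms only that the surrounding text survived, not that every piece of PII was caught.

```python
import re

# Assumed tag shape: upper-case letters and underscores inside square brackets.
TAG_PATTERN = re.compile(r"\[[A-Z][A-Z_]*\]")


def preserves_non_pii(original: str, redacted: str) -> bool:
    """Return True if `redacted` equals `original` with some spans replaced by tags."""
    # Split the redacted text into the literal pieces that sit between tags.
    literal_parts = TAG_PATTERN.split(redacted)
    # Each tag position may stand for any non-empty run of original characters.
    pattern = "(.+?)".join(re.escape(part) for part in literal_parts)
    return re.fullmatch(pattern, original, flags=re.DOTALL) is not None


original = "D/Telephony: Sending SMS to +917514605529"
print(preserves_non_pii(original, "D/Telephony: Sending SMS to [PHONE_NUMBER]"))  # → True
print(preserves_non_pii(original, "D/Telephony: Sent SMS to [PHONE_NUMBER]"))     # → False
```

Since `redact()` strips leading and trailing whitespace from the model output, compare against a stripped copy of the input line.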

Training Details

  • Dataset: ~1,770 examples (real Android logs + synthetic data)
  • Epochs: 5
  • Method: QLoRA (r=32, alpha=64)
  • Loss: Completion-only (prompt masked)
  • Final eval accuracy: 99.7%
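
For readers unfamiliar with completion-only loss, the idea is that prompt tokens are excluded from the loss so the model is trained only on the redacted output. The training script is not shown here, so the following is just a sketch of the general technique: labels for prompt positions are set to -100, the default `ignore_index` of PyTorch's cross-entropy loss. The function name `build_labels` and the token IDs are made up for illustration.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids: list[int], completion_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate prompt and completion, masking prompt positions in the labels."""
    input_ids = prompt_ids + completion_ids
    # Prompt positions contribute nothing to the loss; completion positions do.
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    return input_ids, labels


# Illustrative token IDs only; real IDs would come from the tokenizer.
input_ids, labels = build_labels([11, 12, 13], [21, 22])
print(input_ids)  # → [11, 12, 13, 21, 22]
print(labels)     # → [-100, -100, -100, 21, 22]
```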

Limitations

  • Optimized for Android log formats; may not generalize to other log types
  • Very long log lines (>512 tokens) may be truncated
  • Some PII categories have fewer training examples than others

Intended Use

  • Pre-processing Android logs before sharing or analysis
  • Privacy-preserving log collection in device management systems
  • Client-side PII redaction in observability pipelines
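
For the pre-processing use case, a log file is usually handled line by line. The sketch below shows one way to wrap a redaction callable so that blank lines skip the model call. `redact_lines` and `fake_redact` are hypothetical names; `fake_redact` is a stand-in used only so the example runs without the model, and in practice the model-backed `redact()` from the Usage section would be passed instead.

```python
from typing import Callable, Iterable, Iterator


def redact_lines(lines: Iterable[str], redact_fn: Callable[[str], str]) -> Iterator[str]:
    """Yield each line redacted; blank lines pass through without calling redact_fn."""
    for line in lines:
        stripped = line.rstrip("\n")
        yield redact_fn(stripped) if stripped.strip() else stripped


def fake_redact(text: str) -> str:
    """Stand-in for the model-backed redact(); replaces one known IMEI only."""
    return text.replace("358673091234567", "[IMEI]")


sample = [
    "I/TelephonyManager: getDeviceId() returning 358673091234567\n",
    "\n",
]
print(list(redact_lines(sample, fake_redact)))
# → ['I/TelephonyManager: getDeviceId() returning [IMEI]', '']
```

Passing the redactor as a parameter keeps the file handling testable without loading the model.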

Author

Built by logcat.ai — AI-powered observability for Android and Linux system-level intelligence.

License

This model inherits the Gemma license.
