Runeward: Turkish KVKK Privacy Filter

Runeward is a token-classification model fine-tuned to detect and mask personal data, and expressions that may be sensitive under KVKK (Turkey's Personal Data Protection Law, No. 6698), in Turkish text, for use in data-security workflows.

The name is inspired by the idea of a runic seal and of guardianship: it marks, bounds and protects sensitive data before it becomes visible.

This is an experimental v0.1 checkpoint fine-tuned from openai/privacy-filter.

Model Summary

Runeward is a Turkish KVKK-aware privacy filter model designed to detect and redact personally identifiable information (PII) and selected categories of sensitive personal data in Turkish text.

It is intended for:

Turkish PII detection
KVKK-aware redaction pipelines
RAG/document sanitization
Pre-processing and post-processing guardrails
On-prem privacy filtering experiments
Support ticket, log, and CRM text redaction

Base Model

This model is fine-tuned from:

openai/privacy-filter

Runeward is not a generative LLM. It is a span/entity detection model for privacy filtering.

Label Space

The current custom label space is:

private_person
private_email
private_phone
private_address
private_date
private_url
account_number
secret
tckn
iban
tax_number
passport_number
license_plate
credit_card
health_data
biometric_data
genetic_data
religion_or_belief
political_opinion
union_membership
criminal_record
child_data

Training Summary

This v0.1 checkpoint was trained on synthetic Turkish KVKK/PII examples.

Observed training log:

train_examples=4330
validation_examples=541
epochs=1
train_loss=0.657055
validation_loss=0.487073
train_token_accuracy=0.8828
validation_token_accuracy=0.9097
best_epoch=1

Output head rebuild summary:

rebuilt output head for target label space
target labels=89
copied_rows=89
exact=33
fallback=56
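The head sizes above are consistent with a BIOES span-tagging scheme (a B-, I-, E- and S- tag for each entity label, plus one shared O tag) in which the first eight labels of the label space are inherited from the base model. This is an assumption inferred from the numbers, not documented behavior of openai/privacy-filter; the sketch below only reproduces the arithmetic.

```python
# Assumption: each entity label expands to four BIOES tags, plus one shared "O" tag.
BIOES_PREFIXES = ("B", "I", "E", "S")

ALL_LABELS = [
    "private_person", "private_email", "private_phone", "private_address",
    "private_date", "private_url", "account_number", "secret",
    "tckn", "iban", "tax_number", "passport_number", "license_plate",
    "credit_card", "health_data", "biometric_data", "genetic_data",
    "religion_or_belief", "political_opinion", "union_membership",
    "criminal_record", "child_data",
]
# Assumption: these eight labels already exist in the base model's output head.
BASE_LABELS = set(ALL_LABELS[:8])


def tag_names(labels):
    """Expand entity labels into BIOES tag names, with "O" first."""
    return ["O"] + [f"{p}-{label}" for label in labels for p in BIOES_PREFIXES]


target_tags = tag_names(ALL_LABELS)
# Rows that can be copied exactly: "O" and every tag of an inherited label.
exact = [t for t in target_tags if t == "O" or t.split("-", 1)[1] in BASE_LABELS]
# Rows for the new KVKK labels, which need a fallback initialization.
fallback = [t for t in target_tags if t not in exact]

print(len(target_tags), len(exact), len(fallback))  # 89 33 56
```

If this assumption holds, all 56 fallback rows belong to the 14 KVKK-specific labels, which is why those labels depend entirely on the fine-tuning data.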

The custom KVKK-specific labels such as tckn, iban, health_data, and license_plate are learned from the fine-tuning data.

Quick Test Results

TCKN

Input:

Müşteri Ahmet Yılmaz için TCKN 91234567890 sisteme kaydedildi.

Output:

Müşteri <PRIVATE_PERSON> için TCKN <TCKN> sisteme kaydedildi.

IBAN

Input:

Para iadesi için IBAN: TR123456789012345678901234

Output:

Para iadesi için IBAN: <IBAN>

Phone and Email

Input:

Ayşe Demir'in telefonu 0532 123 45 67 ve e-postası ayse.demo@example.com.

Output:

Ayşe Demir'in telefonu <PRIVATE_PHONE> ve e-postası <PRIVATE_EMAIL>.

License Plate

Input:

Araç plakası 34 ABC 123 olarak kaydedilmiş.

Output:

Araç plakası <LICENSE_PLATE> olarak kaydedilmiş.

Negative Example

Input:

Bu metinde herhangi bir kişisel veri yok, sadece ürün açıklaması var.

Output:

Bu metinde herhangi bir kişisel veri yok, sadece ürün açıklaması var.
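The quick tests above can be kept as a small regression suite for later checkpoints. The sketch below assumes a hypothetical `redact` callable that maps an input string to its redacted string (for example, a wrapper around the CLI shown in the Usage section); the cases are copied verbatim from the examples above.

```python
from collections.abc import Callable

# Input/expected pairs copied from the quick test results above.
QUICK_TESTS = [
    ("Müşteri Ahmet Yılmaz için TCKN 91234567890 sisteme kaydedildi.",
     "Müşteri <PRIVATE_PERSON> için TCKN <TCKN> sisteme kaydedildi."),
    ("Para iadesi için IBAN: TR123456789012345678901234",
     "Para iadesi için IBAN: <IBAN>"),
    # As documented for v0.1, the name in this example is left unmasked.
    ("Ayşe Demir'in telefonu 0532 123 45 67 ve e-postası ayse.demo@example.com.",
     "Ayşe Demir'in telefonu <PRIVATE_PHONE> ve e-postası <PRIVATE_EMAIL>."),
    ("Araç plakası 34 ABC 123 olarak kaydedilmiş.",
     "Araç plakası <LICENSE_PLATE> olarak kaydedilmiş."),
    # Negative example: the text must come back unchanged.
    ("Bu metinde herhangi bir kişisel veri yok, sadece ürün açıklaması var.",
     "Bu metinde herhangi bir kişisel veri yok, sadece ürün açıklaması var."),
]


def run_quick_tests(redact: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Return (input, expected, actual) for every case whose output differs."""
    failures = []
    for text, expected in QUICK_TESTS:
        actual = redact(text).strip()  # ignore trailing newlines from a CLI
        if actual != expected:
            failures.append((text, expected, actual))
    return failures
```

An empty return value means the checkpoint still reproduces the documented outputs; any entry shows exactly which example regressed.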

Known Limitations

This is an early experimental checkpoint.

Known issues observed in v0.1:

Semantic sensitive categories such as health_data, political_opinion, and union_membership require more diverse training data.
Lowercase Turkish IBAN examples may be confused with other numeric labels.
Secret/API-key-like strings may be confused with license_plate in some cases.
Special-category KVKK labels need stronger semantic coverage.
The model was trained primarily on synthetic data.
This model should not be used as the sole KVKK compliance mechanism.
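One way to address the lowercase-IBAN confusion noted above is to augment the training data with surface variants of each IBAN. The sketch below is an assumption about how such variants could be generated; the four-character grouping mirrors the usual printed IBAN format, and `iban_variants` is a hypothetical helper, not part of the training code.

```python
def iban_variants(iban: str) -> list[str]:
    """Generate surface variants of one IBAN for data augmentation.

    Returns the compact form, a form grouped in blocks of four, and lowercase
    versions of both, without duplicates and in a stable order.
    """
    compact = "".join(iban.split()).upper()  # drop all whitespace, normalize case
    grouped = " ".join(compact[i:i + 4] for i in range(0, len(compact), 4))
    variants = [compact, grouped, compact.lower(), grouped.lower()]
    # dict.fromkeys keeps first-seen order while removing duplicates
    return list(dict.fromkeys(variants))


for variant in iban_variants("TR123456789012345678901234"):
    print(variant)
```

When a variant is spliced into a training sentence, the character offsets of its `iban` span have to be recomputed, because the grouped forms are longer than the compact one.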

For production systems, combine Runeward with:

deterministic regex validators,
TCKN/IBAN checksum validation,
a policy engine,
audit logging,
output scanning,
human review for special-category data.
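The TCKN and IBAN checksums mentioned above are deterministic and cheap to run on every span the model labels `tckn` or `iban`. The sketch below implements the publicly documented TCKN check-digit rules and the ISO 13616 / ISO 7064 mod-97 IBAN check, defaulting to the Turkish IBAN length of 26 characters; the function names are illustrative and not part of Runeward.

```python
import re


def is_valid_tckn(value: str) -> bool:
    """Check an 11-digit Turkish national ID number (TCKN) against its two check digits."""
    if not re.fullmatch(r"[1-9][0-9]{10}", value):  # 11 ASCII digits, no leading zero
        return False
    d = [int(c) for c in value]
    odd_sum = d[0] + d[2] + d[4] + d[6] + d[8]   # 1st, 3rd, 5th, 7th, 9th digits
    even_sum = d[1] + d[3] + d[5] + d[7]         # 2nd, 4th, 6th, 8th digits
    if (odd_sum * 7 - even_sum) % 10 != d[9]:    # 10th digit
        return False
    return sum(d[:10]) % 10 == d[10]             # 11th digit


def is_valid_iban(value: str, country: str = "TR", length: int = 26) -> bool:
    """ISO 13616 mod-97 check; tolerates spaces and lowercase letters."""
    iban = "".join(value.split()).upper()
    if len(iban) != length or not iban.startswith(country):
        return False
    if not re.fullmatch(r"[A-Z0-9]+", iban):
        return False
    rearranged = iban[4:] + iban[:4]  # move country code and check digits to the end
    digits = "".join(str(int(ch, 36)) for ch in rearranged)  # 0-9 stay, A=10 ... Z=35
    return int(digits) % 97 == 1
```

The sample TCKN used in the quick tests, 91234567890, does not pass `is_valid_tckn`, so a checksum layer and the model can legitimately disagree on illustrative numbers; the policy layer has to decide which signal wins.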

Recommended production architecture:

Regex validators
+ Runeward span detection
+ KVKK policy engine
+ output redaction
+ audit logging
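The layers above have to agree on one set of spans before policy and redaction run. A minimal sketch of that merge step, assuming both the regex layer and Runeward emit spans as dicts with hypothetical `start`, `end` (character offsets, end exclusive) and `label` keys: on overlap the deterministic regex span is preferred, and among spans from the same layer the longer one wins. The audit record keeps labels and offsets only, never the raw values.

```python
from collections import Counter


def merge_spans(regex_spans, model_spans):
    """Combine regex and model spans into one non-overlapping list."""
    candidates = [(0, s) for s in regex_spans] + [(1, s) for s in model_spans]
    # Priority: regex (0) before model (1), then longer spans before shorter ones.
    candidates.sort(key=lambda item: (item[0], -(item[1]["end"] - item[1]["start"])))
    kept = []
    for _, span in candidates:
        overlaps = any(span["start"] < k["end"] and k["start"] < span["end"] for k in kept)
        if not overlaps:
            kept.append(span)
    return sorted(kept, key=lambda s: s["start"])


def audit_record(doc_id, spans, decision):
    """Build an audit-log entry that omits the sensitive text itself."""
    return {
        "doc_id": doc_id,
        "decision": decision,
        "label_counts": dict(Counter(s["label"] for s in spans)),
        "offsets": [(s["start"], s["end"], s["label"]) for s in spans],
    }
```

Preferring regex spans is one reasonable design choice for identifiers with checksums; for semantic labels such as `health_data`, where no regex exists, the model spans pass through unchanged.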

Usage

After installing the OpenAI Privacy Filter CLI, you can run:

opf --checkpoint ./runeward-kvkk-filter "Müşteri Ahmet Yılmaz için TCKN 91234567890 sisteme kaydedildi."

Expected output:

Müşteri <PRIVATE_PERSON> için TCKN <TCKN> sisteme kaydedildi.
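For batch use, the same CLI call can be wrapped from Python. This sketch reuses only the invocation shown above (`opf --checkpoint <path> <text>`) and assumes, as the expected output suggests, that the redacted text is printed to standard output; no other `opf` options are assumed. The helper names are hypothetical.

```python
import subprocess

CHECKPOINT = "./runeward-kvkk-filter"  # path used in the example above


def build_opf_command(text: str, checkpoint: str = CHECKPOINT) -> list[str]:
    """Build the argv for the documented CLI call; no shell quoting is involved."""
    return ["opf", "--checkpoint", checkpoint, text]


def redact_with_opf(text: str, checkpoint: str = CHECKPOINT) -> str:
    """Run the CLI once and return its stdout (assumed to be the redacted text)."""
    result = subprocess.run(
        build_opf_command(text, checkpoint),
        capture_output=True,
        encoding="utf-8",  # decode Turkish characters explicitly
        check=True,        # raise CalledProcessError if opf exits non-zero
    )
    return result.stdout.strip()
```

Passing an argument list instead of a shell string avoids quoting problems with apostrophes such as `Ayşe Demir'in`. Since each call starts a new process, which presumably loads the checkpoint again, this wrapper suits small batches rather than high-throughput use.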

Example Production Policy Layer

Runeward should be used as a detection layer, not as a complete compliance system.

Example policy:

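from collections.abc import Mapping, Sequence

# KVKK special categories of personal data: any hit is routed to a human reviewer.
SPECIAL_CATEGORY_LABELS = frozenset({
    "health_data",
    "biometric_data",
    "genetic_data",
    "religion_or_belief",
    "political_opinion",
    "union_membership",
    "criminal_record",
    "child_data",
})

# Identifiers that are never passed through unmasked.
HIGH_RISK_MASK_LABELS = frozenset({
    "tckn",
    "iban",
    "credit_card",
    "passport_number",
    "tax_number",
    "secret",
})

# Texts with at least this many detected spans are blocked instead of masked.
BLOCK_SPAN_THRESHOLD = 5


def decide_kvkk_policy(spans: Sequence[Mapping[str, object]]) -> str:
    """Map detected spans (dicts with a "label" key) to a policy decision.

    Rules are checked from most to least restrictive, so a dense document is
    blocked even if it also contains high-risk identifiers.
    """
    labels = {span["label"] for span in spans}

    if labels & SPECIAL_CATEGORY_LABELS:
        return "HUMAN_REVIEW"

    if len(spans) >= BLOCK_SPAN_THRESHOLD:
        return "BLOCK"

    if labels & HIGH_RISK_MASK_LABELS:
        return "MASK"

    if spans:
        # Remaining lower-risk PII (names, e-mails, phones, ...) is masked as well.
        return "MASK"

    return "ALLOW"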
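When the policy returns MASK, the detected spans still have to be replaced in the text. A minimal sketch that reproduces the placeholder style of the quick tests (`<TCKN>`, `<PRIVATE_PERSON>`), assuming non-overlapping spans given as dicts with hypothetical `start`/`end` character offsets (end exclusive) alongside `label`:

```python
def mask_spans(text: str, spans: list[dict]) -> str:
    """Replace each span with <LABEL>, right to left so earlier offsets stay valid."""
    masked = text
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        placeholder = f"<{span['label'].upper()}>"
        masked = masked[: span["start"]] + placeholder + masked[span["end"]:]
    return masked


text = "Müşteri Ahmet Yılmaz için TCKN 91234567890 sisteme kaydedildi."
spans = [
    {"label": "private_person", "start": 8, "end": 20},
    {"label": "tckn", "start": 31, "end": 42},
]
print(mask_spans(text, spans))
# Müşteri <PRIVATE_PERSON> için TCKN <TCKN> sisteme kaydedildi.
```

Working from the last span to the first matters because each placeholder has a different length from the text it replaces.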

Intended Use

Runeward is intended for research, experimentation, and internal privacy-filtering pipelines for Turkish text.

Good use cases:

PII masking before sending text to an LLM
RAG document sanitization
Log and support-ticket redaction
KVKK-aware pre-processing pipelines
On-prem privacy filtering experiments

Out-of-Scope Use

Do not use this model as a standalone legal compliance solution.

It does not replace:

legal review,
explicit consent flows,
data inventory processes,
KVKK governance,
human review workflows,
access control,
data retention policies.

Safety and Compliance Notice

Runeward helps identify and mask potential personal data, but it does not guarantee KVKK compliance.

For real production deployments, combine it with:

1. deterministic PII scanners,
2. legal/policy review,
3. role-based access control,
4. audit logs,
5. human review,
6. explicit retention and deletion policies.

Version

Runeward v0.1.0

This is the first experimental checkpoint.

Recommended next steps for v0.2:

Add more LLM-assisted synthetic examples.
Add hard negatives.
Add lowercase/spaced IBAN variants.
Improve secret/API-key detection.
Improve semantic special-category detection.
Evaluate with label-level precision, recall, and F1.
Add Turkish real-world anonymized validation examples.
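For the evaluation item above, label-level precision, recall and F1 can be computed from gold and predicted spans. This sketch uses strict matching (a prediction counts only if start, end and label all match a gold span), which is one common choice; partial-overlap scoring would give higher numbers. Spans are assumed to be `(start, end, label)` tuples, one collection per document.

```python
from collections import defaultdict


def label_prf(gold_docs, pred_docs):
    """Strict span-level precision/recall/F1 per label.

    gold_docs, pred_docs: parallel lists; each item is an iterable of
    (start, end, label) tuples for one document.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold, pred in zip(gold_docs, pred_docs):
        gold, pred = set(gold), set(pred)
        for _, _, label in gold & pred:   # exact matches
            tp[label] += 1
        for _, _, label in pred - gold:   # predicted but not in gold
            fp[label] += 1
        for _, _, label in gold - pred:   # in gold but missed
            fn[label] += 1
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"precision": p, "recall": r, "f1": f1}
    return scores
```

Unlike the token accuracy in the training log, which presumably also counts tokens outside any entity, these per-label scores expose weak classes such as the special-category labels directly.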

License

Apache 2.0.

Citation

If you use this model, please cite the base model and this fine-tuned checkpoint.

@misc{runeward2026,
  title={Runeward: Turkish KVKK-aware Privacy Filter},
  author={Curiosity Technology},
  year={2026},
  howpublished={Hugging Face model checkpoint},
  note={Fine-tuned from openai/privacy-filter}
}


Model Details

Format: Safetensors
Model size: 1B params
Tensor type: BF16
Repository: curiositytech/runeward-kvkk-filter