Runeward: Turkish KVKK Privacy Filter

Runeward is a token-classification model fine-tuned to detect and mask personal data, and expressions that may be sensitive under KVKK (Turkey's Personal Data Protection Law, No. 6698), in Turkish text, for use in data security workflows.

The name is inspired by runic seals and the idea of wardenship: the model marks, confines, and protects sensitive data before it becomes visible.

This is an experimental v0.1 checkpoint fine-tuned from openai/privacy-filter.


Model Summary

Runeward is a Turkish KVKK-aware privacy filter model designed to detect and redact personally identifiable information and selected sensitive personal data categories in Turkish text.

It is intended for:

  • Turkish PII detection
  • KVKK-aware redaction pipelines
  • RAG/document sanitization
  • Pre-processing and post-processing guardrails
  • On-prem privacy filtering experiments
  • Support ticket, log, and CRM text redaction

Base Model

This model is fine-tuned from:

openai/privacy-filter

Runeward is not a generative LLM. It is a span/entity detection model for privacy filtering.


Label Space

The current custom label space contains 22 labels:

private_person
private_email
private_phone
private_address
private_date
private_url
account_number
secret
tckn
iban
tax_number
passport_number
license_plate
credit_card
health_data
biometric_data
genetic_data
religion_or_belief
political_opinion
union_membership
criminal_record
child_data

Training Summary

This v0.1 checkpoint was trained on synthetic Turkish KVKK/PII examples.

Observed training log:

train_examples=4330
validation_examples=541
epochs=1
train_loss=0.657055
validation_loss=0.487073
train_token_accuracy=0.8828
validation_token_accuracy=0.9097
best_epoch=1

Output head rebuild summary:

rebuilt output head for target label space
target labels=89
copied_rows=89
exact=33
fallback=56

The custom KVKK-specific labels such as tckn, iban, health_data, and license_plate are learned from the fine-tuning data.
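
The counts in the rebuild summary are consistent with a BIOES tagging scheme, although the log does not name the scheme. The sketch below reproduces that arithmetic under two assumptions: each label expands into B/I/E/S tags plus one shared O tag, and the eight labels in `BASE_LABELS` are the ones already present in the base model's head. The tag naming (`B-label`) is hypothetical and used only for counting.

```python
# Assumption: these eight labels already exist in the base model's output head.
BASE_LABELS = [
    "private_person", "private_email", "private_phone", "private_address",
    "private_date", "private_url", "account_number", "secret",
]

# The remaining labels from the label space above, added for KVKK.
CUSTOM_LABELS = [
    "tckn", "iban", "tax_number", "passport_number", "license_plate",
    "credit_card", "health_data", "biometric_data", "genetic_data",
    "religion_or_belief", "political_opinion", "union_membership",
    "criminal_record", "child_data",
]


def bioes_tags(labels):
    """Expand entity labels into BIOES tags plus the shared 'O' tag."""
    tags = ["O"]
    for label in labels:
        tags.extend(f"{prefix}-{label}" for prefix in "BIES")
    return tags


all_tags = bioes_tags(BASE_LABELS + CUSTOM_LABELS)
base_tags = set(bioes_tags(BASE_LABELS))

# Rows with a same-named tag in the base head versus rows without one.
exact = [tag for tag in all_tags if tag in base_tags]
fallback = [tag for tag in all_tags if tag not in base_tags]

print(len(all_tags), len(exact), len(fallback))
```

Under these assumptions the three numbers match `target labels`, `exact`, and `fallback` in the log: 22 labels give 89 tags, of which 33 have a base-model counterpart and 56 do not.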


Quick Test Results

TCKN

Input:

Müşteri Ahmet Yılmaz için TCKN 91234567890 sisteme kaydedildi.

Output:

Müşteri <PRIVATE_PERSON> için TCKN <TCKN> sisteme kaydedildi.

IBAN

Input:

Para iadesi için IBAN: TR123456789012345678901234

Output:

Para iadesi için IBAN: <IBAN>

Phone and Email

Input:

Ayşe Demir'in telefonu 0532 123 45 67 ve e-postası ayse.demo@example.com.

Output:

Ayşe Demir'in telefonu <PRIVATE_PHONE> ve e-postası <PRIVATE_EMAIL>.

License Plate

Input:

Araç plakası 34 ABC 123 olarak kaydedilmiş.

Output:

Araç plakası <LICENSE_PLATE> olarak kaydedilmiş.

Negative Example

Input:

Bu metinde herhangi bir kişisel veri yok, sadece ürün açıklaması var.

Output:

Bu metinde herhangi bir kişisel veri yok, sadece ürün açıklaması var.

Known Limitations

This is an early experimental checkpoint.

Known issues observed in v0.1:

  • Semantic sensitive categories such as health_data, political_opinion, and union_membership require more diverse training data.
  • Lowercase Turkish IBAN examples may be confused with other numeric labels.
  • Secret/API-key-like strings may be confused with license_plate in some cases.
  • Special-category KVKK labels need stronger semantic coverage.
  • The model was trained primarily on synthetic data.
  • This model should not be used as the sole KVKK compliance mechanism.

For production systems, use Runeward with:

  • deterministic regex validators,
  • TCKN/IBAN checksum validation,
  • policy engine,
  • audit logging,
  • output scanning,
  • human review for special-category data.
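
The checksum validation mentioned above can be sketched as follows. This is a minimal illustration, not part of Runeward: the function names are hypothetical, the TCKN check uses the two standard check digits (10th and 11th), and the IBAN check uses the ISO 13616 mod-97 rule restricted to the 26-character Turkish format.

```python
import re


def is_valid_tckn(value: str) -> bool:
    """Check the two check digits of an 11-digit Turkish national ID number."""
    if not re.fullmatch(r"[1-9][0-9]{10}", value):
        return False
    d = [int(ch) for ch in value]
    # 10th digit: (sum of digits 1,3,5,7,9) * 7 - (sum of digits 2,4,6,8), mod 10.
    tenth = (sum(d[0:9:2]) * 7 - sum(d[1:8:2])) % 10
    # 11th digit: sum of the first ten digits, mod 10.
    eleventh = sum(d[:10]) % 10
    return d[9] == tenth and d[10] == eleventh


def is_valid_tr_iban(value: str) -> bool:
    """Check a Turkish IBAN with the mod-97 rule, ignoring case and spaces."""
    iban = re.sub(r"\s+", "", value).upper()
    if not re.fullmatch(r"TR[0-9]{24}", iban):
        return False
    # Move the country code and check digits to the end, then map letters
    # to numbers (A=10 ... Z=35); base-36 parsing gives exactly that mapping.
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(is_valid_tckn("10000000146"))
print(is_valid_tr_iban("tr33 0006 1005 1978 6457 8413 26"))
```

Normalizing case and whitespace before validation also covers the lowercase and spaced IBAN variants listed under the known issues. Synthetic identifiers, such as the TCKN used in the quick test, do not necessarily pass the checksum, so a failed checksum is probably better treated as a confidence signal than as proof that a span is harmless.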

Recommended production architecture:

Regex validators
+ Runeward span detection
+ KVKK policy engine
+ output redaction
+ audit logging
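
The output redaction step of this architecture could look like the sketch below. It assumes each detected span is a dict with a `label` plus character offsets `start` and `end`; the offset keys are an assumption, since the span format is not specified here. The placeholder style follows the quick test results.

```python
def redact(text, spans):
    """Replace each span with an upper-case <LABEL> placeholder.

    Assumes spans do not overlap.
    """
    result = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        placeholder = f"<{span['label'].upper()}>"
        result = result[:span["start"]] + placeholder + result[span["end"]:]
    return result


text = "Müşteri Ahmet Yılmaz için TCKN 91234567890 sisteme kaydedildi."


def span_for(label, value):
    """Build a span dict by locating `value` in the example text."""
    start = text.index(value)
    return {"label": label, "start": start, "end": start + len(value)}


spans = [
    span_for("private_person", "Ahmet Yılmaz"),
    span_for("tckn", "91234567890"),
]
print(redact(text, spans))
```

Spans found by regex validators and spans found by the model can be merged into one list before this step, provided overlaps are resolved first.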

Usage

After installing the OpenAI Privacy Filter CLI, you can run:

opf --checkpoint ./runeward-kvkk-filter "Müşteri Ahmet Yılmaz için TCKN 91234567890 sisteme kaydedildi."

Expected output:

Müşteri <PRIVATE_PERSON> için TCKN <TCKN> sisteme kaydedildi.
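
To call the same command from a Python pipeline, a thin wrapper around `subprocess` may be enough. The sketch below uses only the invocation shown above; it assumes the CLI writes the redacted text to standard output, which should be verified against the installed version. The function names are hypothetical.

```python
import subprocess


def build_opf_command(checkpoint, text):
    """Build the argument list for the invocation shown above."""
    # Passing a list avoids shell quoting problems with Turkish characters.
    return ["opf", "--checkpoint", checkpoint, text]


def run_opf(checkpoint, text):
    """Run the CLI and return whatever it printed to standard output.

    Assumption: the redacted text is written to stdout.
    """
    completed = subprocess.run(
        build_opf_command(checkpoint, text),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()


print(build_opf_command("./runeward-kvkk-filter", "Örnek metin"))
```

Calling `run_opf("./runeward-kvkk-filter", text)` requires the CLI to be installed and on `PATH`.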

Example Production Policy Layer

Runeward should be used as a detection layer, not as a complete compliance system.

Example policy:

SPECIAL_CATEGORY_LABELS = {
    "health_data",
    "biometric_data",
    "genetic_data",
    "religion_or_belief",
    "political_opinion",
    "union_membership",
    "criminal_record",
    "child_data",
}

HIGH_RISK_MASK_LABELS = {
    "tckn",
    "iban",
    "credit_card",
    "passport_number",
    "tax_number",
    "secret",
}

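def decide_kvkk_policy(spans):
    """Return a document-level action for a list of detected spans.

    Each span is a dict with at least a "label" key. The checks run in
    order and the first match wins.
    """
    labels = {span["label"] for span in spans}

    # Special-category data always goes to a human reviewer.
    if labels & SPECIAL_CATEGORY_LABELS:
        return "HUMAN_REVIEW"

    # High-risk identifiers are masked, however many spans were found.
    if labels & HIGH_RISK_MASK_LABELS:
        return "MASK"

    # Five or more spans without a high-risk identifier: block the text.
    if len(spans) >= 5:
        return "BLOCK"

    # Any remaining detection is masked.
    if spans:
        return "MASK"

    # Nothing detected.
    return "ALLOW"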
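
The policy returns one of four strings, and the calling code still has to enforce them. The sketch below shows one possible mapping; the `Outcome` type, the `enforce` function, and the choice to withhold text for both `BLOCK` and `HUMAN_REVIEW` are assumptions for illustration, not part of Runeward.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    text: Optional[str]   # text that may be passed downstream, or None
    needs_review: bool    # True when a human reviewer must look at the input


def enforce(decision: str, original: str, redacted: str) -> Outcome:
    """Map a policy decision to what the pipeline is allowed to forward."""
    if decision == "ALLOW":
        return Outcome(text=original, needs_review=False)
    if decision == "MASK":
        return Outcome(text=redacted, needs_review=False)
    if decision == "BLOCK":
        return Outcome(text=None, needs_review=False)
    if decision == "HUMAN_REVIEW":
        return Outcome(text=None, needs_review=True)
    # Fail closed on anything unexpected rather than forwarding the text.
    raise ValueError(f"unknown policy decision: {decision!r}")


outcome = enforce("MASK", "IBAN: TR12...", "IBAN: <IBAN>")
print(outcome)
```

Raising on an unknown decision keeps the pipeline from silently forwarding unfiltered text if the policy is later extended with new actions.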

Intended Use

Runeward is intended for research, experimentation, and internal privacy-filtering pipelines for Turkish text.

Good use cases:

  • PII masking before sending text to an LLM
  • RAG document sanitization
  • Logs and support-ticket redaction
  • KVKK-aware pre-processing pipelines
  • On-prem privacy filtering experiments

Out-of-Scope Use

Do not use this model as a standalone legal compliance solution.

It does not replace:

  • legal review,
  • explicit consent flows,
  • data inventory processes,
  • KVKK governance,
  • human review workflows,
  • access control,
  • data retention policies.

Safety and Compliance Notice

Runeward helps identify and mask potential personal data, but it does not guarantee KVKK compliance.

For real production deployments, combine it with:

1. deterministic PII scanners,
2. legal/policy review,
3. role-based access control,
4. audit logs,
5. human review,
6. explicit retention and deletion policies.
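
An audit log for a privacy filter should not become a second copy of the personal data. One way to approach this, sketched below, is to record only the decision and per-label counts. The field names and the `request_id` parameter are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def audit_record(spans, decision, request_id):
    """Build a log entry that describes a filtering decision without raw text."""
    counts = Counter(span["label"] for span in spans)
    return {
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "decision": decision,
        "span_count": len(spans),
        # Only label names and counts are stored, never the matched strings.
        "label_counts": dict(sorted(counts.items())),
    }


record = audit_record(
    spans=[{"label": "private_person"}, {"label": "tckn"}, {"label": "tckn"}],
    decision="MASK",
    request_id="req-001",
)
print(json.dumps(record, ensure_ascii=False))
```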

Version

Runeward v0.1.0

This is the first experimental checkpoint.

Recommended next steps for v0.2:

  • Add more LLM-assisted synthetic examples.
  • Add hard negatives.
  • Add lowercase/spaced IBAN variants.
  • Improve secret/API-key detection.
  • Improve semantic special-category detection.
  • Evaluate with label-level precision, recall, and F1.
  • Add Turkish real-world anonymized validation examples.
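
Label-level precision, recall, and F1 can be computed from exact span matches. The sketch below assumes gold and predicted spans are available as sets of `(doc_id, start, end, label)` tuples; that representation is an assumption, and the example data is made up for illustration.

```python
from collections import defaultdict


def per_label_prf(gold, predicted):
    """Return {label: (precision, recall, f1)} using exact span matching."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for span in predicted:
        label = span[3]
        if span in gold:
            tp[label] += 1
        else:
            fp[label] += 1
    for span in gold:
        if span not in predicted:
            fn[span[3]] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = (p, r, f1)
    return scores


# Made-up example: the tckn span is mislabeled as license_plate.
gold = {(0, 8, 20, "private_person"), (0, 31, 42, "tckn"), (1, 0, 26, "iban")}
predicted = {(0, 8, 20, "private_person"), (0, 31, 42, "license_plate"), (1, 0, 26, "iban")}

for label, (p, r, f1) in sorted(per_label_prf(gold, predicted).items()):
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Reporting scores per label makes confusions such as secret versus license_plate visible, which a single token-accuracy figure does not.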

License

Apache 2.0.


Citation

If you use this model, please cite the base model and this fine-tuned checkpoint.

@misc{runeward2026,
  title={Runeward: Turkish KVKK-aware Privacy Filter},
  author={Curiosity Technology},
  year={2026},
  howpublished={Hugging Face model checkpoint},
  note={Fine-tuned from openai/privacy-filter}
}
