KlarKI: EU AI Act Article Domain Classifier

8-class text classifier that maps document chunks to EU AI Act article domains (Articles 9–15, plus unrelated)

Part of KlarKI, a local-first EU AI Act + GDPR compliance auditor for German SMEs. All inference runs on-device. No data leaves your machine.


Model Overview

| Property | Value |
| --- | --- |
| Base model | deepset/gbert-base |
| Architecture | Transformers: BertForSequenceClassification |
| Parameters | ~110M |
| Languages | German (primary), English |
| Training samples | 5536 train / 981 validation |
| License | MIT |
| Part of | KlarKI audit pipeline |

Quickstart

Option A: Via KlarKI (recommended)

Use this if you want the full audit pipeline. The download script places all 5 models exactly where KlarKI expects them.

git clone https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor.git
cd KlarKI-EU-AI-Act-compliance-auditor
pip install "huggingface-hub>=0.26.0"  # quoted so the shell does not treat >= as a redirect
python scripts/download_pretrained.py --model bert
./run.sh up

Option B: Direct usage

from transformers import pipeline

classifier = pipeline("text-classification", model="s4nkar/klarki-bert-classifier")
result = classifier("The system must maintain a risk management system throughout the entire lifecycle of the AI system.")
# Output: [{'label': 'risk_management', 'score': 0.97}]

Labels

| Label | Description |
| --- | --- |
| risk_management | Article 9: Risk Management System |
| data_governance | Article 10: Data and Data Governance |
| technical_documentation | Article 11: Technical Documentation |
| record_keeping | Article 12: Record-Keeping |
| transparency | Article 13: Transparency and Provision of Information |
| human_oversight | Article 14: Human Oversight |
| security | Article 15: Accuracy, Robustness and Cybersecurity |
| unrelated | Not related to EU AI Act Articles 9–15 |
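
For downstream code it can help to have the label table as a lookup. The following is a minimal sketch: the names `LABEL_TO_ARTICLE` and `article_for` are illustrative and not part of KlarKI, and it assumes the model emits the label strings exactly as listed above.

```python
from typing import Optional

# Hypothetical lookup built from the label table; None marks the catch-all class.
LABEL_TO_ARTICLE: dict[str, Optional[int]] = {
    "risk_management": 9,
    "data_governance": 10,
    "technical_documentation": 11,
    "record_keeping": 12,
    "transparency": 13,
    "human_oversight": 14,
    "security": 15,
    "unrelated": None,
}


def article_for(label: str) -> Optional[int]:
    """Return the EU AI Act article number for a label, or None for 'unrelated'."""
    if label not in LABEL_TO_ARTICLE:
        raise KeyError(f"unknown label: {label!r}")
    return LABEL_TO_ARTICLE[label]
```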

Evaluation Results

Overall

| Macro F1 | Validation samples |
| --- | --- |
| 0.9540 | 981 |

Per-Class

| Class | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| risk_management | 0.9435 | 0.9512 | 0.9474 | 123 |
| data_governance | 0.9593 | 0.9672 | 0.9633 | 122 |
| technical_documentation | 0.9680 | 0.9680 | 0.9680 | 125 |
| record_keeping | 0.9583 | 0.9426 | 0.9504 | 122 |
| transparency | 0.9569 | 0.8952 | 0.9250 | 124 |
| human_oversight | 0.9365 | 0.9672 | 0.9516 | 122 |
| security | 0.9516 | 0.9593 | 0.9555 | 123 |
| unrelated | 0.9593 | 0.9833 | 0.9712 | 120 |
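
The overall score is the unweighted mean of the per-class F1 values, and each F1 is the harmonic mean of precision and recall. The sketch below recomputes both from the numbers in the per-class table. The helper names are illustrative. Recomputed F1 values can differ from the reported ones in the last digit because the published precision and recall are already rounded.

```python
# Per-class (precision, recall, reported F1), copied from the per-class table.
PER_CLASS = {
    "risk_management": (0.9435, 0.9512, 0.9474),
    "data_governance": (0.9593, 0.9672, 0.9633),
    "technical_documentation": (0.9680, 0.9680, 0.9680),
    "record_keeping": (0.9583, 0.9426, 0.9504),
    "transparency": (0.9569, 0.8952, 0.9250),
    "human_oversight": (0.9365, 0.9672, 0.9516),
    "security": (0.9516, 0.9593, 0.9555),
    "unrelated": (0.9593, 0.9833, 0.9712),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(f1_scores: list[float]) -> float:
    """Unweighted mean of per-class F1 scores."""
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    for name, (p, r, reported) in PER_CLASS.items():
        print(f"{name:24s} recomputed={f1(p, r):.4f} reported={reported:.4f}")
    print(f"macro F1 = {macro_f1([v[2] for v in PER_CLASS.values()]):.4f}")
```

Because the validation set is nearly balanced (120 to 125 samples per class), a support-weighted average would land very close to the macro average here.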

Training Details

| Property | Value |
| --- | --- |
| Base model | deepset/gbert-base |
| Training epochs | 5 (with early stopping) |
| Batch size | 16 |
| Data split | 85% train / 15% validation, stratified, seed=42 |
| Data generation | Async Ollama-grounded synthesis (phi3:mini) + real regulatory text |
| Optimiser | AdamW |
| Training environment | Docker container (Python 3.11, isolated from host) |
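
The training code itself is not shown in this card. To illustrate what a stratified 85/15 split with a fixed seed means, here is a standard-library sketch. `stratified_split` is a hypothetical helper, not the function KlarKI uses, and the per-class rounding in the real pipeline may differ.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.15, seed=42):
    """Split (text, label) pairs so each label keeps roughly the same share in both sets."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val = [], []
    for label in sorted(by_label):  # sorted for a deterministic iteration order
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

Splitting per label keeps the validation supports close to equal across classes, which is consistent with the near-balanced support column in the evaluation results.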

Intended Use

Routing document chunks to the correct article gap analyser inside the KlarKI audit pipeline. Each 512-character chunk is assigned to one of seven article domains or marked unrelated.
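
A minimal sketch of this routing step, assuming fixed-width 512-character chunks with no overlap (the actual KlarKI chunker may split differently). `chunk_text` and `route_chunks` are hypothetical helpers; `classify` is any callable that returns pipeline-style output as shown in the Quickstart, that is, a list holding one dict with `label` and `score` keys.

```python
from collections import defaultdict
from typing import Callable

CHUNK_SIZE = 512  # characters, per the intended-use description


def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Cut text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def route_chunks(text: str, classify: Callable[[str], list[dict]]) -> dict[str, list[str]]:
    """Group chunks by the label of the classifier's top prediction."""
    routed: dict[str, list[str]] = defaultdict(list)
    for chunk in chunk_text(text):
        top = classify(chunk)[0]  # e.g. {'label': 'risk_management', 'score': 0.97}
        routed[top["label"]].append(chunk)
    return dict(routed)
```

With the `classifier` object from Option B, `route_chunks(document_text, classifier)` would return a mapping from label to the chunks assigned to it; chunks under `unrelated` can then be skipped by the article analysers.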

This model is a decision-support tool, not a substitute for qualified legal advice. EU AI Act compliance determinations should always be reviewed by a legal professional.


Limitations

  • Trained primarily on German regulatory text; performance may degrade on highly informal language.
  • unrelated is a catch-all class; very short or ambiguous chunks may be misclassified.
  • Designed for 512-character chunks, not full documents.
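
One way to act on these limitations is to flag short or low-confidence chunks for manual review instead of trusting the top label. The sketch below is illustrative only: `needs_review`, the 0.6 score threshold, and the 40-character minimum are hypothetical values, not settings used by KlarKI, and would need tuning on real data.

```python
MIN_SCORE = 0.6  # hypothetical confidence threshold
MIN_CHARS = 40   # hypothetical minimum chunk length in characters


def needs_review(chunk: str, prediction: dict,
                 min_score: float = MIN_SCORE, min_chars: int = MIN_CHARS) -> bool:
    """Return True when a chunk is too short or its top score is too low to trust."""
    too_short = len(chunk.strip()) < min_chars
    low_confidence = prediction["score"] < min_score
    return too_short or low_confidence
```

Here `prediction` is a single dict in the shape returned by the pipeline, with `label` and `score` keys.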

Citation

@software{klarki2026,
  author    = {Sankar},
  title     = {KlarKI: Local-First EU AI Act and GDPR Compliance Auditor},
  year      = {2026},
  url       = {https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor},
  note      = {Open-source compliance tooling for German SMEs}
}

About KlarKI

KlarKI is an open-source, local-first EU AI Act + GDPR compliance auditor built for German SMEs. Upload a policy document and receive a scored gap analysis against Articles 9–15 entirely on your own hardware.

Key features:

  • Deterministic legal decision hierarchy (actor detection, Annex III applicability gate)
  • Hybrid RAG retrieval (BM25 + ChromaDB vector + cross-encoder re-ranking)
  • LangGraph multi-agent gap analysis (3 nodes per applicable article)
  • Bilingual EN/DE support; all inference runs locally, with no external API calls
