KlarKI — EU AI Act Article Domain Classifier

8-class text classification — maps document chunks to EU AI Act article domains (Articles 9–15 + unrelated)

Part of KlarKI — a local-first EU AI Act + GDPR compliance auditor for German SMEs. All inference runs on-device. No data leaves your machine.

Model Overview

Property	Value
Base model	deepset/gbert-base
Architecture	Transformers — `BertForSequenceClassification`
Parameters	~110M
Languages	German (primary), English
Training samples	5536 train / 981 validation
License	MIT
Part of	KlarKI audit pipeline

Quickstart

Option A — Via KlarKI (recommended)

Use this if you want the full audit pipeline. The download script places all 5 models exactly where KlarKI expects them.

git clone https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor.git
cd KlarKI-EU-AI-Act-compliance-auditor
pip install "huggingface-hub>=0.26.0"
python scripts/download_pretrained.py --model bert
./run.sh up

Option B — Direct usage

from transformers import pipeline

classifier = pipeline("text-classification", model="s4nkar/klarki-bert-classifier")
result = classifier("The system must maintain a risk management system throughout the entire lifecycle of the AI system.")
# Output: [{'label': 'risk_management', 'score': 0.97}]

Labels

Label	Description
`risk_management`	Article 9 — Risk Management System
`data_governance`	Article 10 — Data and Data Governance
`technical_documentation`	Article 11 — Technical Documentation
`record_keeping`	Article 12 — Record-Keeping
`transparency`	Article 13 — Transparency and Provision of Information
`human_oversight`	Article 14 — Human Oversight
`security`	Article 15 — Accuracy, Robustness and Cybersecurity
`unrelated`	Not related to EU AI Act Articles 9–15
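For downstream routing, the label strings above can be mapped back to article numbers. The following is an illustrative sketch only: `LABEL_TO_ARTICLE` and `article_for` are hypothetical names introduced here, not part of KlarKI's API, and the label strings are copied from the table above.

```python
from typing import Optional

# Label strings as listed in the table above. The mapping itself is an
# illustrative assumption, not KlarKI's internal data structure.
LABEL_TO_ARTICLE = {
    "risk_management": 9,
    "data_governance": 10,
    "technical_documentation": 11,
    "record_keeping": 12,
    "transparency": 13,
    "human_oversight": 14,
    "security": 15,
}


def article_for(label: str) -> Optional[int]:
    """Return the EU AI Act article number for a label, or None for 'unrelated'."""
    if label == "unrelated":
        return None
    return LABEL_TO_ARTICLE[label]  # raises KeyError on an unknown label
```

Returning `None` for `unrelated` lets the caller skip those chunks explicitly, while an unknown label fails loudly instead of being silently dropped.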

Evaluation Results

Overall

Macro F1	Val samples
0.9540	981

Per-Class

Class	Precision	Recall	F1	Support
`risk_management`	0.9435	0.9512	0.9474	123
`data_governance`	0.9593	0.9672	0.9633	122
`technical_documentation`	0.9680	0.9680	0.9680	125
`record_keeping`	0.9583	0.9426	0.9504	122
`transparency`	0.9569	0.8952	0.9250	124
`human_oversight`	0.9365	0.9672	0.9516	122
`security`	0.9516	0.9593	0.9555	123
`unrelated`	0.9593	0.9833	0.9712	120
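The overall score can be cross-checked against the per-class table: macro F1 is the unweighted mean of the per-class F1 values. The snippet below is a small sanity check using only the numbers printed above; the variable names are arbitrary.

```python
# Per-class (F1, support) pairs copied from the table above.
per_class = {
    "risk_management": (0.9474, 123),
    "data_governance": (0.9633, 122),
    "technical_documentation": (0.9680, 125),
    "record_keeping": (0.9504, 122),
    "transparency": (0.9250, 124),
    "human_oversight": (0.9516, 122),
    "security": (0.9555, 123),
    "unrelated": (0.9712, 120),
}

# Macro F1: every class counts equally, regardless of its support.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)
total_support = sum(n for _, n in per_class.values())

print(f"macro F1 = {macro_f1:.4f}, validation samples = {total_support}")
```

The mean agrees with the reported 0.9540 to within rounding of the table values, and the supports sum to the 981 validation samples.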

Training Details

Property	Value
Base model	`deepset/gbert-base`
Training epochs	5 (early stopping)
Batch size	16
Data split	85% train / 15% validation, stratified, seed=42
Data generation	Async Ollama-grounded synthesis (phi3:mini) + real regulatory text
Optimiser	AdamW
Training environment	Docker container (Python 3.11, isolated from host)
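The stratified 85/15 split with a fixed seed can be sketched in plain Python as follows. This is not the project's training code: the card does not say which tool performed the split, and `stratified_split` is a hypothetical helper written for illustration. In practice scikit-learn's `train_test_split` with `stratify=` and `random_state=42` would be a common choice.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.15, seed=42):
    """Split (text, label) pairs so that each label keeps roughly the same
    share in the train and validation sets."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    train, val = [], []
    for label in sorted(by_label):  # sorted for a deterministic class order
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

Splitting per label rather than over the whole dataset is what keeps the validation supports nearly equal across classes, as seen in the per-class results.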

Intended Use

Routing document chunks to the correct article gap analyser inside the KlarKI audit pipeline. Each 512-character chunk is assigned to one of seven article domains or marked unrelated.
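When using the model outside KlarKI, the input has to be cut into chunks of a similar size first. The sketch below shows the simplest possible approach, fixed-width slicing. It is an assumption for illustration: `chunk_text` is a hypothetical helper, and KlarKI's own chunker may differ, for example by respecting sentence boundaries or overlapping chunks.

```python
def chunk_text(text: str, size: int = 512) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]
```

Each resulting chunk can then be passed to the classifier on its own. Fixed-width slicing can cut a sentence in half, so a boundary-aware splitter is likely to give cleaner predictions.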

This model is a decision-support tool, not a substitute for qualified legal advice. EU AI Act compliance determinations should always be reviewed by a legal professional.

Limitations

Trained primarily on German regulatory text; performance may degrade on highly informal language.
`unrelated` is a catch-all class; very short or ambiguous chunks may be misclassified.
Designed for 512-character chunks, not full documents.
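One way to soften the short-or-ambiguous-chunk limitation is to accept a prediction only when the chunk is long enough and the top score is clearly high, and to flag everything else for manual review. The helper below is a hedged sketch: `confident_label` is a hypothetical name, and the `min_chars` and `min_score` defaults are arbitrary placeholders, not values tuned or recommended by the model author.

```python
from typing import Optional


def confident_label(scores, text, min_chars=40, min_score=0.6) -> Optional[str]:
    """Return the top label, or None when the chunk should be reviewed manually.

    scores: list of {"label": str, "score": float} dicts, one per class.
    min_chars and min_score are illustrative thresholds (assumptions).
    """
    if len(text.strip()) < min_chars or not scores:
        return None  # too short to classify reliably, or no scores given
    best = max(scores, key=lambda s: s["score"])
    return best["label"] if best["score"] >= min_score else None
```

With the `classifier` from Option B, calling `classifier(text, top_k=None)` on a single string should return a list of this shape in recent `transformers` versions, though the exact return structure is worth verifying against the installed version.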

Citation

@software{klarki2026,
  author    = {Sankar},
  title     = {KlarKI: Local-First EU AI Act and GDPR Compliance Auditor},
  year      = {2026},
  url       = {https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor},
  note      = {Open-source compliance tooling for German SMEs}
}

About KlarKI

KlarKI is an open-source, local-first EU AI Act + GDPR compliance auditor built for German SMEs. Upload a policy document and receive a scored gap analysis against Articles 9–15 entirely on your own hardware.

Key features:

Deterministic legal decision hierarchy (actor detection, Annex III applicability gate)
Hybrid RAG retrieval (BM25 + ChromaDB vector + cross-encoder re-ranking)
LangGraph multi-agent gap analysis (3-node per applicable article)
Bilingual EN/DE support — all inference runs locally, no external API calls

GitHub | All KlarKI Models
