# KlarKI — EU AI Act High-Risk Binary Classifier

Binary classification — detects whether a document describes a high-risk AI system under Article 6 and Annex III of the EU AI Act.

Part of KlarKI — a local-first EU AI Act + GDPR compliance auditor for German SMEs. All inference runs on-device. No data leaves your machine.


## Model Overview

| Property | Value |
| --- | --- |
| Base model | deepset/gbert-base |
| Architecture | BertForSequenceClassification (Transformers) |
| Parameters | ~110M |
| Languages | German (primary), English |
| Training samples | 1412 train / 250 validation |
| License | MIT |
| Part of | KlarKI audit pipeline |

## Quickstart

### Option A — Via KlarKI (recommended)

Use this if you want the full audit pipeline. The download script places all 5 models exactly where KlarKI expects them.

```bash
git clone https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor.git
cd KlarKI-EU-AI-Act-compliance-auditor
pip install "huggingface-hub>=0.26.0"  # quoted so the shell doesn't treat >= as a redirect
python scripts/download_pretrained.py --model risk
./run.sh up
```

### Option B — Direct usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="s4nkar/klarki-risk-classifier")
result = classifier("The system is used for recruitment and selection of persons, including screening of CVs.")
# Output: [{'label': 'high_risk', 'score': 0.98}]
```
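The base model is a BERT-style encoder, which accepts at most 512 tokens per input, so documents longer than that need to be split before classification. A minimal sketch of one possible approach, using overlapping word-level windows (KlarKI's actual chunking strategy may differ):

```python
def chunk_words(text: str, max_words: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows so each chunk stays
    well under the 512-token limit of a BERT-style encoder."""
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# A document is then flagged high-risk if any chunk is, e.g.:
# results = [classifier(chunk)[0] for chunk in chunk_words(long_doc)]
```

The overlap keeps a phrase that straddles a window boundary visible in at least one chunk.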

## Labels

| Label | Description |
| --- | --- |
| `high_risk` | AI system falls under an Annex III high-risk category (Art. 6) |
| `not_high_risk` | AI system does not appear to be high-risk under Annex III |

## Evaluation Results

### Overall

| Macro F1 | Val samples |
| --- | --- |
| 0.9920 | 250 |

### Per-Class

| Class | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| high_risk | 0.9922 | 0.9922 | 0.9922 | 128 |
| not_high_risk | 0.9918 | 0.9918 | 0.9918 | 122 |
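The overall macro F1 is the unweighted mean of the per-class F1 scores, which can be checked directly against the table:

```python
# Per-class F1 scores from the validation set (support: 128 / 122).
f1_high_risk = 0.9922
f1_not_high_risk = 0.9918

# Macro F1 averages per-class F1 without weighting by support,
# so both classes count equally despite the slight imbalance.
macro_f1 = (f1_high_risk + f1_not_high_risk) / 2
print(round(macro_f1, 4))  # 0.992
```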

## Training Details

| Property | Value |
| --- | --- |
| Base model | deepset/gbert-base |
| Training epochs | 5 (with early stopping) |
| Batch size | 16 |
| Optimiser | AdamW |
| Data split | 85% train / 15% validation, stratified, seed=42 |
| Data generation | Async Ollama-grounded synthesis (phi3:mini) + real regulatory text |
| Training framework | Docker container (Python 3.11, isolated from host) |
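A stratified split keeps the high_risk / not_high_risk proportions identical in the train and validation sets. A minimal pure-Python sketch of the idea (the actual pipeline may use a library helper such as scikit-learn's `train_test_split`):

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, val_frac=0.15, seed=42):
    """Split (sample, label) pairs so each label contributes the same
    train/validation proportion. Deterministic for a fixed seed."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    rng = random.Random(seed)
    train, val = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_val = round(len(items) * val_frac)
        val += [(s, label) for s in items[:n_val]]
        train += [(s, label) for s in items[n_val:]]
    return train, val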

## Intended Use

Augmenting KlarKI's deterministic Annex III applicability gate when pattern matching produces uncertain results. The classifier's verdict is accepted at confidence >= 0.85 and catches Annex III cases that regex patterns miss.
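The gating above can be sketched in a few lines. `gated_verdict` and the pattern-engine fallback shown here are illustrative assumptions, not KlarKI's actual API:

```python
THRESHOLD = 0.85  # model verdicts below this confidence fall back to patterns

def gated_verdict(model_output: dict, pattern_verdict: str) -> str:
    """Accept the classifier's label only when its confidence clears
    the threshold; otherwise defer to the deterministic pattern engine."""
    if model_output["score"] >= THRESHOLD:
        return model_output["label"]
    return pattern_verdict

# A confident model call overrides the pattern result:
print(gated_verdict({"label": "high_risk", "score": 0.98}, "not_high_risk"))  # high_risk
# A borderline call defers to the pattern engine:
print(gated_verdict({"label": "high_risk", "score": 0.60}, "not_high_risk"))  # not_high_risk
```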

This model is a decision-support tool, not a substitute for qualified legal advice. EU AI Act compliance determinations should always be reviewed by a legal professional.


## Limitations

- Binary signal only; does not identify which Annex III category triggered the result.
- Always used alongside the deterministic pattern engine in KlarKI, never standalone.
- Confidence threshold is 0.85; borderline cases fall back to the pattern engine.

## Citation

```bibtex
@software{klarki2026,
  author    = {Sankar},
  title     = {KlarKI: Local-First EU AI Act and GDPR Compliance Auditor},
  year      = {2026},
  url       = {https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor},
  note      = {Open-source compliance tooling for German SMEs}
}
```

## About KlarKI

KlarKI is an open-source, local-first EU AI Act + GDPR compliance auditor built for German SMEs. Upload a policy document and receive a scored gap analysis against Articles 9–15, entirely on your own hardware.

Key features:

- Deterministic legal decision hierarchy (actor detection, Annex III applicability gate)
- Hybrid RAG retrieval (BM25 + ChromaDB vector search + cross-encoder re-ranking)
- LangGraph multi-agent gap analysis (three nodes per applicable article)
- Bilingual EN/DE support — all inference runs locally, no external API calls
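Hybrid retrieval needs a way to merge the BM25 and vector rankings before the cross-encoder re-ranks them. One common fusion technique is reciprocal rank fusion (RRF), shown here as an illustrative sketch (the card does not specify which fusion method KlarKI actually uses):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs. Each document scores
    sum(1 / (k + rank)) over the lists it appears in; higher is better."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from the two retrievers:
bm25 = ["art9", "art12", "annex3"]
vector = ["annex3", "art9", "art15"]
fused = reciprocal_rank_fusion([bm25, vector])
```

Documents ranked highly by both retrievers rise to the top even when neither list agrees on the exact order.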

GitHub  |  All KlarKI Models
