# KlarKI — EU AI Act High-Risk Binary Classifier

Binary classification — detects whether a document describes a high-risk AI system under Article 6 and Annex III of the EU AI Act.

Part of KlarKI — a local-first EU AI Act + GDPR compliance auditor for German SMEs. All inference runs on-device. No data leaves your machine.


## Model Overview

| Property | Value |
| --- | --- |
| Base model | deepset/gbert-base |
| Architecture | BertForSequenceClassification (Transformers) |
| Parameters | ~110M |
| Languages | German (primary), English |
| Training samples | 1412 train / 250 validation |
| License | MIT |
| Part of | KlarKI audit pipeline |

## Quickstart

### Option A — Via KlarKI (recommended)

Use this if you want the full audit pipeline. The download script places all 5 models exactly where KlarKI expects them.

```bash
git clone https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor.git
cd KlarKI-EU-AI-Act-compliance-auditor
pip install "huggingface-hub>=0.26.0"  # quoted so the shell doesn't treat >= as a redirect
python scripts/download_pretrained.py --model risk
./run.sh up
```

### Option B — Direct usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="s4nkar/klarki-risk-classifier")
result = classifier("The system is used for recruitment and selection of persons, including screening of CVs.")
# Output: [{'label': 'high_risk', 'score': 0.98}]
```
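The base model is a BERT-style encoder, which accepts at most 512 tokens per input, so documents longer than that need to be split before classification. A minimal sketch of one possible approach, using overlapping word-level windows (KlarKI's actual chunking strategy may differ):

```python
def chunk_words(text: str, max_words: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows so each chunk stays
    well under the 512-token limit of a BERT-style encoder."""
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# A document is then flagged high-risk if any chunk is, e.g.:
# results = [classifier(chunk)[0] for chunk in chunk_words(long_doc)]
```

The overlap keeps a phrase that straddles a window boundary visible in at least one chunk.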

## Labels

| Label | Description |
| --- | --- |
| `high_risk` | AI system falls under an Annex III high-risk category (Art. 6) |
| `not_high_risk` | AI system does not appear to be high-risk under Annex III |

## Evaluation Results

### Overall

| Macro F1 | Val samples |
| --- | --- |
| 0.9920 | 250 |

### Per-Class

| Class | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| high_risk | 0.9922 | 0.9922 | 0.9922 | 128 |
| not_high_risk | 0.9918 | 0.9918 | 0.9918 | 122 |
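The overall macro F1 is the unweighted mean of the per-class F1 scores, which can be checked directly against the table:

```python
# Per-class F1 scores from the validation set (support: 128 / 122).
f1_high_risk = 0.9922
f1_not_high_risk = 0.9918

# Macro F1 averages per-class F1 without weighting by support,
# so both classes count equally despite the slight imbalance.
macro_f1 = (f1_high_risk + f1_not_high_risk) / 2
print(round(macro_f1, 4))  # 0.992
```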

## Training Details

| Property | Value |
| --- | --- |
| Base model | deepset/gbert-base |
| Training epochs | 5 (with early stopping) |
| Batch size | 16 |
| Optimiser | AdamW |
| Data split | 85% train / 15% validation, stratified, seed=42 |
| Data generation | Async Ollama-grounded synthesis (phi3:mini) + real regulatory text |
| Training framework | Docker container (Python 3.11, isolated from host) |
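A stratified split keeps the high_risk / not_high_risk proportions identical in the train and validation sets. A minimal pure-Python sketch of the idea (the actual pipeline may use a library helper such as scikit-learn's `train_test_split`):

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, val_frac=0.15, seed=42):
    """Split (sample, label) pairs so each label contributes the same
    train/validation proportion. Deterministic for a fixed seed."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    rng = random.Random(seed)
    train, val = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_val = round(len(items) * val_frac)
        val += [(s, label) for s in items[:n_val]]
        train += [(s, label) for s in items[n_val:]]
    return train, val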

## Intended Use

Augmenting KlarKI's deterministic Annex III applicability gate when pattern matching produces uncertain results. The classifier's verdict is accepted at confidence >= 0.85 and catches Annex III cases that regex patterns miss.
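The gating above can be sketched in a few lines. `gated_verdict` and the pattern-engine fallback shown here are illustrative assumptions, not KlarKI's actual API:

```python
THRESHOLD = 0.85  # model verdicts below this confidence fall back to patterns

def gated_verdict(model_output: dict, pattern_verdict: str) -> str:
    """Accept the classifier's label only when its confidence clears
    the threshold; otherwise defer to the deterministic pattern engine."""
    if model_output["score"] >= THRESHOLD:
        return model_output["label"]
    return pattern_verdict

# A confident model call overrides the pattern result:
print(gated_verdict({"label": "high_risk", "score": 0.98}, "not_high_risk"))  # high_risk
# A borderline call defers to the pattern engine:
print(gated_verdict({"label": "high_risk", "score": 0.60}, "not_high_risk"))  # not_high_risk
```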

This model is a decision-support tool, not a substitute for qualified legal advice. EU AI Act compliance determinations should always be reviewed by a legal professional.


## Limitations

- Binary signal only; does not identify which Annex III category triggered the result.
- Always used alongside the deterministic pattern engine in KlarKI, never standalone.
- Confidence threshold is 0.85; borderline cases fall back to the pattern engine.

## Citation

```bibtex
@software{klarki2026,
  author    = {Sankar},
  title     = {KlarKI: Local-First EU AI Act and GDPR Compliance Auditor},
  year      = {2026},
  url       = {https://github.com/s4nkar/KlarKI-EU-AI-Act-compliance-auditor},
  note      = {Open-source compliance tooling for German SMEs}
}
```

## About KlarKI

KlarKI is an open-source, local-first EU AI Act + GDPR compliance auditor built for German SMEs. Upload a policy document and receive a scored gap analysis against Articles 9–15, entirely on your own hardware.

Key features:

- Deterministic legal decision hierarchy (actor detection, Annex III applicability gate)
- Hybrid RAG retrieval (BM25 + ChromaDB vector search + cross-encoder re-ranking)
- LangGraph multi-agent gap analysis (three nodes per applicable article)
- Bilingual EN/DE support — all inference runs locally, no external API calls
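Hybrid retrieval needs a way to merge the BM25 and vector rankings before the cross-encoder re-ranks them. One common fusion technique is reciprocal rank fusion (RRF), shown here as an illustrative sketch (the card does not specify which fusion method KlarKI actually uses):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs. Each document scores
    sum(1 / (k + rank)) over the lists it appears in; higher is better."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from the two retrievers:
bm25 = ["art9", "art12", "annex3"]
vector = ["annex3", "art9", "art15"]
fused = reciprocal_rank_fusion([bm25, vector])
```

Documents ranked highly by both retrievers rise to the top even when neither list agrees on the exact order.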

GitHub  |  All KlarKI Models
