InsureDocClassifier — Insurance Document Classification

Created by Bytical AI — AI agents that run insurance operations.

Model Description

InsureDocClassifier is a 12-class insurance document classifier built on ModernBERT-base. It automatically categorizes insurance documents into their correct type, enabling automated document routing, indexing, and processing in insurance operations.

New to These Hugging Face Repos?

INSUREOS is published as separate repos so each page stays focused:

This page (InsureDocClassifier) contains model artifacts and usage.
Full training code is in piyushptiwari/insureos-models.
Training data is in piyushptiwari/insureos-training-data.
Beginner explanation is in LEARN.md.

Document Classes (12)

ID	Document Type	Description
0	Policy Schedule	Policy details and coverage summary
1	Certificate of Insurance	Proof of insurance document
2	Claim Form	Insurance claim submission form
3	Loss Adjuster Report	Assessment report from loss adjuster
4	Bordereaux — Premium	Premium transaction records
5	Bordereaux — Claims	Claims transaction records
6	Endorsement	Policy amendment document
7	Renewal Notice	Policy renewal notification
8	Statement of Fact	Declaration of material facts
9	FNOL Report	First Notification of Loss report
10	Subrogation Notice	Recovery rights notification
11	Policy Wording	Full policy terms and conditions
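The class table maps directly onto the `id2label` and `label2id` dictionaries that Hugging Face model configs use. The sketch below builds both from the table. It assumes the IDs listed here are the ones the checkpoint was trained with. The `DOC_TYPES` name is illustrative, and this card does not say whether the published `config.json` already stores these names.

```python
# Label names copied from the Document Classes table; list position = class ID (0-11).
DOC_TYPES = [
    "Policy Schedule",
    "Certificate of Insurance",
    "Claim Form",
    "Loss Adjuster Report",
    "Bordereaux — Premium",
    "Bordereaux — Claims",
    "Endorsement",
    "Renewal Notice",
    "Statement of Fact",
    "FNOL Report",
    "Subrogation Notice",
    "Policy Wording",
]

# Forward and reverse lookups, in the shape of a Hugging Face config's id2label / label2id.
id2label = {i: name for i, name in enumerate(DOC_TYPES)}
label2id = {name: i for i, name in id2label.items()}

# Sanity checks: 12 classes, and the two maps are inverses of each other.
assert len(id2label) == 12
assert all(label2id[id2label[i]] == i for i in id2label)

print(id2label[9])  # FNOL Report
```

A single list serves as the one source of truth. This avoids the ID drift that can happen when label dictionaries are retyped by hand in several places.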

Training Details

Parameter	Value
Base Model	answerdotai/ModernBERT-base
Training Samples	10,000 synthetic insurance documents
Epochs	5
Eval Loss	4.17e-06
GPU	NVIDIA Tesla T4 16GB
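The eval loss is easier to read as a probability. Assume it is the mean cross-entropy over the evaluation set, which is the default loss for single-label `AutoModelForSequenceClassification` fine-tuning: L = -(1/N) * sum(log p_i), where p_i is the probability assigned to the true class of example i. Then exp(-L) is the geometric mean of those probabilities. A small worked check under that assumption:

```python
import math

eval_loss = 4.17e-06  # value reported in the Training Details table

# Assumption: eval_loss = -(1/N) * sum(log p_i), the mean cross-entropy,
# where p_i is the probability the model gave the true class of example i.
# Exponentiating the negated mean log-probability gives the geometric mean of p_i.
geo_mean_true_prob = math.exp(-eval_loss)
print(f"{geo_mean_true_prob:.8f}")  # 0.99999583
```

Under this reading, the geometric-mean probability of the correct class on the evaluation documents is about 0.999996.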

Evaluation Results (self-reported)

Metric	Score
Accuracy	1.0
F1 (macro)	1.0
F1 (weighted)	1.0
Eval Samples/sec	32.96
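Macro F1 gives each of the 12 classes equal weight. Weighted F1 weights each class by its number of true examples (support). The two diverge when rare classes are classified worse than common ones. When accuracy is 1.0, every class has perfect precision and recall, so both F1 variants are also 1.0, which is what the table shows.

The plain-Python sketch below follows the usual definitions behind scikit-learn's `f1_score` with `average="macro"` and `average="weighted"`, scoring a class with no predictions as 0. It is an illustration, not the evaluation script used for this card.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return (accuracy, macro_f1, weighted_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    f1 = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(f1.values()) / len(labels)  # every class counts equally
    weighted_f1 = sum(f1[c] * support[c] for c in labels) / len(y_true)  # weighted by support
    return accuracy, macro_f1, weighted_f1


# Toy examples (illustrative values, not this model's predictions)
print(classification_scores([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 2, 0]))
print(classification_scores([0, 1, 2], [0, 1, 2]))  # (1.0, 1.0, 1.0)
```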

How to Use

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "piyushptiwari/InsureDocClassifier"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout

text = "We hereby confirm that the above-named insured holds a valid policy of insurance..."
# Inputs longer than 512 tokens are truncated
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()

# Class IDs as listed in the Document Classes table
labels = {
    0: "Policy Schedule", 1: "Certificate of Insurance", 2: "Claim Form",
    3: "Loss Adjuster Report", 4: "Bordereaux — Premium", 5: "Bordereaux — Claims",
    6: "Endorsement", 7: "Renewal Notice", 8: "Statement of Fact",
    9: "FNOL Report", 10: "Subrogation Notice", 11: "Policy Wording"
}
print(f"Document type: {labels[predicted_class]}")
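The snippet above keeps only the argmax and discards the model's confidence. For document routing, a common pattern is to convert the logits to probabilities with softmax and send low-confidence documents to a human reviewer.

The standard-library sketch below illustrates that pattern. The 0.9 threshold and the "Manual Review" queue name are hypothetical placeholders, not part of the published model. To connect it to the example above, pass `outputs.logits[0].tolist()` and the `labels` dict.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, labels, threshold=0.9):
    """Return (destination, confidence) for one document.

    Hypothetical policy: if the top-class probability is below `threshold`,
    the document goes to a "Manual Review" queue instead of its predicted type.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "Manual Review", probs[best]
    return labels[best], probs[best]


# Toy 3-class logits, for illustration only
toy_labels = {0: "Policy Schedule", 1: "Certificate of Insurance", 2: "Claim Form"}
print(route([0.1, 6.0, 0.2], toy_labels))  # confident: Certificate of Insurance
print(route([1.0, 1.1, 0.9], toy_labels))  # uncertain: Manual Review
```

The right threshold depends on how well the probabilities are calibrated on real documents. It would need to be tuned on labelled production data rather than taken from this sketch.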

Learn How This Model Works

New to ML or insurance AI? We wrote a plain-English guide that explains the concepts, the data, and the actual training code behind every INSUREOS model:

Learning guide: LEARN.md
Training code: piyushptiwari/insureos-models (this model is trained with training/doc_classifier.py)

Part of the INSUREOS Model Suite

This model is part of INSUREOS, a complete AI/ML suite for insurance operations built by Bytical AI:

Model	Task	Metric
InsureLLM-4B	Insurance domain LLM	ROUGE-1: 0.384
InsureDocClassifier (this model)	12-class document classification	F1: 1.0
InsureNER	13-entity Named Entity Recognition	F1: 1.0
InsureFraudNet	Fraud detection (Motor/Property/Liability)	AUC-ROC: 1.0
InsurePricing	Insurance pricing (GLM + EBM)	MAE: £11,132

Citation

@misc{bytical2026insuredocclassifier,
  title={InsureDocClassifier: Insurance Document Classification with ModernBERT},
  author={Bytical AI},
  year={2026},
  url={https://huggingface.co/piyushptiwari/InsureDocClassifier}
}

About Bytical AI

Bytical builds AI agents that run insurance operations — claims automation, underwriting intelligence, digital sales, and core system modernization for insurers across the UK and Europe. Microsoft AI Partner | NVIDIA | Salesforce.


Model files: Safetensors format, 0.1B parameters, F32 tensors.
