Commit 06a0c29 · verified · 1 Parent(s): c57e657
Committed by piyushptiwari

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,123 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - insurance
+ - document-classification
+ - modernbert
+ - uk-insurance
+ - text-classification
+ - bytical
+ library_name: transformers
+ pipeline_tag: text-classification
+ base_model: answerdotai/ModernBERT-base
+ datasets:
+ - piyushptiwari/insureos-training-data
+ model-index:
+ - name: InsureDocClassifier
+   results:
+   - task:
+       type: text-classification
+       name: Insurance Document Classification
+     metrics:
+     - type: f1
+       value: 1.0
+       name: F1 (macro)
+     - type: accuracy
+       value: 1.0
+       name: Accuracy
+ ---
+
+ # InsureDocClassifier — Insurance Document Classification
+
+ **Created by [Bytical AI](https://bytical.ai)** — AI agents that run insurance operations.
+
+ ## Model Description
+
+ InsureDocClassifier is a 12-class insurance document classifier built on ModernBERT-base. It automatically categorizes insurance documents into their correct type, enabling automated document routing, indexing, and processing in insurance operations.
+
+ ### Document Classes (12)
+
+ | ID | Document Type | Description |
+ |----|--------------|-------------|
+ | 0 | Policy Schedule | Policy details and coverage summary |
+ | 1 | Certificate of Insurance | Proof of insurance document |
+ | 2 | Claim Form | Insurance claim submission form |
+ | 3 | Loss Adjuster Report | Assessment report from loss adjuster |
+ | 4 | Bordereaux — Premium | Premium transaction records |
+ | 5 | Bordereaux — Claims | Claims transaction records |
+ | 6 | Endorsement | Policy amendment document |
+ | 7 | Renewal Notice | Policy renewal notification |
+ | 8 | Statement of Fact | Declaration of material facts |
+ | 9 | FNOL Report | First Notification of Loss report |
+ | 10 | Subrogation Notice | Recovery rights notification |
+ | 11 | Policy Wording | Full policy terms and conditions |
+
+ ### Training Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Base Model | answerdotai/ModernBERT-base |
+ | Training Samples | 10,000 synthetic insurance documents |
+ | Epochs | 5 |
+ | Eval Loss | 4.17e-06 |
+ | GPU | NVIDIA Tesla T4 16GB |
+
+ ### Evaluation Results
+
+ | Metric | Score |
+ |--------|-------|
+ | **Accuracy** | **1.0** |
+ | **F1 (macro)** | **1.0** |
+ | **F1 (weighted)** | **1.0** |
+ | Eval Samples/sec | 32.96 |
+
+ ## How to Use
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ model = AutoModelForSequenceClassification.from_pretrained("piyushptiwari/InsureDocClassifier")
+ tokenizer = AutoTokenizer.from_pretrained("piyushptiwari/InsureDocClassifier")
+
+ text = "We hereby confirm that the above-named insured holds a valid policy of insurance..."
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+ outputs = model(**inputs)
+ predicted_class = outputs.logits.argmax(-1).item()
+
+ labels = {
+     0: "Policy Schedule", 1: "Certificate of Insurance", 2: "Claim Form",
+     3: "Loss Adjuster Report", 4: "Bordereaux — Premium", 5: "Bordereaux — Claims",
+     6: "Endorsement", 7: "Renewal Notice", 8: "Statement of Fact",
+     9: "FNOL Report", 10: "Subrogation Notice", 11: "Policy Wording"
+ }
+ print(f"Document type: {labels[predicted_class]}")
+ ```
+
+ ## Part of the INSUREOS Model Suite
+
+ This model is part of **INSUREOS**, a complete AI/ML suite for insurance operations built by Bytical AI:
+
+ | Model | Task | Metric |
+ |-------|------|--------|
+ | [InsureLLM-4B](https://huggingface.co/piyushptiwari/InsureLLM-4B) | Insurance domain LLM | ROUGE-1: 0.384 |
+ | **InsureDocClassifier** (this model) | 12-class document classification | F1: 1.0 |
+ | [InsureNER](https://huggingface.co/piyushptiwari/InsureNER) | 13-entity Named Entity Recognition | F1: 1.0 |
+ | [InsureFraudNet](https://huggingface.co/piyushptiwari/InsureFraudNet) | Fraud detection (Motor/Property/Liability) | AUC-ROC: 1.0 |
+ | [InsurePricing](https://huggingface.co/piyushptiwari/InsurePricing) | Insurance pricing (GLM + EBM) | MAE: £11,132 |
+
+ ## Citation
+
+ ```bibtex
+ @misc{bytical2026insuredocclassifier,
+   title={InsureDocClassifier: Insurance Document Classification with ModernBERT},
+   author={Bytical AI},
+   year={2026},
+   url={https://huggingface.co/piyushptiwari/InsureDocClassifier}
+ }
+ ```
+
+ ## About Bytical AI
+
+ [Bytical](https://bytical.ai) builds AI agents that run insurance operations — claims automation, underwriting intelligence, digital sales, and core system modernization for insurers across the UK and Europe. Microsoft AI Partner | NVIDIA | Salesforce.
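
Review note on the README usage example: it prints only the argmax label. For document routing it is often useful to also see a confidence score and the runner-up classes. The sketch below assumes the logits have already been pulled out of the model as a plain list of 12 floats (for instance via `outputs.logits[0].tolist()`). The helper names `softmax` and `top_k_labels` and the sample logits are hypothetical and not part of the committed files; the label map is copied from `id2label` in `config.json`.

```python
import math

# Label map copied from the id2label entry in config.json.
ID2LABEL = {
    0: "Policy Schedule", 1: "Certificate of Insurance", 2: "Claim Form",
    3: "Loss Adjuster Report", 4: "Bordereaux — Premium", 5: "Bordereaux — Claims",
    6: "Endorsement", 7: "Renewal Notice", 8: "Statement of Fact",
    9: "FNOL Report", 10: "Subrogation Notice", 11: "Policy Wording",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(ID2LABEL[i], p) for i, p in ranked[:k]]


# Made-up logits for illustration only; index 1 holds the largest value.
example_logits = [0.1, 4.2, 0.3, -1.0, 0.0, 0.2, 1.5, -0.5, 0.4, 0.0, -2.0, 0.9]

for label, prob in top_k_labels(example_logits):
    print(f"{label}: {prob:.3f}")
```

A downstream router could, for example, send documents whose top probability falls below a chosen threshold to manual review. Any such threshold would need to be tuned on real documents, since the reported scores come from synthetic data.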
config.json ADDED
@@ -0,0 +1,107 @@
+ {
+   "architectures": [
+     "ModernBertForSequenceClassification"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 50281,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "mean",
+   "cls_token_id": 50281,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "dtype": "float32",
+   "embedding_dropout": 0.0,
+   "eos_token_id": 50282,
+   "global_attn_every_n_layers": 3,
+   "gradient_checkpointing": false,
+   "hidden_activation": "gelu",
+   "hidden_size": 768,
+   "id2label": {
+     "0": "Policy Schedule",
+     "1": "Certificate of Insurance",
+     "2": "Claim Form",
+     "3": "Loss Adjuster Report",
+     "4": "Bordereaux \u2014 Premium",
+     "5": "Bordereaux \u2014 Claims",
+     "6": "Endorsement",
+     "7": "Renewal Notice",
+     "8": "Statement of Fact",
+     "9": "FNOL Report",
+     "10": "Subrogation Notice",
+     "11": "Policy Wording"
+   },
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "label2id": {
+     "Bordereaux \u2014 Claims": 5,
+     "Bordereaux \u2014 Premium": 4,
+     "Certificate of Insurance": 1,
+     "Claim Form": 2,
+     "Endorsement": 6,
+     "FNOL Report": 9,
+     "Loss Adjuster Report": 3,
+     "Policy Schedule": 0,
+     "Policy Wording": 11,
+     "Renewal Notice": 7,
+     "Statement of Fact": 8,
+     "Subrogation Notice": 10
+   },
+   "layer_norm_eps": 1e-05,
+   "layer_types": [
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention"
+   ],
+   "local_attention": 128,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "mlp_dropout": 0.0,
+   "model_type": "modernbert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 22,
+   "pad_token_id": 50283,
+   "position_embedding_type": "absolute",
+   "problem_type": "single_label_classification",
+   "rope_parameters": {
+     "full_attention": {
+       "rope_theta": 160000.0,
+       "rope_type": "default"
+     },
+     "sliding_attention": {
+       "rope_theta": 10000.0,
+       "rope_type": "default"
+     }
+   },
+   "sep_token_id": 50282,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "tie_word_embeddings": true,
+   "transformers_version": "5.4.0",
+   "use_cache": false,
+   "vocab_size": 50368
+ }
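
Review note on `config.json`: the 22-entry `layer_types` list looks like it follows directly from `num_hidden_layers: 22` and `global_attn_every_n_layers: 3`, with every third layer (starting at layer 0) using full attention and the rest using sliding-window attention. The sketch below rebuilds the list under that assumption. `build_layer_types` is a hypothetical helper written for this note, not a function from the `transformers` library, and the rule is inferred from the values shown in this diff.

```python
def build_layer_types(num_hidden_layers, global_attn_every_n_layers):
    """Mark layer i as full attention when i is a multiple of the interval."""
    return [
        "full_attention" if i % global_attn_every_n_layers == 0 else "sliding_attention"
        for i in range(num_hidden_layers)
    ]


# Values taken from config.json in this commit.
layer_types = build_layer_types(num_hidden_layers=22, global_attn_every_n_layers=3)

full_layers = [i for i, kind in enumerate(layer_types) if kind == "full_attention"]
print("full attention layers:", full_layers)
print("sliding attention layers:", layer_types.count("sliding_attention"))
```

Under this reading, 8 of the 22 layers attend globally and 14 use the local window (`local_attention: 128`), and the two groups take different `rope_theta` values from `rope_parameters` (160000.0 and 10000.0 respectively).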
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d5e4e8133d620a1a8416b330df1907289b322f556822753b31173a47e34006f6
+ size 598470552
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "backend": "tokenizers",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 8192,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "TokenizersBackend",
+   "unk_token": "[UNK]"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:423060dee252df138963ecb244faa459785db6625463e3cfd003ee85e874b7bc
+ size 5201
training_meta.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "labels": [
+     "Policy Schedule",
+     "Certificate of Insurance",
+     "Claim Form",
+     "Loss Adjuster Report",
+     "Bordereaux \u2014 Premium",
+     "Bordereaux \u2014 Claims",
+     "Endorsement",
+     "Renewal Notice",
+     "Statement of Fact",
+     "FNOL Report",
+     "Subrogation Notice",
+     "Policy Wording"
+   ],
+   "id2label": {
+     "0": "Policy Schedule",
+     "1": "Certificate of Insurance",
+     "2": "Claim Form",
+     "3": "Loss Adjuster Report",
+     "4": "Bordereaux \u2014 Premium",
+     "5": "Bordereaux \u2014 Claims",
+     "6": "Endorsement",
+     "7": "Renewal Notice",
+     "8": "Statement of Fact",
+     "9": "FNOL Report",
+     "10": "Subrogation Notice",
+     "11": "Policy Wording"
+   },
+   "results": {
+     "eval_loss": 4.1706562114995904e-06,
+     "eval_accuracy": 1.0,
+     "eval_f1_macro": 1.0,
+     "eval_f1_weighted": 1.0,
+     "eval_runtime": 30.3435,
+     "eval_samples_per_second": 32.956,
+     "eval_steps_per_second": 2.076,
+     "epoch": 5.0
+   }
+ }
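
Review note on `training_meta.json`: the file does not state how many documents were in the evaluation set, but the throughput figures allow a rough estimate. Multiplying `eval_runtime` by `eval_samples_per_second` and by `eval_steps_per_second` gives approximate sample and step counts, and their ratio suggests an evaluation batch size. The numbers below are derived from rounded inputs and should be read as estimates, not as values recorded by the author.

```python
import math

# Values copied from the "results" block of training_meta.json.
eval_runtime = 30.3435            # seconds
eval_samples_per_second = 32.956
eval_steps_per_second = 2.076

# Throughput multiplied by runtime recovers approximate counts.
est_samples = round(eval_runtime * eval_samples_per_second)
est_steps = round(eval_runtime * eval_steps_per_second)

# Samples per step approximates the evaluation batch size.
est_batch_size = round(eval_samples_per_second / eval_steps_per_second)

print(f"estimated eval samples: {est_samples}")
print(f"estimated eval steps:   {est_steps}")
print(f"estimated batch size:   {est_batch_size}")

# Sanity check: the step count should match ceil(samples / batch size).
print("consistent:", math.ceil(est_samples / est_batch_size) == est_steps)
```

The estimates come out at roughly 1,000 evaluation samples processed in 63 steps with a batch size of 16, which is internally consistent. With 12 classes, that is on the order of 80 documents per class if the set is balanced, which the source does not confirm.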