Upload folder using huggingface_hub

Files changed (7)

README.md +123 -0
config.json +107 -0
model.safetensors +3 -0
tokenizer.json +0 -0
tokenizer_config.json +16 -0
training_args.bin +3 -0
training_meta.json +40 -0

README.md ADDED

	@@ -0,0 +1,123 @@

+---
+language:
+- en
+license: apache-2.0
+tags:
+- insurance
+- document-classification
+- modernbert
+- uk-insurance
+- text-classification
+- bytical
+library_name: transformers
+pipeline_tag: text-classification
+base_model: answerdotai/ModernBERT-base
+datasets:
+- piyushptiwari/insureos-training-data
+model-index:
+- name: InsureDocClassifier
+  results:
+  - task:
+      type: text-classification
+      name: Insurance Document Classification
+    metrics:
+    - type: f1
+      value: 1.0
+      name: F1 (macro)
+    - type: accuracy
+      value: 1.0
+      name: Accuracy
+---
+# InsureDocClassifier — Insurance Document Classification
+**Created by [Bytical AI](https://bytical.ai)** — AI agents that run insurance operations.
+## Model Description
+InsureDocClassifier is a 12-class insurance document classifier built on ModernBERT-base. It assigns each document one of the 12 types listed below, so that documents can be routed, indexed, and processed automatically in insurance operations.
+### Document Classes (12)
+| ID | Document Type | Description |
+|----|--------------|-------------|
+| 0 | Policy Schedule | Policy details and coverage summary |
+| 1 | Certificate of Insurance | Proof of insurance document |
+| 2 | Claim Form | Insurance claim submission form |
+| 3 | Loss Adjuster Report | Assessment report from loss adjuster |
+| 4 | Bordereaux — Premium | Premium transaction records |
+| 5 | Bordereaux — Claims | Claims transaction records |
+| 6 | Endorsement | Policy amendment document |
+| 7 | Renewal Notice | Policy renewal notification |
+| 8 | Statement of Fact | Declaration of material facts |
+| 9 | FNOL Report | First Notification of Loss report |
+| 10 | Subrogation Notice | Recovery rights notification |
+| 11 | Policy Wording | Full policy terms and conditions |
+### Training Details
+| Parameter | Value |
+|-----------|-------|
+| Base Model | answerdotai/ModernBERT-base |
+| Training Samples | 10,000 synthetic insurance documents |
+| Epochs | 5 |
+| Eval Loss | 4.17e-06 |
+| GPU | NVIDIA Tesla T4 16GB |
+### Evaluation Results
+| Metric | Score |
+|--------|-------|
+| **Accuracy** | **1.0** |
+| **F1 (macro)** | **1.0** |
+| **F1 (weighted)** | **1.0** |
+| Eval Samples/sec | 32.96 |
+## How to Use
+```python
+# Requires a recent transformers release with ModernBERT support (these files were saved with 5.4.0)
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+model_id = "piyushptiwari/InsureDocClassifier"
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model.eval()  # inference mode: disables dropout
+text = "We hereby confirm that the above-named insured holds a valid policy of insurance..."
+# Inputs longer than 512 tokens are truncated (the model itself accepts up to 8,192)
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+with torch.no_grad():  # gradients are not needed for prediction
+    outputs = model(**inputs)
+probs = outputs.logits.softmax(dim=-1)[0]
+predicted_class = probs.argmax().item()
+# Class names come from id2label in config.json, so they always match the checkpoint
+label = model.config.id2label[predicted_class]
+print(f"Document type: {label} (confidence {probs[predicted_class].item():.3f})")
+```
+## Part of the INSUREOS Model Suite
+This model is part of **INSUREOS**, Bytical AI's suite of AI/ML models for insurance operations:
+| Model | Task | Metric |
+|-------|------|--------|
+| [InsureLLM-4B](https://huggingface.co/piyushptiwari/InsureLLM-4B) | Insurance domain LLM | ROUGE-1: 0.384 |
+| **InsureDocClassifier** (this model) | 12-class document classification | F1: 1.0 |
+| [InsureNER](https://huggingface.co/piyushptiwari/InsureNER) | 13-entity Named Entity Recognition | F1: 1.0 |
+| [InsureFraudNet](https://huggingface.co/piyushptiwari/InsureFraudNet) | Fraud detection (Motor/Property/Liability) | AUC-ROC: 1.0 |
+| [InsurePricing](https://huggingface.co/piyushptiwari/InsurePricing) | Insurance pricing (GLM + EBM) | MAE: £11,132 |
+## Citation
+```bibtex
+@misc{bytical2026insuredocclassifier,
+  title={InsureDocClassifier: Insurance Document Classification with ModernBERT},
+  author={Bytical AI},
+  year={2026},
+  url={https://huggingface.co/piyushptiwari/InsureDocClassifier}
+}
+```
+## About Bytical AI
+[Bytical](https://bytical.ai) builds AI agents that run insurance operations — claims automation, underwriting intelligence, digital sales, and core system modernization for insurers across the UK and Europe. Microsoft AI Partner | NVIDIA | Salesforce.
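
The model card says the classifier enables automated document routing. The sketch below shows one way that step might look, assuming you already have the 12 logits for a document (for example the logits returned by the How to Use snippet, converted with `.tolist()`): a softmax turns them into probabilities, and predictions below a confidence threshold go to manual review instead of being routed automatically. The queue names, the `route_document` helper and the 0.9 threshold are hypothetical illustrations, not part of the model or its config.

```python
import math

# id2label as stored in config.json (em-dashes written as \u2014, as in the JSON)
ID2LABEL = {
    0: "Policy Schedule", 1: "Certificate of Insurance", 2: "Claim Form",
    3: "Loss Adjuster Report", 4: "Bordereaux \u2014 Premium",
    5: "Bordereaux \u2014 Claims", 6: "Endorsement", 7: "Renewal Notice",
    8: "Statement of Fact", 9: "FNOL Report", 10: "Subrogation Notice",
    11: "Policy Wording",
}

# Hypothetical mapping from document type to a downstream work queue.
QUEUES = {
    "Claim Form": "claims-intake",
    "FNOL Report": "claims-intake",
    "Loss Adjuster Report": "claims-handling",
    "Subrogation Notice": "recoveries",
    "Bordereaux \u2014 Premium": "delegated-authority",
    "Bordereaux \u2014 Claims": "delegated-authority",
}
DEFAULT_QUEUE = "policy-admin"   # hypothetical catch-all for policy documents
REVIEW_QUEUE = "manual-review"   # hypothetical queue for low-confidence predictions


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_document(logits, threshold=0.9):
    """Return (label, probability, queue) for one document's 12 logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = ID2LABEL[best]
    if probs[best] < threshold:
        # Not confident enough: let a human decide.
        return label, probs[best], REVIEW_QUEUE
    return label, probs[best], QUEUES.get(label, DEFAULT_QUEUE)


logits = [0.1] * 9 + [8.0] + [0.1] * 2   # strongly favours class 9
label, prob, queue = route_document(logits)
print(label, queue)  # → FNOL Report claims-intake
```

The threshold trades automation rate against misrouting; since the reported evaluation scores come from synthetic documents, it would be sensible to tune it on real, labelled traffic.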

config.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "architectures": [
+    "ModernBertForSequenceClassification"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 50281,
+  "classifier_activation": "gelu",
+  "classifier_bias": false,
+  "classifier_dropout": 0.0,
+  "classifier_pooling": "mean",
+  "cls_token_id": 50281,
+  "decoder_bias": true,
+  "deterministic_flash_attn": false,
+  "dtype": "float32",
+  "embedding_dropout": 0.0,
+  "eos_token_id": 50282,
+  "global_attn_every_n_layers": 3,
+  "gradient_checkpointing": false,
+  "hidden_activation": "gelu",
+  "hidden_size": 768,
+  "id2label": {
+    "0": "Policy Schedule",
+    "1": "Certificate of Insurance",
+    "2": "Claim Form",
+    "3": "Loss Adjuster Report",
+    "4": "Bordereaux \u2014 Premium",
+    "5": "Bordereaux \u2014 Claims",
+    "6": "Endorsement",
+    "7": "Renewal Notice",
+    "8": "Statement of Fact",
+    "9": "FNOL Report",
+    "10": "Subrogation Notice",
+    "11": "Policy Wording"
+  },
+  "initializer_cutoff_factor": 2.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 1152,
+  "label2id": {
+    "Bordereaux \u2014 Claims": 5,
+    "Bordereaux \u2014 Premium": 4,
+    "Certificate of Insurance": 1,
+    "Claim Form": 2,
+    "Endorsement": 6,
+    "FNOL Report": 9,
+    "Loss Adjuster Report": 3,
+    "Policy Schedule": 0,
+    "Policy Wording": 11,
+    "Renewal Notice": 7,
+    "Statement of Fact": 8,
+    "Subrogation Notice": 10
+  },
+  "layer_norm_eps": 1e-05,
+  "layer_types": [
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention"
+  ],
+  "local_attention": 128,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "mlp_dropout": 0.0,
+  "model_type": "modernbert",
+  "norm_bias": false,
+  "norm_eps": 1e-05,
+  "num_attention_heads": 12,
+  "num_hidden_layers": 22,
+  "pad_token_id": 50283,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "rope_parameters": {
+    "full_attention": {
+      "rope_theta": 160000.0,
+      "rope_type": "default"
+    },
+    "sliding_attention": {
+      "rope_theta": 10000.0,
+      "rope_type": "default"
+    }
+  },
+  "sep_token_id": 50282,
+  "sparse_pred_ignore_index": -100,
+  "sparse_prediction": false,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.4.0",
+  "use_cache": false,
+  "vocab_size": 50368
+}
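
The `layer_types` list follows from `global_attn_every_n_layers: 3`: every third layer, starting at layer 0, uses full (global) attention, and the others use sliding-window attention over `local_attention: 128` tokens, each type with its own `rope_theta` from `rope_parameters`. The sketch below regenerates the list and builds a boolean sliding-window mask. It assumes, as in the ModernBERT implementation in `transformers`, that the 128-token window is split evenly into 64 positions on each side of the query; the helper names are hypothetical.

```python
NUM_HIDDEN_LAYERS = 22   # config.json: num_hidden_layers
GLOBAL_EVERY_N = 3       # config.json: global_attn_every_n_layers
LOCAL_ATTENTION = 128    # config.json: local_attention (window size in tokens)
ROPE_THETA = {"full_attention": 160000.0, "sliding_attention": 10000.0}  # rope_parameters


def layer_types(num_layers=NUM_HIDDEN_LAYERS, every_n=GLOBAL_EVERY_N):
    """Layer i is global when i is a multiple of every_n, otherwise sliding-window."""
    return [
        "full_attention" if i % every_n == 0 else "sliding_attention"
        for i in range(num_layers)
    ]


def rope_theta(layer_idx):
    """RoPE base frequency used by a given layer."""
    return ROPE_THETA[layer_types()[layer_idx]]


def sliding_window_mask(seq_len, window=LOCAL_ATTENTION):
    """mask[q][k] is True when query q may attend to key k.

    Assumes the window is split evenly, i.e. |q - k| <= window // 2.
    """
    half = window // 2
    return [[abs(q - k) <= half for k in range(seq_len)] for q in range(seq_len)]


types = layer_types()
print([i for i, t in enumerate(types) if t == "full_attention"])
# → [0, 3, 6, 9, 12, 15, 18, 21]
mask = sliding_window_mask(200)
print(sum(mask[100]))  # keys 36..164 are visible to query 100
# → 129
```

Only 8 of the 22 layers attend over the whole sequence; in the other 14 the cost grows linearly with sequence length, which helps keep inputs up to `max_position_embeddings: 8192` tractable.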

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d5e4e8133d620a1a8416b330df1907289b322f556822753b31173a47e34006f6
+size 598470552
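
The LFS pointer records only the file size, but because `config.json` stores the weights as `float32` (4 bytes each), the size gives a quick estimate of the parameter count. The estimate is a slight overcount, since a safetensors file also begins with a small JSON header. It comes out at about 149.6M, consistent with the roughly 149M parameters quoted for ModernBERT-base plus a small classification head. The `estimate_params` helper is a hypothetical illustration.

```python
SIZE_BYTES = 598_470_552   # model.safetensors LFS pointer: size
BYTES_PER_PARAM = 4        # config.json: "dtype": "float32"


def estimate_params(size_bytes, bytes_per_param=BYTES_PER_PARAM):
    """Upper-bound parameter count; ignores the small safetensors JSON header."""
    return size_bytes // bytes_per_param


n = estimate_params(SIZE_BYTES)
print(f"~{n / 1e6:.1f}M parameters")  # → ~149.6M parameters
```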

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "backend": "tokenizers",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "is_local": false,
+  "mask_token": "[MASK]",
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 8192,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": "[UNK]"
+}

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:423060dee252df138963ecb244faa459785db6625463e3cfd003ee85e874b7bc
+size 5201

training_meta.json ADDED

	@@ -0,0 +1,40 @@

+{
+  "labels": [
+    "Policy Schedule",
+    "Certificate of Insurance",
+    "Claim Form",
+    "Loss Adjuster Report",
+    "Bordereaux \u2014 Premium",
+    "Bordereaux \u2014 Claims",
+    "Endorsement",
+    "Renewal Notice",
+    "Statement of Fact",
+    "FNOL Report",
+    "Subrogation Notice",
+    "Policy Wording"
+  ],
+  "id2label": {
+    "0": "Policy Schedule",
+    "1": "Certificate of Insurance",
+    "2": "Claim Form",
+    "3": "Loss Adjuster Report",
+    "4": "Bordereaux \u2014 Premium",
+    "5": "Bordereaux \u2014 Claims",
+    "6": "Endorsement",
+    "7": "Renewal Notice",
+    "8": "Statement of Fact",
+    "9": "FNOL Report",
+    "10": "Subrogation Notice",
+    "11": "Policy Wording"
+  },
+  "results": {
+    "eval_loss": 4.1706562114995904e-06,
+    "eval_accuracy": 1.0,
+    "eval_f1_macro": 1.0,
+    "eval_f1_weighted": 1.0,
+    "eval_runtime": 30.3435,
+    "eval_samples_per_second": 32.956,
+    "eval_steps_per_second": 2.076,
+    "epoch": 5.0
+  }
+}
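
`training_meta.json` does not state how many documents were evaluated, but the logged throughput lets you recover it: runtime × samples/second is about 1,000 documents, and runtime × steps/second is about 63 evaluation steps. If the last batch may be partial, 63 steps over 1,000 documents fits only an evaluation batch size of 16. `training_args.bin` is not rendered here, so treat that batch size as an inference from the logs, not a recorded setting.

```python
import math

# Values copied from training_meta.json["results"]
results = {
    "eval_runtime": 30.3435,
    "eval_samples_per_second": 32.956,
    "eval_steps_per_second": 2.076,
}

eval_samples = round(results["eval_runtime"] * results["eval_samples_per_second"])
eval_steps = round(results["eval_runtime"] * results["eval_steps_per_second"])

# Batch sizes b for which ceil(eval_samples / b) == eval_steps (last batch may be partial)
candidates = [
    b for b in range(1, eval_samples + 1)
    if math.ceil(eval_samples / b) == eval_steps
]
print(eval_samples, eval_steps, candidates)  # → 1000 63 [16]
```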