Commit ae3043e (verified) by piyushptiwari · 1 Parent(s): 2de6c0f

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,125 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - insurance
+ - ner
+ - named-entity-recognition
+ - modernbert
+ - uk-insurance
+ - token-classification
+ - bytical
+ library_name: transformers
+ pipeline_tag: token-classification
+ base_model: answerdotai/ModernBERT-base
+ datasets:
+ - piyushptiwari/insureos-training-data
+ model-index:
+ - name: InsureNER
+   results:
+   - task:
+       type: token-classification
+       name: Insurance Named Entity Recognition
+     metrics:
+     - type: f1
+       value: 1.0
+       name: F1
+     - type: precision
+       value: 1.0
+       name: Precision
+     - type: recall
+       value: 1.0
+       name: Recall
+ ---
+
+ # InsureNER: Insurance Named Entity Recognition
+
+ **Created by [Bytical AI](https://bytical.ai)**, which builds AI agents that run insurance operations.
+
+ ## Model Description
+
+ InsureNER is a domain-specific Named Entity Recognition model for the UK insurance industry. Built on ModernBERT-base, it recognizes 14 insurance-specific entity types using BIO tagging: 14 `B-` tags, 11 `I-` tags, and `O`, for 26 labels in total. `CLAIM_NUMBER`, `MONEY`, and `POLICY_NUMBER` have a `B-` tag only.
+
+ ### Entity Types (14)
+
+ | Entity | Description | Example |
+ |--------|-------------|---------|
+ | `CLAIM_NUMBER` | Insurance claim reference | CLM-2024-001234 |
+ | `DATE` | Dates in insurance context | 15 March 2026 |
+ | `INSURER` | Insurance company name | Aviva, AXA, Zurich |
+ | `LOB` | Line of Business | Motor, Property, Liability |
+ | `MGA` | Managing General Agent | Covéa, eSure |
+ | `MONEY` | Monetary amounts | £45,000, $1.2M |
+ | `ORG` | Organisation name | FCA, Lloyd's of London |
+ | `PERIL` | Insurance peril/risk | Flood, Fire, Theft |
+ | `PERSON` | Person name | John Smith |
+ | `POLICY_NUMBER` | Policy reference | POL-UK-2024-56789 |
+ | `POSTCODE` | UK postcode | SW1A 1AA, EC2M 7PP |
+ | `REGULATION` | Regulatory reference | Consumer Duty, Solvency II |
+ | `SYNDICATE` | Lloyd's syndicate | Syndicate 2623 |
+ | `VEHICLE` | Vehicle description | 2023 BMW 320d |
+
+ ### Training Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Base Model | answerdotai/ModernBERT-base |
+ | Training Samples | 8,000 synthetic NER-annotated insurance texts |
+ | Epochs | 8 |
+ | Label Schema | BIO (26 labels) |
+ | GPU | NVIDIA Tesla T4 16 GB |
+
+ ### Evaluation Results
+
+ | Metric | Score |
+ |--------|-------|
+ | **F1** | **1.0** |
+ | **Precision** | **1.0** |
+ | **Recall** | **1.0** |
+ | Eval Loss | 4.80e-05 |
+ | Eval Samples/sec | 68.72 |
+
+ ## How to Use
+
+ ```python
+ from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
+
+ model = AutoModelForTokenClassification.from_pretrained("piyushptiwari/InsureNER")
+ tokenizer = AutoTokenizer.from_pretrained("piyushptiwari/InsureNER")
+
+ ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
+
+ text = "Aviva policy POL-UK-2024-56789 covers John Smith at SW1A 1AA for motor insurance. Claim CLM-2024-001234 was filed on 15 March 2026 for £45,000."
+ entities = ner_pipeline(text)
+
+ for ent in entities:
+     print(f" {ent['entity_group']:20s} {ent['word']:30s} (score: {ent['score']:.3f})")
+ ```
+
+ ## Part of the INSUREOS Model Suite
+
+ This model is part of **INSUREOS**, a complete AI/ML suite for insurance operations built by Bytical AI:
+
+ | Model | Task | Metric |
+ |-------|------|--------|
+ | [InsureLLM-4B](https://huggingface.co/piyushptiwari/InsureLLM-4B) | Insurance domain LLM | ROUGE-1: 0.384 |
+ | [InsureDocClassifier](https://huggingface.co/piyushptiwari/InsureDocClassifier) | 12-class document classification | F1: 1.0 |
+ | **InsureNER** (this model) | 14-entity Named Entity Recognition | F1: 1.0 |
+ | [InsureFraudNet](https://huggingface.co/piyushptiwari/InsureFraudNet) | Fraud detection (Motor/Property/Liability) | AUC-ROC: 1.0 |
+ | [InsurePricing](https://huggingface.co/piyushptiwari/InsurePricing) | Insurance pricing (GLM + EBM) | MAE: £11,132 |
+
+ ## Citation
+
+ ```bibtex
+ @misc{bytical2026insurener,
+   title={InsureNER: Insurance Named Entity Recognition with ModernBERT},
+   author={Bytical AI},
+   year={2026},
+   url={https://huggingface.co/piyushptiwari/InsureNER}
+ }
+ ```
+
+ ## About Bytical AI
+
+ [Bytical](https://bytical.ai) builds AI agents that run insurance operations: claims automation, underwriting intelligence, digital sales, and core system modernization for insurers across the UK and Europe. Microsoft AI Partner | NVIDIA | Salesforce.
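The entity-type and label counts quoted in the README can be checked against the `id2label` map that this commit adds in `config.json`. The following is a minimal sketch, with the label list copied from that file; the helper name `summarize_labels` is hypothetical and not part of the repository.

```python
from collections import defaultdict

# Labels copied in id order from the id2label map in config.json.
LABELS = [
    "O",
    "B-CLAIM_NUMBER", "B-DATE", "B-INSURER", "B-LOB", "B-MGA", "B-MONEY",
    "B-ORG", "B-PERIL", "B-PERSON", "B-POLICY_NUMBER", "B-POSTCODE",
    "B-REGULATION", "B-SYNDICATE", "B-VEHICLE",
    "I-DATE", "I-INSURER", "I-LOB", "I-MGA", "I-ORG", "I-PERIL",
    "I-PERSON", "I-POSTCODE", "I-REGULATION", "I-SYNDICATE", "I-VEHICLE",
]


def summarize_labels(labels):
    """Group BIO labels by entity type and report which types lack an I- tag."""
    prefixes = defaultdict(set)
    for label in labels:
        if label == "O":
            continue  # the outside tag belongs to no entity type
        prefix, entity = label.split("-", 1)
        prefixes[entity].add(prefix)
    begin_only = sorted(e for e, p in prefixes.items() if p == {"B"})
    return {
        "num_labels": len(labels),
        "num_entity_types": len(prefixes),
        "begin_only": begin_only,
    }


summary = summarize_labels(LABELS)
print(summary)
```

For this label list the summary reports 26 labels and 14 entity types, three of which (`CLAIM_NUMBER`, `MONEY`, `POLICY_NUMBER`) have no `I-` tag.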
config.json ADDED
@@ -0,0 +1,134 @@
+ {
+   "architectures": [
+     "ModernBertForTokenClassification"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 50281,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "mean",
+   "cls_token_id": 50281,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "dtype": "float32",
+   "embedding_dropout": 0.0,
+   "eos_token_id": 50282,
+   "global_attn_every_n_layers": 3,
+   "gradient_checkpointing": false,
+   "hidden_activation": "gelu",
+   "hidden_size": 768,
+   "id2label": {
+     "0": "O",
+     "1": "B-CLAIM_NUMBER",
+     "2": "B-DATE",
+     "3": "B-INSURER",
+     "4": "B-LOB",
+     "5": "B-MGA",
+     "6": "B-MONEY",
+     "7": "B-ORG",
+     "8": "B-PERIL",
+     "9": "B-PERSON",
+     "10": "B-POLICY_NUMBER",
+     "11": "B-POSTCODE",
+     "12": "B-REGULATION",
+     "13": "B-SYNDICATE",
+     "14": "B-VEHICLE",
+     "15": "I-DATE",
+     "16": "I-INSURER",
+     "17": "I-LOB",
+     "18": "I-MGA",
+     "19": "I-ORG",
+     "20": "I-PERIL",
+     "21": "I-PERSON",
+     "22": "I-POSTCODE",
+     "23": "I-REGULATION",
+     "24": "I-SYNDICATE",
+     "25": "I-VEHICLE"
+   },
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "label2id": {
+     "B-CLAIM_NUMBER": 1,
+     "B-DATE": 2,
+     "B-INSURER": 3,
+     "B-LOB": 4,
+     "B-MGA": 5,
+     "B-MONEY": 6,
+     "B-ORG": 7,
+     "B-PERIL": 8,
+     "B-PERSON": 9,
+     "B-POLICY_NUMBER": 10,
+     "B-POSTCODE": 11,
+     "B-REGULATION": 12,
+     "B-SYNDICATE": 13,
+     "B-VEHICLE": 14,
+     "I-DATE": 15,
+     "I-INSURER": 16,
+     "I-LOB": 17,
+     "I-MGA": 18,
+     "I-ORG": 19,
+     "I-PERIL": 20,
+     "I-PERSON": 21,
+     "I-POSTCODE": 22,
+     "I-REGULATION": 23,
+     "I-SYNDICATE": 24,
+     "I-VEHICLE": 25,
+     "O": 0
+   },
+   "layer_norm_eps": 1e-05,
+   "layer_types": [
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention"
+   ],
+   "local_attention": 128,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "mlp_dropout": 0.0,
+   "model_type": "modernbert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 22,
+   "pad_token_id": 50283,
+   "position_embedding_type": "absolute",
+   "rope_parameters": {
+     "full_attention": {
+       "rope_theta": 160000.0,
+       "rope_type": "default"
+     },
+     "sliding_attention": {
+       "rope_theta": 10000.0,
+       "rope_type": "default"
+     }
+   },
+   "sep_token_id": 50282,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "tie_word_embeddings": true,
+   "transformers_version": "5.4.0",
+   "use_cache": false,
+   "vocab_size": 50368
+ }
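The `layer_types` list in this config is consistent with `global_attn_every_n_layers` and `num_hidden_layers`: every third layer, starting with layer 0, uses full attention and the rest use sliding-window attention. The sketch below regenerates the list from those two values. The function name `build_layer_types` is hypothetical, and this is an illustration of the pattern, not the `transformers` implementation.

```python
def build_layer_types(num_hidden_layers: int, global_attn_every_n_layers: int) -> list[str]:
    """Return one attention type per layer: full attention every n-th layer, sliding otherwise."""
    return [
        "full_attention" if i % global_attn_every_n_layers == 0 else "sliding_attention"
        for i in range(num_hidden_layers)
    ]


# Values taken from config.json: 22 layers, global attention every 3 layers.
layer_types = build_layer_types(num_hidden_layers=22, global_attn_every_n_layers=3)
print(layer_types.count("full_attention"), layer_types.count("sliding_attention"))
```

With the values from this config, the sketch yields 8 full-attention layers and 14 sliding-attention layers, matching the list in the file.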
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f0f1eb0e35c1b5e3d3d83d714c31d3457aa96de18e4f8a6bf0c1ad936e78969
+ size 598513616
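The LFS pointer gives the checkpoint size in bytes, and `config.json` records `dtype` as `float32`, so a rough parameter count follows from dividing by 4 bytes per weight. This is a sketch under the assumption that every tensor in the file is stored as float32. The result slightly overestimates the true count, because the file also holds a small safetensors header.

```python
SIZE_BYTES = 598_513_616  # "size" field of the model.safetensors LFS pointer
BYTES_PER_PARAM = 4       # float32, per the "dtype" field in config.json (assumed for all tensors)

# Integer division: each float32 weight occupies exactly 4 bytes.
approx_params = SIZE_BYTES // BYTES_PER_PARAM
print(f"~{approx_params / 1e6:.1f}M parameters")
```

The estimate comes to roughly 150 million parameters.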
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "backend": "tokenizers",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 8192,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "TokenizersBackend",
+   "unk_token": "[UNK]"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d340c6e25809bba480c32e65688ab7238c961effa4d63785924f06d8f2d539f
+ size 5201
training_meta.json ADDED
@@ -0,0 +1,96 @@
+ {
+   "label_list": [
+     "O",
+     "B-CLAIM_NUMBER",
+     "B-DATE",
+     "B-INSURER",
+     "B-LOB",
+     "B-MGA",
+     "B-MONEY",
+     "B-ORG",
+     "B-PERIL",
+     "B-PERSON",
+     "B-POLICY_NUMBER",
+     "B-POSTCODE",
+     "B-REGULATION",
+     "B-SYNDICATE",
+     "B-VEHICLE",
+     "I-DATE",
+     "I-INSURER",
+     "I-LOB",
+     "I-MGA",
+     "I-ORG",
+     "I-PERIL",
+     "I-PERSON",
+     "I-POSTCODE",
+     "I-REGULATION",
+     "I-SYNDICATE",
+     "I-VEHICLE"
+   ],
+   "label2id": {
+     "O": 0,
+     "B-CLAIM_NUMBER": 1,
+     "B-DATE": 2,
+     "B-INSURER": 3,
+     "B-LOB": 4,
+     "B-MGA": 5,
+     "B-MONEY": 6,
+     "B-ORG": 7,
+     "B-PERIL": 8,
+     "B-PERSON": 9,
+     "B-POLICY_NUMBER": 10,
+     "B-POSTCODE": 11,
+     "B-REGULATION": 12,
+     "B-SYNDICATE": 13,
+     "B-VEHICLE": 14,
+     "I-DATE": 15,
+     "I-INSURER": 16,
+     "I-LOB": 17,
+     "I-MGA": 18,
+     "I-ORG": 19,
+     "I-PERIL": 20,
+     "I-PERSON": 21,
+     "I-POSTCODE": 22,
+     "I-REGULATION": 23,
+     "I-SYNDICATE": 24,
+     "I-VEHICLE": 25
+   },
+   "id2label": {
+     "0": "O",
+     "1": "B-CLAIM_NUMBER",
+     "2": "B-DATE",
+     "3": "B-INSURER",
+     "4": "B-LOB",
+     "5": "B-MGA",
+     "6": "B-MONEY",
+     "7": "B-ORG",
+     "8": "B-PERIL",
+     "9": "B-PERSON",
+     "10": "B-POLICY_NUMBER",
+     "11": "B-POSTCODE",
+     "12": "B-REGULATION",
+     "13": "B-SYNDICATE",
+     "14": "B-VEHICLE",
+     "15": "I-DATE",
+     "16": "I-INSURER",
+     "17": "I-LOB",
+     "18": "I-MGA",
+     "19": "I-ORG",
+     "20": "I-PERIL",
+     "21": "I-PERSON",
+     "22": "I-POSTCODE",
+     "23": "I-REGULATION",
+     "24": "I-SYNDICATE",
+     "25": "I-VEHICLE"
+   },
+   "results": {
+     "eval_loss": 4.797985820914619e-05,
+     "eval_f1": 1.0,
+     "eval_precision": 1.0,
+     "eval_recall": 1.0,
+     "eval_runtime": 11.6416,
+     "eval_samples_per_second": 68.719,
+     "eval_steps_per_second": 2.147,
+     "epoch": 8.0
+   }
+ }
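The throughput figures in `results` imply the size of the evaluation run, which the commit does not state directly. Below is a small sketch of the arithmetic. The derived sample count, step count, and batch size are inferences from the reported numbers, not values recorded in the source, and the batch-size estimate assumes a single device with full batches.

```python
# Values copied from the "results" block of training_meta.json.
eval_runtime = 11.6416            # seconds
eval_samples_per_second = 68.719
eval_steps_per_second = 2.147

# runtime x rate gives the totals; rounding absorbs the truncated decimals.
approx_samples = round(eval_runtime * eval_samples_per_second)
approx_steps = round(eval_runtime * eval_steps_per_second)
approx_batch_size = round(approx_samples / approx_steps)

print(approx_samples, approx_steps, approx_batch_size)
```

Under these assumptions the evaluation set is about 800 samples, processed in about 25 batches of 32.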