Natwar committed on
Commit
132935b
·
verified ·
1 Parent(s): b224560

Update README.md

Files changed (1)
  1. README.md +146 -14
README.md CHANGED
@@ -1,14 +1,146 @@
1
- ---
2
- title: Pibit.ai Insurance Tokenizer
3
- emoji: 📉
4
- colorFrom: indigo
5
- colorTo: pink
6
- sdk: gradio
7
- sdk_version: 5.44.1
8
- app_file: app.py
9
- pinned: false
10
- license: mit
11
- short_description: BPE tokenizer for the Property & Casualty insurance industry
12
- ---
13
-
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ Pibit.ai Insurance Tokenizer: Live Demo & Examples
2
+ Welcome to the interactive demo of the Pibit.ai Insurance Tokenizer, which showcases what a domain-specific NLP model can do for the Property & Casualty insurance industry.
3
+
4
+ Unlike generic models, this tokenizer is built around the language of insurance, from loss runs to policy submissions. The demo lets you test it on realistic, complex documents and see the detailed analysis it produces in real time.
5
+
6
+ How to Use the Demo 🚀
7
+ It's simple to get started. Just follow these steps:
8
+
9
+ Make sure you are on the "📊 Document Analysis" tab in the application.
10
+
11
+ Choose one of the sample documents provided below.
12
+
13
+ Click the copy button (📋) in the top-right corner of the document's text box.
14
+
15
+ Paste the text into the main input area labeled "📄 Insurance Document Text".
16
+
17
+ Click the "🔍 Analyze Document" button to generate the results.
18
+
19
+ 📋 Sample Documents for Analysis
20
+ These examples have been crafted to test different features of the tokenizer, from entity recognition to risk assessment.
21
+
22
+ Example 1: Detailed Loss Run Report (General Liability)
23
+ This is a classic insurance document containing a mix of structured data, dates, and financial figures. It's a perfect test for core entity extraction and risk analysis.
24
+
25
+ What to look for:
26
+
27
+ Document Classification: The model should identify this as a Loss Run.
28
+
29
+ Entity Recognition: The model should extract the <POLICY> token for the policy number, along with multiple <DATE> and <AMOUNT> tokens.
30
+
31
+ Risk Score: The score should be elevated because of the two open claims and their high reserve amounts ($75,000 and $25,000).
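To make the entity tokens concrete, here is a minimal Python sketch of placeholder tagging. It is not the demo's implementation: the regular expressions and the `tag_entities` helper are assumptions made for illustration, tuned only to the formats that appear in the sample documents on this page.

```python
import re

# Hypothetical patterns; the demo's actual extraction rules are not shown in this README.
PATTERNS = [
    # Two or three capital letters, then hyphen-separated alphanumeric groups (e.g. GL-98765B43).
    ("<POLICY>", re.compile(r"\b[A-Z]{2,3}-[A-Z0-9]+(?:-[A-Z0-9]+)*\b")),
    # Dates written as MM/DD/YYYY.
    ("<DATE>", re.compile(r"\b\d{2}/\d{2}/\d{4}\b")),
    # Dollar amounts with optional thousands separators and cents.
    ("<AMOUNT>", re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")),
]


def tag_entities(text):
    """Replace each matched span with its placeholder token.

    Returns the tagged text and a list of (token, original_value) pairs.
    """
    found = []
    for token, pattern in PATTERNS:
        def _replace(match, token=token):
            found.append((token, match.group(0)))
            return token
        text = pattern.sub(_replace, text)
    return text, found


tagged, entities = tag_entities(
    "Policy Number: GL-98765B43 Date of Loss: 02/15/2024 Reserve: $75,000.00"
)
print(tagged)
print(entities)
```

A trained tokenizer would learn or configure these categories rather than hard-code them; the sketch only shows the kind of mapping from raw text to <POLICY>, <DATE>, and <AMOUNT> that the demo's entity table reports.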
32
+
33
+ Plaintext
34
+
35
+ CONFIDENTIAL LOSS RUN REPORT
36
+ Insured: Precision Engineering & Fabrication LLC
37
+ Policy Period: 01/01/2024 - 01/01/2025
38
+ Policy Number: GL-98765B43
39
+ Line of Business: General Liability
40
+
41
+ As of Report Date: 09/05/2025
42
+
43
+ ----------------------------------------------------------------------
44
+ Claim #: 2024-00182 Date of Loss: 02/15/2024 Status: Closed
45
+ Claimant: John Doe
46
+ Description: Slip and fall on wet floor near entrance. Claimant sustained a fractured wrist.
47
+ Total Paid: $18,550.00
48
+ Expense Paid: $3,200.00
49
+ Reserve: $0.00
50
+ Total Incurred: $21,750.00
51
+ ----------------------------------------------------------------------
52
+ Claim #: 2024-00541 Date of Loss: 05/22/2024 Status: Open
53
+ Claimant: Acme Retail Co.
54
+ Description: Alleged product defect. A manufactured valve failed, causing water damage to claimant's inventory. Investigation ongoing.
55
+ Total Paid: $0.00
56
+ Expense Paid: $5,500.00
57
+ Reserve: $75,000.00
58
+ Total Incurred: $80,500.00
59
+ ----------------------------------------------------------------------
60
+ Claim #: 2025-00012 Date of Loss: 08/19/2025 Status: Open - Reported Late
61
+ Claimant: Jane Smith
62
+ Description: Laceration from a sharp metal edge on a custom-fabricated part. Potential for litigation.
63
+ Total Paid: $1,200.00 (Medical Payments)
64
+ Expense Paid: $750.00
65
+ Reserve: $25,000.00
66
+ Total Incurred: $26,950.00
67
+ ----------------------------------------------------------------------
68
+
69
+ Summary Totals:
70
+ Total Paid Losses: $19,750.00
71
+ Total Outstanding Reserves: $100,000.00
72
+ Total Incurred: $129,200.00
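The figures in this sample are internally consistent: each claim's Total Incurred equals Total Paid plus Expense Paid plus Reserve, and the summary totals are the sums of the corresponding columns. The short Python sketch below checks that arithmetic. The data layout and function names are illustrative assumptions and are not part of the demo app.

```python
from decimal import Decimal

# Figures copied from the sample loss run: (paid, expense, reserve, incurred).
CLAIMS = {
    "2024-00182": ("18550.00", "3200.00", "0.00", "21750.00"),
    "2024-00541": ("0.00", "5500.00", "75000.00", "80500.00"),
    "2025-00012": ("1200.00", "750.00", "25000.00", "26950.00"),
}


def to_decimals(row):
    """Convert a row of amount strings to exact Decimal values."""
    return tuple(Decimal(value) for value in row)


def claim_is_consistent(row):
    """Check that paid + expense + reserve equals the stated incurred amount."""
    paid, expense, reserve, incurred = to_decimals(row)
    return paid + expense + reserve == incurred


def summary_totals(claims):
    """Sum the paid, reserve, and incurred columns across all claims."""
    rows = [to_decimals(row) for row in claims.values()]
    return {
        "paid": sum(row[0] for row in rows),
        "reserves": sum(row[2] for row in rows),
        "incurred": sum(row[3] for row in rows),
    }


for claim_id, row in CLAIMS.items():
    print(claim_id, "consistent:", claim_is_consistent(row))
print(summary_totals(CLAIMS))
```

`Decimal` is used instead of `float` so that currency amounts compare exactly.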
73
+ Example 2: Commercial Auto Submission 🚗
74
+ This example demonstrates how the tokenizer handles a different line of business and parses information from an application, including specific coverage terms.
75
+
76
+ What to look for:
77
+
78
+ Document Classification: Should be correctly identified as a Submission.
79
+
80
+ Key Terms: It will pick up domain-specific terms like commercial auto liability, deductible, and medical payments.
81
+
82
+ Risk Score: The risk score should be relatively low because the drivers have clean MVRs, the loss history shows only one minor accident, and no hazardous materials are transported.
83
+
84
+ Plaintext
85
+
86
+ COMMERCIAL AUTO INSURANCE APPLICATION
87
+ Applicant: Swift Logistics Inc.
88
+ Address: 123 Freight Lane, Delhi, 110045
89
+ Policy Effective Date Requested: 10/01/2025
90
+
91
+ Business Operations: Regional transportation and delivery of dry goods. Radius of operations is 500km. No hazardous materials are transported.
92
+
93
+ Driver Information:
94
+ - All drivers have a minimum of 3 years commercial driving experience and clean MVRs.
95
+
96
+ Vehicle Schedule:
97
+ 1. 2022 Tata Ultra T.7 - VIN: MA123456789XYZ001 - Cost New: ₹15,00,000
98
+ 2. 2023 Ashok Leyland Bada Dost - VIN: MB987654321ABC002 - Cost New: ₹9,50,000
99
+ 3. 2021 Eicher Pro 2049 - VIN: MC555444333DEF003 - Cost New: ₹11,00,000
100
+
101
+ Requested Coverages:
102
+ - Commercial Auto Liability: $1,000,000 Combined Single Limit
103
+ - Physical Damage (Collision): $2,500 Deductible
104
+ - Physical Damage (Comprehensive): $1,000 Deductible
105
+ - Medical Payments: $5,000
106
+
107
+ Loss History:
108
+ - One minor backing accident in the last 3 years. Total Payout: $1,800 for property damage. No injuries reported.
109
+ Example 3: Property Claim - First Notice of Loss (FNOL) ⛈️
110
+ This document is highly unstructured and narrative-based. It tests the model's ability to extract meaning and assess risk from descriptive text.
111
+
112
+ What to look for:
113
+
114
+ Document Classification: The model should classify this as a Claim report.
115
+
116
+ Key Terms: It will identify risk-related words like damage, storm, water damage, and hole.
117
+
118
+ Entity Recognition: The policy number, date, and estimated damage amount ($50,000) should be captured.
119
+
120
+ Plaintext
121
+
122
+ FIRST NOTICE OF LOSS - COMMERCIAL PROPERTY
123
+ Policy Number: CP-A45-33-821
124
+ Insured Name: "The Grand Heritage" Hotel
125
+ Date of Loss: Approximately 09/04/2025 during the evening monsoon.
126
+
127
+ Description of Loss:
128
+ Severe monsoon storm with high winds and torrential rain caused significant damage. A large tree on the property was uprooted and fell onto the roof of our west wing, creating a large hole. Water has entered several guest rooms (Rooms 201, 203, 205) and the main banquet hall, causing extensive water damage to ceilings, walls, carpeting, and furniture. The electrical system in that wing has been shut down as a precaution.
129
+
130
+ Estimated Damage:
131
+ Initial estimate from our contractor, "Delhi Restoration Services," is upwards of $50,000 but a full assessment is pending. We are also expecting a significant business interruption loss as the wing is unusable.
132
+
133
+ Any Injuries: No injuries to staff or guests have been reported.
134
+
135
+ Contact Person: Mr. Arjun Singh (General Manager)
136
+ Contact Number: +91-98XXXX-XXXX
137
+ 📊 Interpreting the Results
138
+ After analyzing a document, you'll see several output components:
139
+
140
+ Analysis Report: A summary that classifies the document, provides a risk score, and lists key token metrics.
141
+
142
+ Risk Gauge & Token Pie Chart: Visualizations of the risk level and the ratio of insurance-specific terms to general terms.
143
+
144
+ Detected Entities Table: A structured list of every policy number, financial amount, date, and percentage found in the text.
145
+
146
+ Tokenization Sample: A preview showing exactly how the model breaks down the raw text into meaningful tokens.
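As a rough illustration of the ratio behind the token pie chart, the Python sketch below counts how many words in a text belong to an insurance vocabulary. It is a simplification under stated assumptions: the `INSURANCE_TERMS` set is a hypothetical mini-vocabulary, and the sketch works on whole lowercase words rather than on the BPE tokens the demo actually produces.

```python
import re

# Hypothetical mini-vocabulary; the tokenizer's real domain vocabulary is not listed in this README.
INSURANCE_TERMS = {
    "claim", "claimant", "coverage", "deductible",
    "liability", "loss", "policy", "reserve",
}


def insurance_term_ratio(text):
    """Return the fraction of words in the text that are insurance-specific terms."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in INSURANCE_TERMS)
    return hits / len(words)


print(insurance_term_ratio("Policy reserve for the claim"))
```

In this sentence three of the five words are in the vocabulary, so the ratio is 0.6. The demo's chart reports the same kind of split between insurance-specific and general terms, computed over its own tokens.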