Insurance Document OCR
Extract structured data from insurance documents
Model Description
A specialized OCR model that extracts structured fields from insurance-related documents, including claims forms, policy documents, ID cards, and damage photos.
Supported Documents
| Document Type | Fields Extracted |
|---|---|
| Claims Form | Claim #, Date, Amount, Description |
| Policy Document | Policy #, Coverage, Limits, Deductible |
| Driver's License | Name, DOB, License #, Address |
| Vehicle Registration | VIN, Make, Model, Year, Plate |
| Medical Bills | Provider, Date, Charges, Diagnosis |
| Repair Estimates | Shop, Parts, Labor, Total |
| Police Reports | Report #, Date, Officers, Description |
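The table above can be mirrored in code to check which expected fields are missing from an extraction. This is a minimal sketch: the field labels are copied from the table, while `SUPPORTED_FIELDS` and `missing_fields` are hypothetical names introduced here and are not part of the model's API.

```python
from typing import Dict, List, Set

# Field labels copied from the Supported Documents table.
# The mapping name and the helper below are illustrative assumptions.
SUPPORTED_FIELDS: Dict[str, List[str]] = {
    "Claims Form": ["Claim #", "Date", "Amount", "Description"],
    "Policy Document": ["Policy #", "Coverage", "Limits", "Deductible"],
    "Driver's License": ["Name", "DOB", "License #", "Address"],
    "Vehicle Registration": ["VIN", "Make", "Model", "Year", "Plate"],
    "Medical Bills": ["Provider", "Date", "Charges", "Diagnosis"],
    "Repair Estimates": ["Shop", "Parts", "Labor", "Total"],
    "Police Reports": ["Report #", "Date", "Officers", "Description"],
}


def missing_fields(document_type: str, present: Set[str]) -> List[str]:
    """Return the expected labels that are absent from `present`."""
    expected = SUPPORTED_FIELDS.get(document_type)
    if expected is None:
        raise ValueError(f"Unsupported document type: {document_type}")
    return [label for label in expected if label not in present]


print(missing_fields("Vehicle Registration", {"VIN", "Make", "Year"}))
# ['Model', 'Plate']
```

A non-empty result is a natural trigger for asking the submitter to re-upload a clearer scan.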
Output Format
{
  "document_type": "claims_form",
  "confidence": 0.96,
  "extracted_fields": {
    "claim_number": "CLM-2024-78432",
    "incident_date": "2024-01-15",
    "claim_amount": 2450.00,
    "description": "Rear-end collision at intersection",
    "policy_number": "POL-AUTO-12345"
  },
  "raw_text": "...",
  "bounding_boxes": [...]
}
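A minimal sketch of consuming this output, assuming the response is available as a JSON string with the keys shown above. The `ExtractionResult` class, the `parse_result` and `needs_review` helpers, and the 0.90 review threshold are illustrative assumptions, not part of the model's API. `bounding_boxes` is omitted because its structure is not documented here.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ExtractionResult:
    """Typed view of the documented output (hypothetical helper class)."""
    document_type: str
    confidence: float
    extracted_fields: Dict[str, Any] = field(default_factory=dict)
    raw_text: str = ""


def parse_result(payload: str) -> ExtractionResult:
    """Parse a JSON string that follows the documented output format."""
    data = json.loads(payload)
    return ExtractionResult(
        document_type=data["document_type"],
        confidence=float(data["confidence"]),
        extracted_fields=data.get("extracted_fields", {}),
        raw_text=data.get("raw_text", ""),
    )


def needs_review(result: ExtractionResult, threshold: float = 0.90) -> bool:
    """Flag low-confidence results; the 0.90 default is an assumed value."""
    return result.confidence < threshold


# Sample payload built from the example values in this section.
sample = json.dumps({
    "document_type": "claims_form",
    "confidence": 0.96,
    "extracted_fields": {
        "claim_number": "CLM-2024-78432",
        "incident_date": "2024-01-15",
        "claim_amount": 2450.00,
        "policy_number": "POL-AUTO-12345",
    },
})

result = parse_result(sample)
print(result.document_type, result.extracted_fields["claim_number"])
print("needs review:", needs_review(result))
```

Keeping the threshold as a parameter lets each document type use its own cutoff if some fields prove less reliable than others.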
Performance
| Metric | Score |
|---|---|
| Character Accuracy | 98.7% |
| Field Extraction | 95.2% |
| Document Classification | 97.8% |
| Processing Time | 1.2s/page |
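For capacity planning, the 1.2 s/page figure can be turned into a rough throughput estimate. The sketch below assumes pages are processed one at a time per worker and that throughput scales linearly with the number of workers, with no queueing or I/O overhead. Neither assumption is stated in the table, and the function names are hypothetical.

```python
import math

SECONDS_PER_PAGE = 1.2  # processing time from the Performance table


def pages_per_hour(workers: int = 1) -> float:
    """Estimated throughput, assuming linear scaling across workers."""
    return workers * 3600 / SECONDS_PER_PAGE


def workers_needed(pages: int, deadline_hours: float) -> int:
    """Smallest worker count that clears `pages` within the deadline."""
    return math.ceil(pages / (pages_per_hour() * deadline_hours))


print(round(pages_per_hour()))    # 3000
print(workers_needed(50_000, 8))  # 3
```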
Usage
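import json
from transformers import pipeline

# Load the OCR model as an image-to-text pipeline
ocr = pipeline("image-to-text", model="gcc-insurance-ml-models/document-ocr-insurance")
# image-to-text pipelines return a list of dicts with a "generated_text" key
result = ocr("claim_form.jpg")
# Assumption: generated_text holds the JSON shown under Output Format
print(json.loads(result[0]["generated_text"])["extracted_fields"])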
Integration
Document Upload
↓
[Document OCR] → Structured Data
↓
Auto-populate claim form
↓
Validate against policy
↓
Route to triage
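The flow above could be wired together roughly as follows. This is a sketch under stated assumptions: the in-memory `POLICIES` store, the three step functions, the validation rules, the queue names, and the 0.90 confidence cutoff are all hypothetical stand-ins for whatever claims system the model is integrated with.

```python
from typing import Any, Dict, List, Optional

# Hypothetical policy store; a real system would query a policy database.
POLICIES: Dict[Optional[str], Dict[str, Any]] = {
    "POL-AUTO-12345": {"active": True, "coverage_limit": 10_000.00},
}


def populate_claim_form(ocr_output: Dict[str, Any]) -> Dict[str, Any]:
    """Auto-populate step: copy extracted fields into a claim record."""
    claim = dict(ocr_output["extracted_fields"])
    claim["ocr_confidence"] = ocr_output["confidence"]
    return claim


def validate_against_policy(claim: Dict[str, Any]) -> List[str]:
    """Validation step: return problems found; an empty list means a pass."""
    policy = POLICIES.get(claim.get("policy_number"))
    if policy is None:
        return ["unknown policy"]
    problems = []
    if not policy["active"]:
        problems.append("policy inactive")
    if claim.get("claim_amount", 0) > policy["coverage_limit"]:
        problems.append("amount exceeds coverage limit")
    return problems


def route_to_triage(claim: Dict[str, Any], problems: List[str],
                    min_confidence: float = 0.90) -> str:
    """Routing step: send doubtful claims to a human (assumed queue names)."""
    if problems or claim["ocr_confidence"] < min_confidence:
        return "manual_review"
    return "auto_triage"


# Structured data as it would arrive from the Document OCR step.
ocr_output = {
    "document_type": "claims_form",
    "confidence": 0.96,
    "extracted_fields": {
        "claim_number": "CLM-2024-78432",
        "claim_amount": 2450.00,
        "policy_number": "POL-AUTO-12345",
    },
}

claim = populate_claim_form(ocr_output)
problems = validate_against_policy(claim)
print(route_to_triage(claim, problems))  # auto_triage
```

Returning a list of problems rather than a boolean keeps the reason for a manual review visible to the adjuster who picks up the claim.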
License
Apache 2.0