Bader Alabddan

Fraud Simulator Dataset

Overview

This dataset contains synthetic insurance claims for fraud detection training and validation.

Dataset Structure

Files

  • claims_normal.csv - Legitimate insurance claims
  • claims_fraudulent.csv - Fraudulent insurance claims
  • claims_combined.csv - Combined dataset with labels
  • metadata.json - Dataset metadata and statistics
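
As a quick sanity check on the combined file, the label balance can be recomputed from the rows themselves. The sketch below is an illustration only: it assumes claims_combined.csv has a header row and a column named `label` holding `fraud` or `legitimate` (taken from the schema in this document, not confirmed for the CSV layout). The helper names `count_labels` and `fraud_rate` are hypothetical.

```python
import csv
from collections import Counter
from typing import TextIO


def count_labels(handle: TextIO, label_column: str = "label") -> Counter:
    """Count rows per label in an open CSV file that has a header row.

    The column name is an assumption based on the documented schema.
    """
    reader = csv.DictReader(handle)
    return Counter(row[label_column] for row in reader)


def fraud_rate(counts: Counter) -> float:
    """Share of rows labeled 'fraud'; returns 0.0 when there are no rows."""
    total = sum(counts.values())
    return counts["fraud"] / total if total else 0.0
```

To use it, open claims_combined.csv with `open(path, newline="")` and pass the handle to `count_labels`. If the file matches the published statistics, `fraud_rate` should return a value close to 0.25.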

Schema

Claim Record:

{
  "claim_id": "string",
  "amount": "float",
  "type": "string (auto|property|health|life)",
  "claimant_id": "string",
  "days_since_policy_start": "integer",
  "claimant_history": {
    "claim_count": "integer",
    "avg_amount": "float",
    "total_paid": "float"
  },
  "document_consistency_score": "float (0.0-1.0)",
  "linked_suspicious_entities": "integer",
  "label": "string (fraud|legitimate)"
}
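
The schema above can be checked mechanically before a record is used for training. The following is a minimal sketch, not the project's own validator: it assumes each record has already been parsed into a Python dict with the nested `claimant_history` object intact, and the function name `validate_claim` is hypothetical.

```python
from typing import Any

CLAIM_TYPES = {"auto", "property", "health", "life"}
LABELS = {"fraud", "legitimate"}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_int(value: Any) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


def validate_claim(record: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the record passed."""
    errors: list[str] = []

    for field in ("claim_id", "claimant_id"):
        if not isinstance(record.get(field), str):
            errors.append(f"{field} must be a string")

    if not _is_number(record.get("amount")):
        errors.append("amount must be a number")
    claim_type = record.get("type")
    if not (isinstance(claim_type, str) and claim_type in CLAIM_TYPES):
        errors.append("type must be one of auto, property, health, life")
    if not _is_int(record.get("days_since_policy_start")):
        errors.append("days_since_policy_start must be an integer")

    history = record.get("claimant_history")
    if not isinstance(history, dict):
        errors.append("claimant_history must be an object")
    else:
        if not _is_int(history.get("claim_count")):
            errors.append("claimant_history.claim_count must be an integer")
        for field in ("avg_amount", "total_paid"):
            if not _is_number(history.get(field)):
                errors.append(f"claimant_history.{field} must be a number")

    score = record.get("document_consistency_score")
    if not _is_number(score) or not 0.0 <= score <= 1.0:
        errors.append("document_consistency_score must be a number in [0.0, 1.0]")
    if not _is_int(record.get("linked_suspicious_entities")):
        errors.append("linked_suspicious_entities must be an integer")
    label = record.get("label")
    if not (isinstance(label, str) and label in LABELS):
        errors.append("label must be fraud or legitimate")

    return errors
```

Returning a list of messages rather than raising on the first problem makes it easier to report every issue in a malformed record at once.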

Fraud Patterns Included

  1. Staged Accidents: Multiple claims with similar patterns
  2. Document Mismatch: Inconsistent documentation
  3. Early Claims: Claims filed shortly after policy inception
  4. Amount Inflation: Claims significantly above average
  5. Entity Networks: Connected suspicious entities
  6. High Frequency: Repeated claims from same claimant
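
Five of the six patterns map onto fields of a single claim record, so they can be illustrated with simple rule checks. The sketch below is an assumption-laden illustration, not the logic used to generate or label this dataset: every threshold is hypothetical, as is the function name `pattern_flags`. Staged accidents are left out because detecting similar claims requires comparing records against each other.

```python
from typing import Any

# Hypothetical thresholds, chosen for illustration only.
EARLY_CLAIM_DAYS = 30        # "shortly after policy inception"
INFLATION_FACTOR = 3.0       # "significantly above average"
MIN_CONSISTENCY = 0.5        # below this, documents count as inconsistent
HIGH_FREQUENCY_CLAIMS = 5    # prior claims by the same claimant


def pattern_flags(claim: dict[str, Any]) -> list[str]:
    """Return the names of the single-record fraud patterns a claim matches."""
    history = claim["claimant_history"]
    flags: list[str] = []

    if claim["document_consistency_score"] < MIN_CONSISTENCY:
        flags.append("document_mismatch")
    if claim["days_since_policy_start"] < EARLY_CLAIM_DAYS:
        flags.append("early_claim")
    # Skip the inflation check when the claimant has no meaningful average.
    avg_amount = history["avg_amount"]
    if avg_amount > 0 and claim["amount"] > INFLATION_FACTOR * avg_amount:
        flags.append("amount_inflation")
    if claim["linked_suspicious_entities"] > 0:
        flags.append("entity_network")
    if history["claim_count"] >= HIGH_FREQUENCY_CLAIMS:
        flags.append("high_frequency")

    return flags
```

A rule set like this is useful mainly as a baseline to compare a trained model against, since each rule looks at one field in isolation.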

Dataset Statistics

  • Total Claims: 10,000
  • Fraudulent: 2,500 (25%)
  • Legitimate: 7,500 (75%)
  • Claim Types: Auto (40%), Property (30%), Health (20%), Life (10%)
  • Average Claim Amount: $5,000
  • Date Range: 2020-2026
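
The percentages above imply specific row counts, which gives a simple target to compare a loaded copy of the data against. This sketch assumes the stated percentages are exact rather than rounded; the function name `expected_counts` is hypothetical.

```python
TOTAL_CLAIMS = 10_000
LABEL_SHARES_PCT = {"fraud": 25, "legitimate": 75}
TYPE_SHARES_PCT = {"auto": 40, "property": 30, "health": 20, "life": 10}


def expected_counts(total: int, shares_pct: dict[str, int]) -> dict[str, int]:
    """Convert whole-number percentage shares into row counts for a given total."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    # Integer arithmetic avoids floating-point rounding in the counts.
    return {name: total * pct // 100 for name, pct in shares_pct.items()}


label_counts = expected_counts(TOTAL_CLAIMS, LABEL_SHARES_PCT)
type_counts = expected_counts(TOTAL_CLAIMS, TYPE_SHARES_PCT)
```

Under that assumption the label counts match the 2,500 and 7,500 figures listed above, and the claim types work out to 4,000 auto, 3,000 property, 2,000 health, and 1,000 life.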

Usage

This dataset is used for:

  • Model training and validation
  • Fraud pattern simulation
  • Stress testing
  • Drift scenario testing
  • Performance benchmarking

Data Quality

  • No missing values
  • All four claim types represented (distribution listed under Dataset Statistics)
  • Realistic fraud patterns based on industry data
  • Regular updates with new fraud patterns

Privacy

All data is synthetic and does not contain real PII.

License

For internal use only. Part of the BDR-Agent-Factory ecosystem.