| # Fraud Simulator Dataset | |
| ## Overview | |
| This dataset contains synthetic insurance claims for fraud detection training and validation. | |
| ## Dataset Structure | |
| ### Files | |
| - `claims_normal.csv` - Legitimate insurance claims | |
| - `claims_fraudulent.csv` - Fraudulent insurance claims | |
| - `claims_combined.csv` - Combined dataset with labels | |
| - `metadata.json` - Dataset metadata and statistics | |
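
As a quick sanity check after download, the label balance in `claims_combined.csv` can be tallied with the standard library alone. This is a minimal sketch: it assumes the CSV has a header row and a column named `label` (taken from the schema in this README). The helper names `count_labels` and `load_label_counts` are hypothetical and not part of any shipped tooling.

```python
import csv
from collections import Counter
from typing import TextIO


def count_labels(handle: TextIO, label_column: str = "label") -> Counter:
    """Tally label values from an open CSV file that has a header row.

    Assumption: the label column is named "label", as in the README schema.
    """
    reader = csv.DictReader(handle)
    return Counter(row[label_column] for row in reader)


def load_label_counts(path: str = "claims_combined.csv") -> Counter:
    """Open the combined file and return a Counter of its label values."""
    with open(path, newline="", encoding="utf-8") as handle:
        return count_labels(handle)
```

Calling `load_label_counts()` from the dataset directory returns a `Counter` keyed by label value, which can be compared against the figures under Dataset Statistics.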
| ### Schema | |
| **Claim Record:** | |
| ```json | |
| { | |
| "claim_id": "string", | |
| "amount": "float", | |
| "type": "string (auto|property|health|life)", | |
| "claimant_id": "string", | |
| "days_since_policy_start": "integer", | |
| "claimant_history": { | |
| "claim_count": "integer", | |
| "avg_amount": "float", | |
| "total_paid": "float" | |
| }, | |
| "document_consistency_score": "float (0.0-1.0)", | |
| "linked_suspicious_entities": "integer", | |
| "label": "string (fraud|legitimate)" | |
| } | |
| ``` | |
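
The schema above can be checked mechanically before records reach a model. The following is a sketch only: `validate_claim` is a hypothetical helper, it assumes records have already been parsed into nested Python dicts, and it checks nothing beyond the types and ranges listed in the schema.

```python
from typing import Any

# Allowed values as listed in the schema above.
CLAIM_TYPES = ("auto", "property", "health", "life")
LABELS = ("fraud", "legitimate")


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_claim(record: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the record passes."""
    errors = []
    for field in ("claim_id", "claimant_id"):
        if not isinstance(record.get(field), str):
            errors.append(f"{field} must be a string")
    if not _is_number(record.get("amount")):
        errors.append("amount must be a number")
    if record.get("type") not in CLAIM_TYPES:
        errors.append("type must be one of auto, property, health, life")
    if not _is_int(record.get("days_since_policy_start")):
        errors.append("days_since_policy_start must be an integer")

    history = record.get("claimant_history")
    if not isinstance(history, dict):
        errors.append("claimant_history must be an object")
    else:
        if not _is_int(history.get("claim_count")):
            errors.append("claimant_history.claim_count must be an integer")
        for field in ("avg_amount", "total_paid"):
            if not _is_number(history.get(field)):
                errors.append(f"claimant_history.{field} must be a number")

    score = record.get("document_consistency_score")
    if not _is_number(score) or not 0.0 <= score <= 1.0:
        errors.append("document_consistency_score must be in [0.0, 1.0]")
    if not _is_int(record.get("linked_suspicious_entities")):
        errors.append("linked_suspicious_entities must be an integer")
    if record.get("label") not in LABELS:
        errors.append("label must be 'fraud' or 'legitimate'")
    return errors
```

Returning a list of messages rather than raising on the first problem makes it easier to report every violation in a record at once.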
| ## Fraud Patterns Included | |
| 1. **Staged Accidents**: Multiple claims with similar patterns | |
| 2. **Document Mismatch**: Inconsistent documentation | |
| 3. **Early Claims**: Claims filed shortly after policy inception | |
| 4. **Amount Inflation**: Claims significantly above average | |
| 5. **Entity Networks**: Connected suspicious entities | |
| 6. **High Frequency**: Repeated claims from same claimant | |
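
Several of these patterns plausibly correspond to single schema fields, which allows a rough per-record illustration. The sketch below is an assumption-heavy example, not the generator's actual logic: the mapping from pattern to field is inferred from the field names, "above average" is read as relative to the claimant's own `avg_amount`, and every threshold (`early_days`, `inflation_ratio`, `min_consistency`, `max_claim_count`) is a hypothetical placeholder. Staged accidents are left out because they span multiple claims and cannot be judged from one record.

```python
def pattern_flags(
    record: dict,
    early_days: int = 30,          # hypothetical cutoff for an "early" claim
    inflation_ratio: float = 3.0,  # hypothetical multiple of the claimant's average
    min_consistency: float = 0.5,  # hypothetical floor for document consistency
    max_claim_count: int = 5,      # hypothetical ceiling for prior claims
) -> dict:
    """Map one claim record to boolean flags for five of the listed patterns."""
    history = record["claimant_history"]
    avg_amount = history["avg_amount"]
    return {
        "document_mismatch": record["document_consistency_score"] < min_consistency,
        "early_claim": record["days_since_policy_start"] <= early_days,
        # Guard against a zero average for claimants with no paid history.
        "amount_inflation": avg_amount > 0
        and record["amount"] > inflation_ratio * avg_amount,
        "entity_network": record["linked_suspicious_entities"] > 0,
        "high_frequency": history["claim_count"] > max_claim_count,
    }
```

Flags like these are useful for exploring the data or as baseline features; they are not a substitute for the `label` column.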
| ## Dataset Statistics | |
| - **Total Claims**: 10,000 | |
| - **Fraudulent**: 2,500 (25%) | |
| - **Legitimate**: 7,500 (75%) | |
| - **Claim Types**: Auto (40%), Property (30%), Health (20%), Life (10%) | |
| - **Average Claim Amount**: $5,000 | |
| - **Date Range**: 2020-2026 | |
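
The percentages above translate into expected row counts, which gives a simple reference when checking a fresh copy of the data. This sketch assumes the published shares are exact rather than rounded; the constant and function names are illustrative only.

```python
TOTAL_CLAIMS = 10_000
TYPE_SHARE_PERCENT = {"auto": 40, "property": 30, "health": 20, "life": 10}
LABEL_COUNTS = {"fraud": 2_500, "legitimate": 7_500}


def expected_type_counts(total: int = TOTAL_CLAIMS) -> dict:
    """Convert the per-type percentage shares into expected row counts."""
    return {name: total * pct // 100 for name, pct in TYPE_SHARE_PERCENT.items()}


expected = expected_type_counts()

# Internal consistency of the published figures.
assert sum(TYPE_SHARE_PERCENT.values()) == 100
assert sum(LABEL_COUNTS.values()) == TOTAL_CLAIMS
assert LABEL_COUNTS["fraud"] / TOTAL_CLAIMS == 0.25
```

Under that assumption the expected counts are 4,000 auto, 3,000 property, 2,000 health, and 1,000 life claims.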
| ## Usage | |
| This dataset is used for: | |
| - Model training and validation | |
| - Fraud pattern simulation | |
| - Stress testing | |
| - Drift scenario testing | |
| - Performance benchmarking | |
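
For training and validation, a split that preserves the 25% fraud rate in both partitions is usually preferable to a purely random one. The following is one possible approach using only the standard library; `stratified_split` and its parameters are hypothetical, and records are assumed to be dicts with a `label` key as in the schema.

```python
import random
from collections import defaultdict


def stratified_split(records, validation_fraction=0.2, seed=0, label_key="label"):
    """Split records into (train, validation) while keeping label proportions.

    Each label group is shuffled and cut separately, so the fraud share in
    both partitions stays close to the share in the full dataset.
    """
    rng = random.Random(seed)  # local generator keeps the split reproducible
    by_label = defaultdict(list)
    for record in records:
        by_label[record[label_key]].append(record)

    train, validation = [], []
    for group in by_label.values():
        shuffled = group[:]  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * validation_fraction)
        validation.extend(shuffled[:cut])
        train.extend(shuffled[cut:])
    return train, validation
```

With the same `seed`, repeated calls on the same input return the same partitions, which keeps benchmark runs comparable.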
| ## Data Quality | |
| - No missing values | |
| - All four claim types are represented, in the proportions listed under Dataset Statistics | |
| - Realistic fraud patterns based on industry data | |
| - Regular updates with new fraud patterns | |
| ## Privacy | |
| All data is synthetic and does not contain real PII. | |
| ## License | |
| For internal use only. Part of the BDR-Agent-Factory ecosystem. | |