# Information Source Map: Structured-Fact KB
_Last updated: 2026-05-15_
This file aggregates the source metadata embedded in `40-data/policy_facts/`, `40-data/reviews/`, and `40-data/premiums/illustrative_premiums.json` so that every populated fact in the KB can be traced back to a PDF, URL, or rate card. Rows where that is not yet possible are listed in section 3. The downstream J3 verification pass will fill in the `verified` / `last_verified` fields.
## 1. Aggregate Stats
- **Total cited facts:** 5,616
  - policy_facts: **5,460**
  - insurer_reviews: **65**
  - premium samples: **91**
- **Distinct source URLs:** 97
- **Distinct PDF paths (policy wordings/brochures):** 188
- **% policy_facts with PDF citation:** 98.8%
- **% policy_facts with explicit URL:** 7.4%
- **% review facts with primary URL:** 100.0%
- **% premium samples with real (non-derived) URL:** 40.7%
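The headline figures can be re-derived from the source-type counts in the tables below. A minimal sketch, with those counts copied by hand; `pct` is a hypothetical helper. The 98.8% PDF-citation figure is not recomputed here because it cannot be derived from the `source_type` counts alone (5,046 / 5,460 is only about 92.4%, which suggests some `web_url` rows also carry a PDF path).

```python
# Counts copied from the "Source-type breakdown" tables in this file.
POLICY_SOURCE_TYPES = {"policy_pdf": 5_046, "web_url": 405, "missing": 9}
REVIEW_ROWS = 65
PREMIUM_SOURCE_TYPES = {
    "derived_anchor": 54,
    "insurer_or_other_url": 16,
    "policybazaar_tile": 10,
    "official_rate_card": 9,
    "joinditto_chart": 2,
}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as the stats above report it."""
    return round(100 * part / whole, 1)


policy_total = sum(POLICY_SOURCE_TYPES.values())    # policy_facts rows
premium_total = sum(PREMIUM_SOURCE_TYPES.values())  # premium samples
total_cited = policy_total + REVIEW_ROWS + premium_total

# Any premium sample not derived from an anchor counts as having a real URL.
real_premium = premium_total - PREMIUM_SOURCE_TYPES["derived_anchor"]

print(total_cited)                                        # 5616
print(pct(POLICY_SOURCE_TYPES["web_url"], policy_total))  # 7.4
print(pct(real_premium, premium_total))                   # 40.7
```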
### Source-type breakdown
**Policy facts**
| source_type | count |
|---|---:|
| policy_pdf | 5,046 |
| web_url | 405 |
| missing | 9 |
**Reviews**
| source_type | count |
|---|---:|
| irdai_annual_report | 43 |
| irdai_complaints | 19 |
| irdai | 3 |
**Premiums**
| source_type | count |
|---|---:|
| derived_anchor | 54 |
| insurer_or_other_url | 16 |
| policybazaar_tile | 10 |
| official_rate_card | 9 |
| joinditto_chart | 2 |
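Breakdown tables like these can be regenerated from the JSON twin instead of being maintained by hand. A minimal sketch, assuming each row carries (or has been assigned) a `source_type` string and that rows without one belong in the `missing` bucket; `source_type_table` is a hypothetical helper, not part of the repo.

```python
from collections import Counter
from typing import Iterable, Mapping


def source_type_table(rows: Iterable[Mapping[str, object]]) -> str:
    """Render a '| source_type | count |' markdown table, most common first.

    Rows without a source_type are counted as 'missing' (an assumption that
    mirrors the 'missing' bucket in the policy-facts table).
    """
    counts = Counter(str(row.get("source_type") or "missing") for row in rows)
    lines = ["| source_type | count |", "|---|---:|"]
    for source_type, n in counts.most_common():
        lines.append(f"| {source_type} | {n:,} |")  # thousands separator, e.g. 5,046
    return "\n".join(lines)


# Hypothetical rows: three PDF-cited facts, one URL-cited, one with no source.
sample = [{"source_type": "policy_pdf"}] * 3 + [{"source_type": "web_url"}, {}]
print(source_type_table(sample))  # rows: policy_pdf 3, web_url 1, missing 1
```

Ties in `Counter.most_common` keep first-seen order, so the output is stable for a given input file.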
## 2. Coverage by Category
### 2.1 Policy facts: per-insurer policy counts
| insurer_slug | policies indexed | populated fact rows |
|---|---:|---:|
| acko | 9 | 203 |
| aditya-birla | 8 | 137 |
| bajaj-allianz | 17 | 301 |
| care-health | 15 | 249 |
| cholamandalam | 6 | 116 |
| go-digit | 6 | 154 |
| hdfc-ergo | 23 | 420 |
| icici-lombard | 17 | 322 |
| iffco-tokio | 6 | 165 |
| indusind-general | 3 | 48 |
| manipalcigna | 8 | 155 |
| national-insurance | 40 | 1,122 |
| new-india | 14 | 178 |
| niva-bupa | 19 | 375 |
| oriental-insurance | 6 | 141 |
| reliance-general | 1 | 9 |
| royal-sundaram | 14 | 414 |
| sbi-general | 12 | 343 |
| star-health | 18 | 372 |
| tata-aig | 11 | 236 |
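The per-insurer counts can be rebuilt by grouping `policy_facts[]` rows on an insurer slug. This sketch assumes every `policy_id` begins with its insurer slug (as the IDs in section 2.3 do) and picks the longest matching slug. It only sees policies with at least one populated fact, so it may undercount "policies indexed". `insurer_of` and `per_insurer_counts` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional, Set, Tuple

# A few slugs from the table above; the full list would come from the KB.
INSURER_SLUGS = ("hdfc-ergo", "niva-bupa", "star-health", "tata-aig")


def insurer_of(policy_id: str, slugs: Iterable[str] = INSURER_SLUGS) -> Optional[str]:
    """Return the longest slug that prefixes policy_id, or None if none does."""
    matches = [s for s in slugs if policy_id == s or policy_id.startswith(s + "-")]
    return max(matches, key=len) if matches else None


def per_insurer_counts(rows: Iterable[Mapping[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Map insurer slug -> (distinct policy_ids, populated fact rows)."""
    policies: Dict[str, Set[str]] = defaultdict(set)
    fact_rows: Dict[str, int] = defaultdict(int)
    for row in rows:
        slug = insurer_of(row["policy_id"])
        if slug is None:
            continue  # unknown prefix: would need manual triage
        policies[slug].add(row["policy_id"])
        fact_rows[slug] += 1
    return {slug: (len(policies[slug]), fact_rows[slug]) for slug in sorted(policies)}


# Hypothetical rows for illustration.
rows = [
    {"policy_id": "hdfc-ergo-optima-secure", "field": "ayush_coverage"},
    {"policy_id": "hdfc-ergo-optima-secure", "field": "maternity_coverage"},
    {"policy_id": "hdfc-ergo-energy", "field": "ayush_coverage"},
    {"policy_id": "tata-aig-medicare", "field": "ayush_coverage"},
]
print(per_insurer_counts(rows))  # {'hdfc-ergo': (2, 3), 'tata-aig': (1, 1)}
```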
### 2.2 Insurer reviews: claim settlement ratio (CSR) and complaint coverage
| insurer_slug | CSR % | CSR year | complaints/10K | source_irdai_url present? | source_complaints_url present? |
|---|---:|---|---:|---|---|
| acko | 96.31 | FY 2023-24 | 16 | yes | yes |
| aditya-birla | 92.97 | 2023-24 | 13 | yes | no |
| bajaj-allianz | 92.24 | 2023-24 | 3 | yes | no |
| care-health | 93.13 | 2023-24 (3-year avg) | 42 | yes | no |
| cholamandalam | 94.5 | FY 2023-24 | 13 | yes | yes |
| go-digit | 90.69 | FY 2023-24 | 19 | yes | yes |
| hdfc-ergo | 99.1 | 2023-24 | 15 | yes | no |
| icici-lombard | 85.0 | 2023-24 | 10 | yes | no |
| iffco-tokio | 96.33 | FY 2023-24 | 41 | yes | yes |
| indusind-general | 86.38 | FY 2024-25 | n/a | yes | yes |
| manipalcigna | 99.0 | 2023-24 | 24 | yes | no |
| national-insurance | 91.18 | FY 2023-24 | 29 | yes | yes |
| new-india | 95.04 | 2023-24 (by claim count) | 20 | yes | no |
| niva-bupa | 91.62 | 2023-24 (3-year avg through FY25) | 43 | yes | no |
| oriental-insurance | 93.96 | FY 2023-24 | n/a | yes | yes |
| reliance-general | 98.75 | FY 2023-24 | 5 | yes | yes |
| royal-sundaram | 95.95 | FY 2023-24 | 18 | yes | yes |
| sbi-general | 96.14 | FY 2022-25 (3-yr avg) | 15 | yes | yes |
| star-health | 82.31 | 2023-24 | 52 | yes | no |
| tata-aig | 88.72 | 2023-24 (3-year avg) | 11 | yes | no |
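The CSR column is not strictly like-for-like: most rows are single-year 2023-24 figures, but four are 3-year averages and IndusInd General's is FY 2024-25. Before ranking insurers on CSR, such rows can be flagged with a check like this one. It is a sketch: `single_year` is a hypothetical helper, and it treats qualifiers such as "(by claim count)" as still single-year.

```python
import re


def single_year(csr_year: str, target: str = "2023-24") -> bool:
    """True when a CSR year label is a single-year figure for `target`.

    Multi-year averages ('3-year avg', '3-yr avg') are rejected; an optional
    'FY ' prefix and trailing qualifiers are ignored.
    """
    if re.search(r"\bavg\b", csr_year, flags=re.IGNORECASE):
        return False
    match = re.search(r"\d{4}-\d{2}", csr_year)
    return match is not None and match.group(0) == target


# CSR year labels copied from the table above (subset).
CSR_YEARS = {
    "acko": "FY 2023-24",
    "care-health": "2023-24 (3-year avg)",
    "indusind-general": "FY 2024-25",
    "new-india": "2023-24 (by claim count)",
    "sbi-general": "FY 2022-25 (3-yr avg)",
}

flagged = sorted(slug for slug, year in CSR_YEARS.items() if not single_year(year))
print(flagged)  # ['care-health', 'indusind-general', 'sbi-general']
```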
### 2.3 Premiums: per-policy sample coverage
| policy_id | sample count | real-URL samples | derived samples |
|---|---:|---:|---:|
| aditya-birla-activ-assure-diamond | 5 | 1 | 4 |
| aditya-birla-group-activ-health | 3 | 0 | 3 |
| bajaj-allianz-health-guard | 5 | 1 | 4 |
| bajaj-allianz-silver-health | 3 | 1 | 2 |
| bajaj-allianz-tax-gain | 1 | 1 | 0 |
| care-health-care-advantage | 1 | 1 | 0 |
| care-health-care-classic | 2 | 1 | 1 |
| care-health-care-senior | 3 | 1 | 2 |
| care-health-care-supreme | 5 | 2 | 3 |
| hdfc-ergo-energy | 3 | 0 | 3 |
| hdfc-ergo-optima-plus | 2 | 1 | 1 |
| hdfc-ergo-optima-restore | 3 | 0 | 3 |
| hdfc-ergo-optima-secure | 5 | 2 | 3 |
| icici-lombard-elevate | 4 | 1 | 3 |
| icici-lombard-health-advantedge | 2 | 1 | 1 |
| manipalcigna-prohealth-prime-active | 3 | 1 | 2 |
| new-india-asha-kiran | 2 | 1 | 1 |
| new-india-mediclaim | 3 | 1 | 2 |
| niva-bupa-aspire | 3 | 1 | 2 |
| niva-bupa-health-premia | 3 | 1 | 2 |
| niva-bupa-reassure | 4 | 2 | 2 |
| royal-sundaram-advanced-top-up | 3 | 3 | 0 |
| sbi-general-arogya-supreme | 2 | 2 | 0 |
| star-health-comprehensive | 3 | 1 | 2 |
| star-health-family-health-optima | 5 | 3 | 2 |
| star-health-senior-red-carpet | 5 | 1 | 4 |
| tata-aig-medicare | 3 | 2 | 1 |
| tata-aig-medicare-premier | 5 | 4 | 1 |
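Three policies in this table (`aditya-birla-group-activ-health`, `hdfc-ergo-energy`, `hdfc-ergo-optima-restore`) have no real-URL samples, so their premium illustrations rest entirely on derived anchors. A minimal sketch of computing the per-policy split from `premiums[]` rows and listing such policies, assuming any `source_type` other than `derived_anchor` counts as real-URL; `premium_split` and `fully_derived` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def premium_split(premiums: Iterable[Mapping[str, str]]) -> Dict[str, Tuple[int, int, int]]:
    """Map policy_id -> (sample count, real-URL samples, derived samples)."""
    real: Dict[str, int] = defaultdict(int)
    derived: Dict[str, int] = defaultdict(int)
    for row in premiums:
        if row["source_type"] == "derived_anchor":
            derived[row["policy_id"]] += 1
        else:
            real[row["policy_id"]] += 1
    ids = sorted(set(real) | set(derived))
    return {pid: (real[pid] + derived[pid], real[pid], derived[pid]) for pid in ids}


def fully_derived(split: Dict[str, Tuple[int, int, int]]) -> List[str]:
    """Policies whose every premium sample is derived from an anchor."""
    return [pid for pid, (_, n_real, _) in split.items() if n_real == 0]


# Hypothetical rows for illustration.
samples = [
    {"policy_id": "hdfc-ergo-energy", "source_type": "derived_anchor"},
    {"policy_id": "hdfc-ergo-energy", "source_type": "derived_anchor"},
    {"policy_id": "tata-aig-medicare", "source_type": "official_rate_card"},
    {"policy_id": "tata-aig-medicare", "source_type": "derived_anchor"},
]
split = premium_split(samples)
print(split)                 # {'hdfc-ergo-energy': (2, 0, 2), 'tata-aig-medicare': (2, 1, 1)}
print(fully_derived(split))  # ['hdfc-ergo-energy']
```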
## 3. Needs-source (unverifiable until repaired)
- policy_facts rows lacking any source path/URL or explicitly marked "not extracted": **9**
- review rows lacking a primary IRDAI URL: **0**
- premium samples derived from anchors (not directly sourced): **54**
- premium samples with no URL at all: **0**
### Top 15 policy_fact fields with missing source metadata
Only five fields have any unsourced rows; together they account for all 9 rows counted above.
| field | rows missing source |
|---|---:|
| network_hospital_count | 2 |
| maternity_coverage | 2 |
| pre_existing_disease_waiting_months | 2 |
| ayush_coverage | 2 |
| cashless_treatment_supported | 1 |
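The needs-source count for policy facts and the field table above can both come from one predicate over `policy_facts[]`. A minimal sketch, assuming a row is unsourced when it has neither `source_path` nor `source_url`, or when any of its source fields contains the text "not extracted" (this file does not say which field carries that marker); `lacks_source` and `top_missing_fields` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Optional, Tuple

SOURCE_FIELDS = ("source_path", "source_url", "source_quote")
NOT_EXTRACTED = "not extracted"  # assumed marker text


def lacks_source(row: Mapping[str, Optional[str]]) -> bool:
    """True if the row has no path/URL or is explicitly marked as not extracted."""
    values = [(row.get(f) or "").strip() for f in SOURCE_FIELDS]
    if any(NOT_EXTRACTED in v.lower() for v in values):
        return True
    path, url, _quote = values
    return not (path or url)


def top_missing_fields(
    rows: Iterable[Mapping[str, Optional[str]]], n: int = 15
) -> List[Tuple[str, int]]:
    """The n fields with the most unsourced rows, most frequent first."""
    return Counter(r["field"] for r in rows if lacks_source(r)).most_common(n)


# Hypothetical rows; "wordings/example.pdf" is a placeholder path.
rows = [
    {"field": "maternity_coverage", "source_path": None, "source_url": None},
    {"field": "maternity_coverage", "source_path": "not extracted"},
    {"field": "ayush_coverage", "source_path": "", "source_url": ""},
    {"field": "ayush_coverage", "source_path": "wordings/example.pdf"},
]
print(top_missing_fields(rows))  # [('maternity_coverage', 2), ('ayush_coverage', 1)]
```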
## 4. How to read the JSON twin
- `policy_facts[]`: one row per populated `(policy_id, field)` pair, with `source_path` (PDF), optional `source_url`, `source_quote`, `confidence`, and J3-bound `verified` / `last_verified` slots.
- `insurer_reviews[]`: one row per populated `(insurer_slug, metric)` pair, with the primary IRDAI URL plus optional secondary / insurer-company URLs and the reporting `year`.
- `premiums[]`: one row per `(policy_id, sample_profile)` pair with `annual_premium_inr`, `source_url`, `source_note`, and a `source_type` of `policybazaar_tile` / `joinditto_chart` / `beshak_review` / `official_rate_card` / `derived_anchor`.
- `needs_source.*`: flat lists of rows where source metadata is absent or explicitly marked missing, so we know exactly what J3 verification cannot yet check.
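For consumers of the JSON twin, the row shapes described above can be written down as typed dictionaries. This is a sketch: the key names come from this section, but the value types (for example `confidence` as a float, `sample_profile` as a string, and `last_verified` as an ISO date string) and the helper `pending_verification` are assumptions.

```python
from typing import List, Optional, TypedDict


class PolicyFactRow(TypedDict):
    policy_id: str
    field: str
    source_path: Optional[str]    # PDF wording/brochure path
    source_url: Optional[str]     # optional web citation
    source_quote: Optional[str]
    confidence: Optional[float]   # value type assumed
    verified: Optional[bool]      # J3-bound slot; None until verified (assumed)
    last_verified: Optional[str]  # J3-bound slot; ISO date string (assumed)


class PremiumRow(TypedDict):
    policy_id: str
    sample_profile: str           # value type assumed
    annual_premium_inr: int       # value type assumed
    source_url: str
    source_note: str
    source_type: str              # e.g. "policybazaar_tile", "derived_anchor"


def pending_verification(rows: List[PolicyFactRow]) -> List[PolicyFactRow]:
    """Rows whose J3 `verified` slot has not been filled yet."""
    return [r for r in rows if r["verified"] is None]


# Placeholder values for illustration.
rows: List[PolicyFactRow] = [
    {"policy_id": "star-health-comprehensive", "field": "ayush_coverage",
     "source_path": None, "source_url": None, "source_quote": None,
     "confidence": None, "verified": None, "last_verified": None},
]
print(len(pending_verification(rows)))  # 1
```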