File size: 17,012 Bytes
8058e7e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
from pathlib import Path
from textwrap import wrap


PAGE_WIDTH = 612
PAGE_HEIGHT = 792
LEFT = 50
TOP = 760
LINE_HEIGHT = 14
MAX_LINES_PER_PAGE = 46
WRAP_WIDTH = 92
PAGE_COUNT = 50


AGENT_TOPICS = [
    (
        "Claim Support Agent Mission",
        "The insurance claim support AI agent helps customers and support adjusters reason through "
        "claim scenarios. The agent should not merely define insurance terms. It should ask what "
        "happened, identify the likely claim type, retrieve relevant policy and procedure evidence, "
        "consider prior user context from memory, and explain whether the claim appears likely covered, "
        "likely not covered, or uncertain. The agent must avoid final binding coverage decisions unless "
        "the policy and claim file clearly support the conclusion. When evidence is incomplete, the "
        "agent should list missing information and recommend human review.",
    ),
    (
        "Scenario Based Claim Reasoning",
        "A scenario-based response begins by restating the facts that matter: cause of loss, date of "
        "loss, property or vehicle involved, policy type, available evidence, mitigation steps, and any "
        "red flags. The agent then maps the scenario to a claim category such as water damage, theft, "
        "fire, auto collision, liability, flood, storm, or personal property. It should compare the "
        "scenario with retrieved claim rules and explain the likely outcome as likely covered, likely "
        "not covered, or needs review. The agent should include citations to retrieved sources and "
        "should clearly separate evidence-based conclusions from assumptions.",
    ),
    (
        "Memory Usage With LangMem",
        "The agent should use memory to personalize support without exposing sensitive information. "
        "Useful memory includes the customer's previous claim type, preferred contact method, recurring "
        "missing documents, prior escalation outcomes, and approved resolution summaries. Memory should "
        "not replace retrieval from policy documents. If memory says the customer previously had a water "
        "claim with missing mitigation invoices, the agent may remind the user that mitigation evidence "
        "was important before, but it must still retrieve current policy guidance before making a coverage "
        "recommendation. Approved human resolutions are stronger memory than unreviewed draft answers.",
    ),
    (
        "Tool Calling Policy",
        "The agent can call tools when the answer depends on external operational data. A claim lookup "
        "tool should be used to check claim status, date of loss, assigned adjuster, missing documents, "
        "and previous notes. A plan lookup tool should be used to check policy limits, endorsements, "
        "deductibles, covered property, and exclusions. An open ticket load tool should be used to decide "
        "whether to route the matter to a human support queue. The agent should state which tool would be "
        "useful and why when a tool result is needed but unavailable.",
    ),
    (
        "Coverage Decision Labels",
        "The agent should use cautious labels. 'Likely covered' means the retrieved evidence supports "
        "coverage and no obvious exclusion appears in the provided scenario. 'Likely not covered' means "
        "the retrieved evidence points to an exclusion or unmet condition. 'Needs human review' means "
        "evidence is missing, policy language is ambiguous, the scenario is high risk, or a tool lookup is "
        "required. These labels are support recommendations, not final legal or contractual decisions.",
    ),
    (
        "Water Damage Scenario Rules",
        "Water damage scenarios require attention to cause and timing. Sudden and accidental discharge "
        "from a burst pipe may be treated more favorably than seepage, repeated leakage, mold, or poor "
        "maintenance. Required evidence often includes notice of loss, photos, plumber report, repair "
        "estimate, mitigation invoice, and proof that the policy was active. If the customer says water "
        "leaked slowly for months, the agent should mark the claim as likely not covered or needs human "
        "review because gradual leakage and maintenance issues may be excluded.",
    ),
    (
        "Flood and Storm Scenario Rules",
        "Flood scenarios should be separated from internal water damage. Heavy rain entering from surface "
        "water, storm surge, overflowing bodies of water, or groundwater may require separate flood coverage. "
        "Wind or hail damage may be handled differently from flood damage. If a customer says the basement "
        "flooded after heavy rain, the agent should not promise coverage under a standard property policy. "
        "It should recommend plan lookup for flood endorsement or separate flood policy and request photos, "
        "weather date, water entry point, and mitigation records.",
    ),
    (
        "Theft Scenario Rules",
        "Theft scenarios require a police report, list of stolen items, proof of ownership, receipts, serial "
        "numbers, and photos or security footage when available. If property was stolen from an unlocked car, "
        "the agent should check whether the property policy or auto policy applies and whether limitations "
        "or exclusions apply. High-value items such as jewelry, electronics, collectibles, firearms, and art "
        "may have sublimits or scheduled property requirements. Missing police report or ownership proof "
        "should trigger human review.",
    ),
    (
        "Fire and Smoke Scenario Rules",
        "Fire and smoke scenarios require fire department report, photos, repair estimate, damaged-property "
        "inventory, proof of ownership for valuable items, and temporary housing receipts if additional living "
        "expense is claimed. Suspected arson, inconsistent timelines, missing fire report, or unusually high "
        "claimed values should trigger escalation. Smoke damage should be described separately from direct "
        "fire damage because cleaning and odor remediation may require different documentation.",
    ),
    (
        "Auto Collision Scenario Rules",
        "Auto collision scenarios require accident date and location, driver details, vehicle photos, repair "
        "estimate, registration, insurance information for involved parties, witness details, and police report "
        "when available. If there is no police report, the claim may still proceed but needs stronger supporting "
        "evidence. Liability depends on statements, traffic rules, point of impact, photos, and police report. "
        "Total loss review requires actual cash value, title status, lienholder details, and state rules.",
    ),
    (
        "Liability Scenario Rules",
        "Liability scenarios involve allegations that the insured caused bodily injury or property damage to "
        "another person. The agent should not admit fault. It should request incident description, claimant "
        "contact details, photos, witness statements, medical bills for bodily injury, property repair invoices, "
        "and any demand letter. Bodily injury, attorney involvement, policy limit demand, or legal threat should "
        "trigger human review and possible specialist routing.",
    ),
    (
        "Human Review Triggers",
        "Human review is required when evidence is missing, documents appear altered, claim facts conflict, "
        "policy language is unclear, the user asks for a denial or appeal decision, legal threats are present, "
        "bodily injury is involved, fraud indicators appear, or high-value property is claimed without proof. "
        "The agent should explain the reason for escalation in plain language and list the next best action.",
    ),
    (
        "Fraud and Risk Signals",
        "Risk signals include loss shortly after policy inception, duplicate receipts, altered invoices, refusal "
        "to permit inspection, repair estimates that do not match photos, multiple similar claims, staged accident "
        "concerns, missing ownership proof, inconsistent timelines, or pressure for immediate payment. Risk signals "
        "do not prove fraud, but they justify additional documentation and senior review.",
    ),
    (
        "Recommended Answer Format",
        "For claim scenarios, the recommended answer format is: decision label, short reasoning, needed evidence, "
        "tool or memory action, and source citation. Example labels are likely covered, likely not covered, and "
        "needs human review. The agent should avoid long legal explanations unless requested. It should be concise, "
        "helpful, and transparent about uncertainty.",
    ),
]


SCENARIOS = [
    (
        "My basement flooded after heavy rain and water came through the floor drain. Will insurance pay?",
        "Needs human review. This may involve flood, surface water, sewer backup, or storm water conditions. "
        "The agent should call plan lookup to check flood or sewer backup endorsement and request photos, "
        "water entry point, weather date, and mitigation records.",
    ),
    (
        "A pipe suddenly burst in my kitchen while I was away for work. I have photos and a plumber report.",
        "Likely covered if the policy covers sudden and accidental water discharge and no exclusion applies. "
        "The agent should request mitigation invoices, repair estimates, date of loss, and policy verification.",
    ),
    (
        "My bathroom leaked slowly for months and now there is mold behind the wall.",
        "Likely not covered or needs human review because gradual leakage, mold, and maintenance issues may be "
        "excluded. The agent should retrieve water damage exclusions and request contractor findings.",
    ),
    (
        "My laptop and camera were stolen from my unlocked car.",
        "Needs human review. The agent should check whether property or auto coverage applies, ask for a police "
        "report, proof of ownership, receipts, serial numbers, and review sublimits for electronics.",
    ),
    (
        "A small kitchen fire damaged cabinets and smoke damaged furniture.",
        "Likely covered if fire is a covered peril and no exclusion applies. Required evidence includes fire "
        "report, photos, repair estimate, smoke remediation estimate, inventory, and receipts.",
    ),
    (
        "I hit another car but there is no police report. Can I still claim?",
        "Needs review but may proceed with other evidence. The agent should request photos, driver information, "
        "repair estimate, witness details, accident location, and statement of events.",
    ),
    (
        "A guest slipped on my stairs and is asking me to pay medical bills.",
        "Needs human review. Bodily injury liability matters should be escalated. The agent should request incident "
        "description, photos, witness statements, medical bills, and any demand letter.",
    ),
    (
        "My roof was damaged by hail during a storm.",
        "Potentially covered depending on policy and evidence. The agent should request photos, contractor estimate, "
        "weather date, inspection notes, and plan lookup for wind or hail coverage and deductible.",
    ),
    (
        "My jewelry was stolen but I do not have receipts.",
        "Needs human review. Jewelry may have sublimits or scheduled property requirements. The agent should request "
        "police report, photos, appraisal, bank records, or other proof of ownership.",
    ),
    (
        "The repair invoice looks higher than the visible damage in photos.",
        "Needs human review because invoice and photo mismatch is a risk signal. The agent should request itemized "
        "estimate, inspection, and senior adjuster review.",
    ),
]


def escape_pdf_text(text: str) -> str:
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def paragraph_lines(text: str) -> list[str]:
    lines: list[str] = []
    for paragraph in text.split("\n"):
        if not paragraph.strip():
            lines.append("")
            continue
        lines.extend(wrap(paragraph, width=WRAP_WIDTH))
    return lines


def make_page(page_number: int) -> list[str]:
    topic = AGENT_TOPICS[(page_number - 1) % len(AGENT_TOPICS)]
    related = AGENT_TOPICS[page_number % len(AGENT_TOPICS)]
    scenario = SCENARIOS[(page_number - 1) % len(SCENARIOS)]
    second_scenario = SCENARIOS[page_number % len(SCENARIOS)]

    body = (
        f"Insurance Claim Support AI Agent with LangMem and RAG - Page {page_number:02d}\n\n"
        f"{topic[0]}\n"
        f"{topic[1]}\n\n"
        f"RAG guidance: Retrieve policy rules, claim procedures, and prior approved resolutions before "
        f"answering. If retrieved evidence is weak, say that evidence is insufficient. Cite retrieved "
        f"sources. Do not invent policy terms, claim status, payment approval, or denial decisions.\n\n"
        f"Memory guidance: Use LangMem-style memory for prior user interactions, repeated missing documents, "
        f"preferred contact method, and approved claim resolutions. Memory may personalize the answer, but "
        f"policy retrieval and tool results should control coverage reasoning.\n\n"
        f"Tool guidance: Use claim lookup for claim status and missing documents. Use plan lookup for coverage, "
        f"limits, deductibles, endorsements, and exclusions. Use ticket load or escalation tools when the case "
        f"requires human review or specialist routing.\n\n"
        f"Scenario example: {scenario[0]}\n"
        f"Expected agent response: {scenario[1]}\n\n"
        f"Additional scenario: {second_scenario[0]}\n"
        f"Expected agent response: {second_scenario[1]}\n\n"
        f"Related topic: {related[0]}. {related[1]}\n\n"
        f"Recommended response structure: Decision label, explanation, missing evidence, recommended tool call, "
        f"human review decision, and source citation."
    )

    lines = paragraph_lines(body)
    if len(lines) > MAX_LINES_PER_PAGE:
        return lines[:MAX_LINES_PER_PAGE]
    return lines + [""] * (MAX_LINES_PER_PAGE - len(lines))


def page_stream(lines: list[str]) -> bytes:
    content_lines = ["BT", "/F1 10 Tf", f"{LEFT} {TOP} Td", f"{LINE_HEIGHT} TL"]
    for index, line in enumerate(lines):
        escaped = escape_pdf_text(line)
        if index == 0:
            content_lines.append(f"({escaped}) Tj")
        else:
            content_lines.append(f"T* ({escaped}) Tj")
    content_lines.append("ET")
    return "\n".join(content_lines).encode("latin-1", errors="replace")


def build_pdf(pages: list[list[str]]) -> bytes:
    objects: list[bytes] = []
    pages_id = 2
    font_id = 3
    page_ids: list[int] = []
    content_ids: list[int] = []

    next_id = 4
    for _ in pages:
        page_ids.append(next_id)
        next_id += 1
        content_ids.append(next_id)
        next_id += 1

    kids = " ".join(f"{page_id} 0 R" for page_id in page_ids)
    objects.append(f"<< /Type /Catalog /Pages {pages_id} 0 R >>".encode("ascii"))
    objects.append(f"<< /Type /Pages /Kids [{kids}] /Count {len(page_ids)} >>".encode("ascii"))
    objects.append(b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>")

    for page_id, content_id, page_lines in zip(page_ids, content_ids, pages):
        objects.append(
            (
                f"<< /Type /Page /Parent {pages_id} 0 R /MediaBox [0 0 {PAGE_WIDTH} {PAGE_HEIGHT}] "
                f"/Resources << /Font << /F1 {font_id} 0 R >> >> /Contents {content_id} 0 R >>"
            ).encode("ascii")
        )
        content = page_stream(page_lines)
        objects.append(
            b"<< /Length " + str(len(content)).encode("ascii") + b" >>\nstream\n" + content + b"\nendstream"
        )

    pdf = bytearray(b"%PDF-1.4\n")
    offsets = [0]
    for obj_id, body in enumerate(objects, start=1):
        offsets.append(len(pdf))
        pdf.extend(f"{obj_id} 0 obj\n".encode("ascii"))
        pdf.extend(body)
        pdf.extend(b"\nendobj\n")

    xref_start = len(pdf)
    pdf.extend(f"xref\n0 {len(objects) + 1}\n".encode("ascii"))
    pdf.extend(b"0000000000 65535 f \n")
    for offset in offsets[1:]:
        pdf.extend(f"{offset:010d} 00000 n \n".encode("ascii"))
    pdf.extend(
        (
            f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_start}\n%%EOF\n"
        ).encode("ascii")
    )
    return bytes(pdf)


def main() -> None:
    pages = [make_page(page_number) for page_number in range(1, PAGE_COUNT + 1)]
    output = Path("data") / "sample_insurance_claim_guide.pdf"
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_bytes(build_pdf(pages))
    print(output)


if __name__ == "__main__":
    main()