Spaces:
Sleeping
Sleeping
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n***********************************\n Robinson, Clark and Klein\n New Amy, SC\n***********************************\n\nDate: 11/02/2025 Time: 11:31\nReceipt #: R-770030\nCashier: Michael\n\n Future Pair 5x $52.06 $403.33\n See Unit 8x $21.09 $168.72\n Business Box 5x $37.37 $186.85\n Sort Pack 3x $36.28 $108.84\n\n Subtotal: $724.71\n Tax (8.5%): $61.60\n ββββββββββββββββββββββββββββββ\n TOTAL: $786.31\n\n Paid by: Google Pay\n\n Thank you for shopping with us!\n***********************************\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"receipt\",\"date\":\"2025-11-02\",\"issuer\":{\"name\":\"Robinson, Clark and Klein\",\"address\":\"New Amy, SC\"},\"recipient\":null,\"total_amount\":786.31,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Future Pair\",\"quantity\":5,\"unit_price\":52.06,\"amount\":403.33},{\"description\":\"See Unit\",\"quantity\":8,\"unit_price\":21.09,\"amount\":168.72},{\"description\":\"Business Box\",\"quantity\":5,\"unit_price\":37.37,\"amount\":186.85},{\"description\":\"Sort Pack\",\"quantity\":3,\"unit_price\":36.28,\"amount\":108.84}],\"type_specific\":{\"receipt_number\":\"R-770030\",\"payment_method\":\"Google Pay\",\"store_location\":\"New Amy, SC\",\"cashier\":\"Michael\"},\"flags\":[{\"category\":\"arithmetic_error\",\"field\":\"line_items[0].amount\",\"severity\":\"high\",\"description\":\"Line item 'Future Pair': amount $403.33 does not equal quantity (5) Γ unit price ($52.06) = $260.30.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2025-4155\nDate: 2026-02-17\n\nFrom:\n Moore PLC\n 7679 Perry Forge, West Russellport, TX 89431\n\nBill To:\n Tucker, Estrada and Williams\n 34324 Murphy Avenue, Burgessburgh, DC 50335\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nIndustrial Adhesive 500ml 248 EUR 13.06 EUR 3238.88\nNitrile Gloves Box (100) 302 EUR 16.71 EUR 5046.42\nBall Bearing 6205-2RS 269 EUR 7.99 EUR 2149.31\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: EUR 10434.61\nTax (5.0%): EUR 521.73\nTOTAL: EUR 10956.34\n\nPayment Terms: Net 90\nDue Date: 2026-03-19\n\nThank you for your business!\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2026-02-17\",\"issuer\":{\"name\":\"Moore PLC\",\"address\":\"7679 Perry Forge, West Russellport, TX 89431\"},\"recipient\":{\"name\":\"Tucker, Estrada and Williams\",\"address\":\"34324 Murphy Avenue, Burgessburgh, DC 50335\"},\"total_amount\":10956.34,\"currency\":\"EUR\"},\"line_items\":[{\"description\":\"Industrial Adhesive 500ml\",\"quantity\":248,\"unit_price\":13.06,\"amount\":3238.88},{\"description\":\"Nitrile Gloves Box (100)\",\"quantity\":302,\"unit_price\":16.71,\"amount\":5046.42},{\"description\":\"Ball Bearing 6205-2RS\",\"quantity\":269,\"unit_price\":7.99,\"amount\":2149.31}],\"type_specific\":{\"invoice_number\":\"INV-2025-4155\",\"due_date\":\"2026-03-19\",\"payment_terms\":\"Net 90\",\"tax_amount\":521.73,\"subtotal\":10434.61},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n***********************************\n Vega-Thomas\n Lisaville, RI\n***********************************\n\nDate: 06/03/2025 Time: 21:20\nReceipt #: R-247720\nCashier: Emma\n\n Trip Unit 7x $69.39 $485.73\n\n Subtotal: $485.73\n Tax (8.5%): $41.29\n ββββββββββββββββββββββββββββββ\n TOTAL: $527.02\n\n Paid by: Visa ****4532\n\n Thank you for shopping with us!\n***********************************\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"receipt\",\"date\":\"2025-06-03\",\"issuer\":{\"name\":\"Vega-Thomas\",\"address\":\"Lisaville, RI\"},\"recipient\":null,\"total_amount\":527.02,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Trip Unit\",\"quantity\":7,\"unit_price\":69.39,\"amount\":485.73}],\"type_specific\":{\"receipt_number\":\"R-247720\",\"payment_method\":\"Visa ****4532\",\"store_location\":\"Lisaville, RI\",\"cashier\":\"Emma\"},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n\nRECEIPT\n----------------------------------------\n SAUSAGE AND SALAMI x1 62000.00\n GREEN MONSIEUR x1 57000.00\n----------------------------------------\n TOTAL: 152746.00\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"receipt\",\"date\":null,\"issuer\":{\"name\":null,\"address\":null},\"recipient\":null,\"total_amount\":152746.0,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"SAUSAGE AND SALAMI\",\"quantity\":1.0,\"unit_price\":62000.0,\"amount\":62000.0},{\"description\":\"GREEN MONSIEUR\",\"quantity\":1.0,\"unit_price\":57000.0,\"amount\":57000.0}],\"type_specific\":{\"receipt_number\":\"RCP-2E8CF062\",\"payment_method\":\"unknown\",\"store_location\":null,\"cashier\":null,\"subtotal\":null,\"tax_amount\":null},\"flags\":[],\"confidence_score\":0.88}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n\nRECEIPT\n----------------------------------------\n KUE PILUS ASIN x80 16800.00\n KACANG MEDAN x95 28690.00\n----------------------------------------\n TOTAL: 50039.00\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"receipt\",\"date\":null,\"issuer\":{\"name\":null,\"address\":null},\"recipient\":null,\"total_amount\":50039.0,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"KUE PILUS ASIN\",\"quantity\":80.0,\"unit_price\":210.0,\"amount\":16800.0},{\"description\":\"KACANG MEDAN\",\"quantity\":95.0,\"unit_price\":302.0,\"amount\":28690.0}],\"type_specific\":{\"receipt_number\":\"RCP-AE306253\",\"payment_method\":\"unknown\",\"store_location\":null,\"cashier\":null,\"subtotal\":null,\"tax_amount\":null},\"flags\":[],\"confidence_score\":0.88}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n***********************************\n Holmes's Hardware\n North Tinamouth, CT\n***********************************\n\nDate: 07/13/2025 Time: 8:45\nReceipt #: R-630041\nCashier: Mary\n\n Score Pair 3x $14.82 $44.46\n Near Pair 8x $11.06 $88.48\n Property Set 10x $40.12 $401.20\n Themselves Pair 8x $65.07 $520.56\n Eye Box 3x $42.02 $126.06\n\n Subtotal: $1180.76\n Tax (8.5%): $100.36\n ββββββββββββββββββββββββββββββ\n TOTAL: $1281.12\n\n Paid by: Wire Transfer\n\n Thank you for shopping with us!\n***********************************\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"receipt\",\"date\":\"2025-07-13\",\"issuer\":{\"name\":\"Holmes's Hardware\",\"address\":\"North Tinamouth, CT\"},\"recipient\":null,\"total_amount\":1281.12,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Score Pair\",\"quantity\":3,\"unit_price\":14.82,\"amount\":44.46},{\"description\":\"Near Pair\",\"quantity\":8,\"unit_price\":11.06,\"amount\":88.48},{\"description\":\"Property Set\",\"quantity\":10,\"unit_price\":40.12,\"amount\":401.2},{\"description\":\"Themselves Pair\",\"quantity\":8,\"unit_price\":65.07,\"amount\":520.56},{\"description\":\"Eye Box\",\"quantity\":3,\"unit_price\":42.02,\"amount\":126.06}],\"type_specific\":{\"receipt_number\":\"R-630041\",\"payment_method\":\"Wire Transfer\",\"store_location\":\"North Tinamouth, CT\",\"cashier\":\"Mary\"},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2020-5082\nDate: 11-Jun-2024\n\nFrom:\n Good-Hodges\n 2400 Flynn Hill, Nicholasberg, IA 70122\n\nBill To:\n Knight-Sparks\n 693 Durham Divide, South Dana, NH 33967\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nHydraulic Cylinder 3in Bore 411 CAD 194.19 CAD 79812.09\nCopper Wire 14AWG (100ft) 272 CAD 89.08 CAD 24229.76\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: CAD 104041.85\nTax (13.0%): CAD 13525.44\nTOTAL: CAD 117567.29\n\nPayment Terms: COD\nDue Date: 2024-09-09\n\nThank you for your business!\n Hydraulic Cylinder 3in Bore (DUPLICATE ENTRY)\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2024-06-11\",\"issuer\":{\"name\":\"Good-Hodges\",\"address\":\"2400 Flynn Hill, Nicholasberg, IA 70122\"},\"recipient\":{\"name\":\"Knight-Sparks\",\"address\":\"693 Durham Divide, South Dana, NH 33967\"},\"total_amount\":117567.29,\"currency\":\"CAD\"},\"line_items\":[{\"description\":\"Hydraulic Cylinder 3in Bore\",\"quantity\":411,\"unit_price\":194.19,\"amount\":79812.09},{\"description\":\"Copper Wire 14AWG (100ft)\",\"quantity\":272,\"unit_price\":89.08,\"amount\":24229.76},{\"description\":\"Hydraulic Cylinder 3in Bore\",\"quantity\":411,\"unit_price\":194.19,\"amount\":79812.09}],\"type_specific\":{\"invoice_number\":null,\"due_date\":null,\"payment_terms\":\"COD\",\"tax_amount\":13525.44,\"subtotal\":104041.85},\"flags\":[{\"category\":\"missing_field\",\"field\":\"type_specific.due_date\",\"severity\":\"medium\",\"description\":\"Required field 'Due Date' is missing from the document.\"},{\"category\":\"missing_field\",\"field\":\"type_specific.invoice_number\",\"severity\":\"medium\",\"description\":\"Required field 'Invoice Number' is missing from the document.\"},{\"category\":\"format_anomaly\",\"field\":\"line_items\",\"severity\":\"low\",\"description\":\"Duplicate line item detected: 'Hydraulic Cylinder 3in Bore' appears more than once.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n\nRECEIPT\n----------------------------------------\n BUDS CALMING TUMMY RUB CREAM 30ML(TOILETRIES) x1 200000.00\n BUDS PRECIOUS NEWBORN CREAM 75ML(TOILETRIES) x1 235000.00\n BUDS PRECIOUS NEWBORN HEAD TO TOE CLEANSER 250ML(TOILETRIES) x1 270000.00\n----------------------------------------\n TOTAL: 705000.00\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"receipt\",\"date\":null,\"issuer\":{\"name\":null,\"address\":null},\"recipient\":null,\"total_amount\":705000.0,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"BUDS CALMING TUMMY RUB CREAM 30ML(TOILETRIES)\",\"quantity\":1.0,\"unit_price\":200000.0,\"amount\":200000.0},{\"description\":\"BUDS PRECIOUS NEWBORN CREAM 75ML(TOILETRIES)\",\"quantity\":1.0,\"unit_price\":235000.0,\"amount\":235000.0},{\"description\":\"BUDS PRECIOUS NEWBORN HEAD TO TOE CLEANSER 250ML(TOILETRIES)\",\"quantity\":1.0,\"unit_price\":270000.0,\"amount\":270000.0}],\"type_specific\":{\"receipt_number\":\"RCP-24668D7D\",\"payment_method\":\"unknown\",\"store_location\":null,\"cashier\":null,\"subtotal\":null,\"tax_amount\":null},\"flags\":[],\"confidence_score\":0.88}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2022-1722\nDate: 10/07/2024\n\nFrom:\n Terry LLC\n 058 Stanley Cliff, Alexanderchester, GU 64752\n\nBill To:\n Hamilton LLC\n 262 Gilbert Plain Suite 769, Marcusbury, MP 11860\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nAnnual Software License 18 USD 734.10 USD 13213.80\nMarketing Campaign Management 18 USD 2058.02 USD 37044.36\nGraphic Design Services 3 USD 191.76 USD 575.28\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: USD 50833.44\nTax (6.0%): USD 3050.01\nTOTAL: USD 1135367.93\n\nPayment Terms: Due on Receipt\nDue Date: 2025-01-05\n\nThank you for your business!\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2024-10-07\",\"issuer\":{\"name\":\"Terry LLC\",\"address\":\"058 Stanley Cliff, Alexanderchester, GU 64752\"},\"recipient\":{\"name\":\"Hamilton LLC\",\"address\":\"262 Gilbert Plain Suite 769, Marcusbury, MP 11860\"},\"total_amount\":1135367.93,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Annual Software License\",\"quantity\":18,\"unit_price\":734.1,\"amount\":13213.8},{\"description\":\"Marketing Campaign Management\",\"quantity\":18,\"unit_price\":2058.02,\"amount\":37044.36},{\"description\":\"Graphic Design Services\",\"quantity\":3,\"unit_price\":191.76,\"amount\":575.28}],\"type_specific\":{\"invoice_number\":\"INV-2022-1722\",\"due_date\":\"2025-01-05\",\"payment_terms\":\"Due on Receipt\",\"tax_amount\":3050.01,\"subtotal\":50833.44},\"flags\":[{\"category\":\"business_logic\",\"field\":\"common.total_amount\",\"severity\":\"high\",\"description\":\"Total amount $1,135,367.93 is unusually large and may require additional review.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\nPURCHASE ORDER\n\nPO Number: PO-2023-54681\nDate: 2024-09-09\n\nBuyer: Thomas, Hoover and Nicholson\nAddress: USNV Martin, FPO AE 74778\n\nSupplier: Rhodes-Fernandez\nAddress: 185 Connor Fort, West Cassiefurt, WI 43279\n\nItems Ordered:\nββββββββββββββββββββββββββββββββββββββββββββββββββ\n 1. Safety Goggles ANSI Z87 - Qty: 167 @ USD 10.87 = USD 1815.29\n 2. LED Panel Light 60W - Qty: 291 @ USD 31.68 = USD 9218.88\n 3. Ball Bearing 6205-2RS - Qty: 453 @ USD 3.26 = USD 1476.78\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nTotal: USD 12510.95\n\nDelivery Date: 2024-09-26\nShip To: 7358 Reynolds Springs Suite 194, Lake Charles, KY 20715\n\nAuthorized By: Richard Moore\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"purchase_order\",\"date\":\"2024-09-09\",\"issuer\":{\"name\":\"Thomas, Hoover and Nicholson\",\"address\":\"USNV Martin, FPO AE 74778\"},\"recipient\":{\"name\":\"Rhodes-Fernandez\",\"address\":\"185 Connor Fort, West Cassiefurt, WI 43279\"},\"total_amount\":12510.95,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Safety Goggles ANSI Z87\",\"quantity\":167,\"unit_price\":10.87,\"amount\":1815.29},{\"description\":\"LED Panel Light 60W\",\"quantity\":291,\"unit_price\":31.68,\"amount\":9218.88},{\"description\":\"Ball Bearing 6205-2RS\",\"quantity\":453,\"unit_price\":3.26,\"amount\":1476.78}],\"type_specific\":{\"po_number\":\"PO-2023-54681\",\"delivery_date\":\"2024-09-26\",\"shipping_address\":\"7358 Reynolds Springs Suite 194, Lake Charles, KY 20715\",\"referenced_invoice\":null},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2022-4020\nDate: 31-May-2024\n\nFrom:\n Schultz, Schmidt and Lee\n 231 Jonathan Extensions Apt. 127, Webstershire, NV 89778\n\nBill To:\n Day, Downs and Morris\n Unit 5271 Box 7744, DPO AP 07863\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nPVC Pipe 2in x 10ft 252 AUD 4.04 AUD 1018.08\nHydraulic Cylinder 3in Bore 323 AUD 246.32 AUD 79561.36\nRubber O-Ring Seals 78 AUD 0.46 AUD 35.88\nWelding Rod E6013 (5lb) 289 AUD 21.19 AUD 6123.91\nStainless Steel Sheet 4x8ft 44 AUD 188.12 AUD 8277.28\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: AUD 95016.51\nTax (5.0%): AUD 4750.83\nTOTAL: AUD 99767.34\n\nPayment Terms: Net 60\nDue Date: 2024-08-29\n\nThank you for your business!\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2024-05-31\",\"issuer\":{\"name\":\"Schultz, Schmidt and Lee\",\"address\":\"231 Jonathan Extensions Apt. 127, Webstershire, NV 89778\"},\"recipient\":{\"name\":\"Day, Downs and Morris\",\"address\":\"Unit 5271 Box 7744, DPO AP 07863\"},\"total_amount\":99767.34,\"currency\":\"AUD\"},\"line_items\":[{\"description\":\"PVC Pipe 2in x 10ft\",\"quantity\":252,\"unit_price\":4.04,\"amount\":1018.08},{\"description\":\"Hydraulic Cylinder 3in Bore\",\"quantity\":323,\"unit_price\":246.32,\"amount\":79561.36},{\"description\":\"Rubber O-Ring Seals\",\"quantity\":78,\"unit_price\":0.46,\"amount\":35.88},{\"description\":\"Welding Rod E6013 (5lb)\",\"quantity\":289,\"unit_price\":21.19,\"amount\":6123.91},{\"description\":\"Stainless Steel Sheet 4x8ft\",\"quantity\":44,\"unit_price\":188.12,\"amount\":8277.28}],\"type_specific\":{\"invoice_number\":\"INV-2022-4020\",\"due_date\":\"2024-08-29\",\"payment_terms\":\"Net 60\",\"tax_amount\":4750.83,\"subtotal\":95016.51},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\nPURCHASE ORDER\n\nPO Number: PO-2022-39552\nDate: 2025-07-22\n\nBuyer: Smith and Sons\nAddress: 2175 Mark Harbors Apt. 838, Powellbury, CO 60675\n\nSupplier: Solis-West\nAddress: 1701 Gutierrez Point Apt. 199, Pamelaside, IL 86300\n\nItems Ordered:\nββββββββββββββββββββββββββββββββββββββββββββββββββ\n 1. Air Filter Cartridge - Qty: 141 @ CAD 6.28 = CAD 885.48\n 2. LED Panel Light 60W - Qty: 156 @ CAD 32.28 = CAD 5035.68\n 3. Rubber O-Ring Seals - Qty: 48 @ CAD 1.27 = CAD 60.96\n 4. PVC Pipe 2in x 10ft - Qty: 87 @ CAD 9.89 = CAD 860.43\n 5. Industrial Adhesive 500ml - Qty: 298 @ CAD 11.04 = CAD 3289.92\n 6. Nitrile Gloves Box (100) - Qty: 338 @ CAD 12.49 = CAD 4221.62\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nTotal: CAD 14354.09\n\nDelivery Date: 2025-09-05\nShip To: 0375 Moran Lakes Suite 113, West Michaelport, OH 68918\n\nAuthorized By: Brian Prince\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"purchase_order\",\"date\":\"2025-07-22\",\"issuer\":{\"name\":\"Smith and Sons\",\"address\":\"2175 Mark Harbors Apt. 838, Powellbury, CO 60675\"},\"recipient\":{\"name\":\"Solis-West\",\"address\":\"1701 Gutierrez Point Apt. 199, Pamelaside, IL 86300\"},\"total_amount\":14354.09,\"currency\":\"CAD\"},\"line_items\":[{\"description\":\"Air Filter Cartridge\",\"quantity\":141,\"unit_price\":6.28,\"amount\":885.48},{\"description\":\"LED Panel Light 60W\",\"quantity\":156,\"unit_price\":32.28,\"amount\":5035.68},{\"description\":\"Rubber O-Ring Seals\",\"quantity\":48,\"unit_price\":1.27,\"amount\":60.96},{\"description\":\"PVC Pipe 2in x 10ft\",\"quantity\":87,\"unit_price\":9.89,\"amount\":860.43},{\"description\":\"Industrial Adhesive 500ml\",\"quantity\":298,\"unit_price\":11.04,\"amount\":3289.92},{\"description\":\"Nitrile Gloves Box (100)\",\"quantity\":338,\"unit_price\":12.49,\"amount\":4221.62}],\"type_specific\":{\"po_number\":\"PO-2022-39552\",\"delivery_date\":\"2025-09-05\",\"shipping_address\":\"0375 Moran Lakes Suite 113, West Michaelport, OH 68918\",\"referenced_invoice\":null},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n============================================================\n First National Bank\n ACCOUNT STATEMENT\n============================================================\n\nAccount Holder: Jacqueline Stewart\nAccount Number: ****3807\nStatement Period: 04/02/2026 - 05/02/2026\n\nOpening Balance: $25,493.06\n\nDate Description Amount Balance\nββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n2026-04-03 Client Payment $9,259.57 $51,644.47\n2026-04-05 Deposit $2,336.18 $37,603.67\n2026-04-12 Wire Transfer - Johnston $-3,763.75 $43,236.18\n2026-04-15 Utility Payment - Electric $-3,029.79 $41,030.86\n2026-04-20 Utility Payment - Electric $-1,925.39 $44,338.60\n2026-04-21 Interest Payment $9,690.76 $50,721.62\n2026-04-22 Refund - Ramirez, Walker and Ray $9,774.43 $35,267.49\n2026-04-24 Deposit $3,027.81 $46,263.99\n2026-04-25 Utility Payment - Electric $-277.95 $44,060.65\n2026-04-30 Refund - Bryant Inc $4,781.23 $42,384.90\n2026-04-30 Wire Transfer - Diaz $-4,644.54 $46,999.93\nββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n\nClosing Balance: $46,999.93\n\nTotal Debits: $-13,641.42\nTotal Credits: $38,869.98\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"bank_statement\",\"date\":\"2026-05-02\",\"issuer\":{\"name\":\"First National Bank\",\"address\":null},\"recipient\":{\"name\":\"Jacqueline Stewart\",\"address\":null},\"total_amount\":46999.93,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Client Payment\",\"quantity\":1,\"unit_price\":9259.57,\"amount\":9259.57},{\"description\":\"Deposit\",\"quantity\":1,\"unit_price\":2336.18,\"amount\":2336.18},{\"description\":\"Wire Transfer - Johnston\",\"quantity\":1,\"unit_price\":-3763.75,\"amount\":-3763.75},{\"description\":\"Utility Payment - Electric\",\"quantity\":1,\"unit_price\":-3029.79,\"amount\":-3029.79},{\"description\":\"Utility Payment - Electric\",\"quantity\":1,\"unit_price\":-1925.39,\"amount\":-1925.39},{\"description\":\"Interest Payment\",\"quantity\":1,\"unit_price\":9690.76,\"amount\":9690.76},{\"description\":\"Refund - Ramirez, Walker and Ray\",\"quantity\":1,\"unit_price\":9774.43,\"amount\":9774.43},{\"description\":\"Deposit\",\"quantity\":1,\"unit_price\":3027.81,\"amount\":3027.81},{\"description\":\"Utility Payment - Electric\",\"quantity\":1,\"unit_price\":-277.95,\"amount\":-277.95},{\"description\":\"Refund - Bryant Inc\",\"quantity\":1,\"unit_price\":4781.23,\"amount\":4781.23},{\"description\":\"Wire Transfer - Diaz\",\"quantity\":1,\"unit_price\":-4644.54,\"amount\":-4644.54}],\"type_specific\":{\"account_number\":\"****3807\",\"statement_period\":\"2026-04-02 to 2026-05-02\",\"opening_balance\":25493.06,\"closing_balance\":46999.93},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n\nRECEIPT\n----------------------------------------\n 1069-ROTI KEJU COKLAT x1 8.50\n 1037-ROTI MAHKOTA/RING x1 10.50\n 1022-ROTI KACANG MERAH x1 8.00\n 1001-ROTI COKLAT x1 8.00\n----------------------------------------\n TOTAL: 35.00\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"receipt\",\"date\":null,\"issuer\":{\"name\":null,\"address\":null},\"recipient\":null,\"total_amount\":35.0,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"1069-ROTI KEJU COKLAT\",\"quantity\":1,\"unit_price\":8.5,\"amount\":8.5},{\"description\":\"1037-ROTI MAHKOTA/RING\",\"quantity\":1,\"unit_price\":10.5,\"amount\":10.5},{\"description\":\"1022-ROTI KACANG MERAH\",\"quantity\":1,\"unit_price\":8.0,\"amount\":8.0},{\"description\":\"1001-ROTI COKLAT\",\"quantity\":1,\"unit_price\":8.0,\"amount\":8.0}],\"type_specific\":{\"receipt_number\":\"RCP-08E70664\",\"payment_method\":\"unknown\",\"store_location\":null,\"cashier\":null,\"subtotal\":null,\"tax_amount\":null},\"flags\":[],\"confidence_score\":0.88}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2020-3496\nDate: 09-Sep-2024\n\nFrom:\n Robinson, Kane and Murphy\n 2284 Daniel Way, East Alexisbury, OR 42938\n\nBill To:\n Thompson Inc\n 117 Strickland Passage Apt. 783, Lake Kristahaven, ME 60537\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nLegal Advisory - Contract Review 20 USD 487.31 USD 9746.20\nCloud Infrastructure Setup 4 USD 2444.48 USD 9777.92\nGraphic Design Services 6 USD 194.71 USD 1168.26\nEquipment Maintenance - Quarterly 4 USD 547.21 USD 2188.84\nMarketing Campaign Management 10 USD 5454.78 USD 54547.80\nStaff Training Session 13 USD 775.99 USD 10087.87\nSecurity Audit 7 USD 1303.99 USD 9127.93\nAnnual Software License 8 USD 190.71 USD 1525.68\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: USD 98170.50\nTax (7.5%): USD 7362.79\nTOTAL: USD 105533.29\n\nPayment Terms: Net 15\nDue Date: 2024-10-24\n\nThank you for your business!\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2024-09-09\",\"issuer\":{\"name\":\"Robinson, Kane and Murphy\",\"address\":\"2284 Daniel Way, East Alexisbury, OR 42938\"},\"recipient\":{\"name\":\"Thompson Inc\",\"address\":\"117 Strickland Passage Apt. 783, Lake Kristahaven, ME 60537\"},\"total_amount\":105533.29,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Legal Advisory - Contract Review\",\"quantity\":20,\"unit_price\":487.31,\"amount\":9746.2},{\"description\":\"Cloud Infrastructure Setup\",\"quantity\":4,\"unit_price\":2444.48,\"amount\":9777.92},{\"description\":\"Graphic Design Services\",\"quantity\":6,\"unit_price\":194.71,\"amount\":1168.26},{\"description\":\"Equipment Maintenance - Quarterly\",\"quantity\":4,\"unit_price\":547.21,\"amount\":2188.84},{\"description\":\"Marketing Campaign Management\",\"quantity\":10,\"unit_price\":5454.78,\"amount\":54547.8},{\"description\":\"Staff Training Session\",\"quantity\":13,\"unit_price\":775.99,\"amount\":10087.87},{\"description\":\"Security Audit\",\"quantity\":7,\"unit_price\":1303.99,\"amount\":9127.93},{\"description\":\"Annual Software License\",\"quantity\":8,\"unit_price\":190.71,\"amount\":1525.68}],\"type_specific\":{\"invoice_number\":\"INV-2020-3496\",\"due_date\":\"2024-10-24\",\"payment_terms\":\"Net 15\",\"tax_amount\":7362.79,\"subtotal\":98170.5},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2023-3591\nDate: May 25, 2024\n\nFrom:\n Jones, Hensley and Powell\n USNS Riley, FPO AE 81684\n\nBill To:\n Barnes PLC\n 3534 Anderson Rue Suite 751, South Christineshire, CT 10103\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nAir Filter Cartridge 446 USD 13.41 USD 5980.86\nWelding Rod E6013 (5lb) 431 USD 16.22 USD 6990.82\nCopper Wire 14AWG (100ft) 312 USD 29.92 USD 9335.04\nNitrile Gloves Box (100) 192 USD 14.14 USD 2714.88\nRubber O-Ring Seals 277 USD 2.29 USD 634.33\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: USD 25655.93\nTax (10.0%): USD 2565.59\nTOTAL: USD 28221.52\n\nPayment Terms: Due on Receipt\nDue Date: 2024-07-24\n\nThank you for your business!\n\nReferenced Purchase Order: PO-2022-93346\n Air Filter Cartridge (DUPLICATE ENTRY)\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2024-05-25\",\"issuer\":{\"name\":\"Jones, Hensley and Powell\",\"address\":\"USNS Riley, FPO AE 81684\"},\"recipient\":{\"name\":\"Barnes PLC\",\"address\":\"3534 Anderson Rue Suite 751, South Christineshire, CT 10103\"},\"total_amount\":28221.52,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Air Filter Cartridge\",\"quantity\":446,\"unit_price\":13.41,\"amount\":5980.86},{\"description\":\"Welding Rod E6013 (5lb)\",\"quantity\":431,\"unit_price\":16.22,\"amount\":6990.82},{\"description\":\"Copper Wire 14AWG (100ft)\",\"quantity\":312,\"unit_price\":29.92,\"amount\":9335.04},{\"description\":\"Nitrile Gloves Box (100)\",\"quantity\":192,\"unit_price\":14.14,\"amount\":2714.88},{\"description\":\"Rubber O-Ring Seals\",\"quantity\":277,\"unit_price\":2.29,\"amount\":634.33},{\"description\":\"Air Filter Cartridge\",\"quantity\":446,\"unit_price\":13.41,\"amount\":5980.86}],\"type_specific\":{\"invoice_number\":\"INV-2023-3591\",\"due_date\":\"2024-07-24\",\"payment_terms\":\"Due on Receipt\",\"tax_amount\":2565.59,\"subtotal\":25655.93},\"flags\":[{\"category\":\"cross_field\",\"field\":\"referenced_po\",\"severity\":\"high\",\"description\":\"Invoice references Purchase Order 'PO-2022-93346' which could not be verified in the system.\"},{\"category\":\"format_anomaly\",\"field\":\"line_items\",\"severity\":\"low\",\"description\":\"Duplicate line item detected: 'Air Filter Cartridge' appears more than once.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\nPURCHASE ORDER\n\nPO Number: PO-2022-53007\nDate: 2025-02-03\n\nBuyer: Skinner-James\nAddress: 6924 Morrison Square Apt. 752, Novakport, WA 91590\n\nSupplier: Henry, Johnston and Sanchez\nAddress: 1939 Erin Plaza Apt. 737, Marieland, GU 54128\n\nItems Ordered:\nββββββββββββββββββββββββββββββββββββββββββββββββββ\n 1. Steel Bolts M10x50 - Qty: 383 @ EUR 3.92 = EUR 1501.36\n 2. Hydraulic Cylinder 3in Bore - Qty: 67 @ EUR 174.35 = EUR 11681.45\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nTotal: EUR 13182.81\n\nDelivery Date: 2025-03-27\nShip To: 434 Jennifer Turnpike, Jonesfort, OR 53035\n\nAuthorized By: Tammy Kane\n\nNote: Payment in CAD\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"purchase_order\",\"date\":\"2025-02-03\",\"issuer\":{\"name\":\"Skinner-James\",\"address\":\"6924 Morrison Square Apt. 752, Novakport, WA 91590\"},\"recipient\":{\"name\":\"Henry, Johnston and Sanchez\",\"address\":\"1939 Erin Plaza Apt. 737, Marieland, GU 54128\"},\"total_amount\":13182.81,\"currency\":\"EUR\"},\"line_items\":[{\"description\":\"Steel Bolts M10x50\",\"quantity\":383,\"unit_price\":3.92,\"amount\":1501.36},{\"description\":\"Hydraulic Cylinder 3in Bore\",\"quantity\":67,\"unit_price\":174.35,\"amount\":11681.45}],\"type_specific\":{\"po_number\":\"PO-2022-53007\",\"delivery_date\":null,\"shipping_address\":\"434 Jennifer Turnpike, Jonesfort, OR 53035\",\"referenced_invoice\":null},\"flags\":[{\"category\":\"missing_field\",\"field\":\"type_specific.delivery_date\",\"severity\":\"medium\",\"description\":\"Required field 'Delivery Date' is missing from the document.\"},{\"category\":\"cross_field\",\"field\":\"currency\",\"severity\":\"high\",\"description\":\"Header shows currency as EUR but payment note references CAD. Currency mismatch detected.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2025-2771\nDate: 01/13/2025\n\nFrom:\n Tran-Cooper\n Unit 8557 Box 4443, DPO AE 95041\n\nBill To:\n Stewart Ltd\n 37498 Hooper Village, West Jeremy, HI 40119\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nPVC Pipe 2in x 10ft 241 CAD 5.71 CAD 1376.11\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: CAD 1376.11\nTax (9.0%): CAD 123.85\nTOTAL: CAD 1499.96\n\nPayment Terms: Net 90\nDue Date: 2025-02-12\n\nThank you for your business!\n\nReferenced Purchase Order: PO-2020-91876\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2025-01-13\",\"issuer\":{\"name\":\"Tran-Cooper\",\"address\":\"Unit 8557 Box 4443, DPO AE 95041\"},\"recipient\":{\"name\":\"Stewart Ltd\",\"address\":\"37498 Hooper Village, West Jeremy, HI 40119\"},\"total_amount\":1499.96,\"currency\":\"CAD\"},\"line_items\":[{\"description\":\"PVC Pipe 2in x 10ft\",\"quantity\":241,\"unit_price\":5.71,\"amount\":1376.11}],\"type_specific\":{\"invoice_number\":\"INV-2025-2771\",\"due_date\":\"2025-02-12\",\"payment_terms\":\"Net 90\",\"tax_amount\":123.85,\"subtotal\":1376.11},\"flags\":[{\"category\":\"cross_field\",\"field\":\"referenced_po\",\"severity\":\"high\",\"description\":\"Invoice references Purchase Order 'PO-2020-91876' which could not be verified in the system.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2025-7594\nDate: August 29, 2024\n\nFrom:\n Cooke and Sons\n 880 Walter River, New Nicholas, GA 24204\n\nBill To:\n Wise, Lamb and Martin\n 3794 Young Garden Suite 735, Murrayport, WI 50553\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nHydraulic Cylinder 3in Bore 47 USD 138.80 USD 6523.60\nRubber O-Ring Seals 346 USD 2.03 USD 702.38\nThermal Paste 30g 295 USD 6.69 USD 1973.55\nSteel Bolts M10x50 11 USD 8.50 USD 93.50\nStainless Steel Sheet 4x8ft 345 USD 163.28 USD 56331.60\nSafety Goggles ANSI Z87 296 USD 3.88 USD 1148.48\nLED Panel Light 60W 388 USD 17.78 USD 6898.64\nBall Bearing 6205-2RS 266 USD 8.47 USD 2253.02\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: USD 75924.77\nTax (7.5%): USD 5694.36\nTOTAL: USD 81619.13\n\nPayment Terms: 2/10 Net 30\nDue Date: 2024-10-28\n\nThank you for your business!\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2024-08-29\",\"issuer\":{\"name\":\"Cooke and Sons\",\"address\":\"880 Walter River, New Nicholas, GA 24204\"},\"recipient\":{\"name\":\"Wise, Lamb and Martin\",\"address\":\"3794 Young Garden Suite 735, Murrayport, WI 50553\"},\"total_amount\":81619.13,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Hydraulic Cylinder 3in Bore\",\"quantity\":47,\"unit_price\":138.8,\"amount\":6523.6},{\"description\":\"Rubber O-Ring Seals\",\"quantity\":346,\"unit_price\":2.03,\"amount\":702.38},{\"description\":\"Thermal Paste 30g\",\"quantity\":295,\"unit_price\":6.69,\"amount\":1973.55},{\"description\":\"Steel Bolts M10x50\",\"quantity\":11,\"unit_price\":8.5,\"amount\":93.5},{\"description\":\"Stainless Steel Sheet 4x8ft\",\"quantity\":345,\"unit_price\":163.28,\"amount\":56331.6},{\"description\":\"Safety Goggles ANSI Z87\",\"quantity\":296,\"unit_price\":3.88,\"amount\":1148.48},{\"description\":\"LED Panel Light 60W\",\"quantity\":388,\"unit_price\":17.78,\"amount\":6898.64},{\"description\":\"Ball Bearing 6205-2RS\",\"quantity\":266,\"unit_price\":8.47,\"amount\":2253.02}],\"type_specific\":{\"invoice_number\":\"INV-2025-7594\",\"due_date\":\"2024-10-28\",\"payment_terms\":\"2/10 Net 30\",\"tax_amount\":5694.36,\"subtotal\":75924.77},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n\nRECEIPT\n----------------------------------------\n Jamur Crispy x2 27.00\n Nasi Putih x2 14.00\n Sambel Kecap x2 9.00\n Es Teh x2 15.00\n----------------------------------------\n TOTAL: 73.45\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"receipt\",\"date\":null,\"issuer\":{\"name\":null,\"address\":null},\"recipient\":null,\"total_amount\":73.45,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Jamur Crispy\",\"quantity\":2.0,\"unit_price\":13.5,\"amount\":27.0},{\"description\":\"Nasi Putih\",\"quantity\":2.0,\"unit_price\":7.0,\"amount\":14.0},{\"description\":\"Sambel Kecap\",\"quantity\":2.0,\"unit_price\":4.5,\"amount\":9.0},{\"description\":\"Es Teh\",\"quantity\":2.0,\"unit_price\":7.5,\"amount\":15.0}],\"type_specific\":{\"receipt_number\":\"RCP-0686D778\",\"payment_method\":\"unknown\",\"store_location\":null,\"cashier\":null,\"subtotal\":null,\"tax_amount\":null},\"flags\":[],\"confidence_score\":0.88}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n============================================================\n Cornerstone Banking Group\n ACCOUNT STATEMENT\n============================================================\n\nAccount Holder: Cynthia Le\nAccount Number: ****6890\nStatement Period: 04/02/2026 - 05/02/2026\n\nOpening Balance: $44,879.57\n\nDate Description Amount Balance\nββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n2026-04-03 Deposit $274.72 $55,430.27\n2026-04-06 Payroll $-53-1.70 $46,287.44\n2026-04-09 Wire Transfer In $1,705.07 $47,722.87\n2026-04-12 Client Payment $8,010.50 $63,440.77\n2026-04-15 POS Purchase - Conrad and Sons $-1,117.79 $41,776.15\n2026-04-15 Deposit $1,204.89 $47,492.33\n2026-04-16 Refund - Miller-Lewis $6,919.85 $46,819.14\n2026-04-16 Refund - Kirk-Gutierrez $4,855.11 $55,500.75\n2026-04-17 Utility Payment - Electric $-345.20 $55,155.55\n2026-04-21 Deposit $1,138.23 $46,017.80\n2026-04-21 Utility Payment - Electric $-4,828.93 $42,893.94\n2026-04-25 Payroll $-1,876.86 $39,899.29\n2026-05-02 Wire Transfer In $3,153.31 $50,645.64\nββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n\nClosing Balance: $50,645.64\n\nTotal Debits: $-8,700.48\nTotal Credits: $27,261.68\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"bank_statement\",\"date\":\"2026-05-02\",\"issuer\":{\"name\":\"Cornerstone Banking Group\",\"address\":null},\"recipient\":{\"name\":\"Cynthia Le\",\"address\":null},\"total_amount\":50645.64,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Deposit\",\"quantity\":1,\"unit_price\":274.72,\"amount\":274.72},{\"description\":\"Payroll\",\"quantity\":1,\"unit_price\":-531.7,\"amount\":-531.7},{\"description\":\"Wire Transfer In\",\"quantity\":1,\"unit_price\":1705.07,\"amount\":1705.07},{\"description\":\"Client Payment\",\"quantity\":1,\"unit_price\":8010.5,\"amount\":8010.5},{\"description\":\"POS Purchase - Conrad and Sons\",\"quantity\":1,\"unit_price\":-1117.79,\"amount\":-1117.79},{\"description\":\"Deposit\",\"quantity\":-1,\"unit_price\":1204.89,\"amount\":1204.89},{\"description\":\"Refund - Miller-Lewis\",\"quantity\":1,\"unit_price\":6919.85,\"amount\":6919.85},{\"description\":\"Refund - Kirk-Gutierrez\",\"quantity\":1,\"unit_price\":4855.11,\"amount\":4855.11},{\"description\":\"Utility Payment - Electric\",\"quantity\":1,\"unit_price\":-345.2,\"amount\":-345.2},{\"description\":\"Deposit\",\"quantity\":1,\"unit_price\":1138.23,\"amount\":1138.23},{\"description\":\"Utility Payment - Electric\",\"quantity\":1,\"unit_price\":-4828.93,\"amount\":-4828.93},{\"description\":\"Payroll\",\"quantity\":1,\"unit_price\":-1876.86,\"amount\":-1876.86},{\"description\":\"Wire Transfer In\",\"quantity\":1,\"unit_price\":3153.31,\"amount\":3153.31}],\"type_specific\":{\"account_number\":null,\"statement_period\":null,\"opening_balance\":44879.57,\"closing_balance\":50645.64},\"flags\":[{\"category\":\"missing_field\",\"field\":\"type_specific.account_number\",\"severity\":\"medium\",\"description\":\"Required field 'Account Number' is missing from the document.\"},{\"category\":\"missing_field\",\"field\":\"type_specific.statement_period\",\"severity\":\"medium\",\"description\":\"Required field 'Statement Period' is missing from the document.\"},{\"category\":\"format_anomaly\",\"field\":\"line_items[5].quantity\",\"severity\":\"medium\",\"description\":\"Line item 'Deposit' has a negative quantity (-1), which is unusual for a standard document.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2021-2480\nDate: June 16, 2025\n\nFrom:\n Jacobs Ltd\n 7192 Sanders Run Suite 310, South April, PW 74120\n\nBill To:\n Green LLC\n 473 Derek Cliffs, Thompsonmouth, CO 89882\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nWelding Rod E6013 (5lb) 361 USD 19.96 USD 7205.56\nCable Ties 12in (100pk) 228 USD 2.66 USD 606.48\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: USD 7812.04\nTax (7.5%): USD 585.90\nTOTAL: USD 8397.94\n\nPayment Terms: Net 15\nDue Date: 2025-07-31\n\nThank you for your business!\n Welding Rod E6013 (5lb) (DUPLICATE ENTRY)\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2025-06-16\",\"issuer\":{\"name\":\"Jacobs Ltd\",\"address\":\"7192 Sanders Run Suite 310, South April, PW 74120\"},\"recipient\":{\"name\":\"Green LLC\",\"address\":\"473 Derek Cliffs, Thompsonmouth, CO 89882\"},\"total_amount\":8397.94,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Welding Rod E6013 (5lb)\",\"quantity\":361,\"unit_price\":19.96,\"amount\":7205.56},{\"description\":\"Cable Ties 12in (100pk)\",\"quantity\":228,\"unit_price\":2.66,\"amount\":606.48},{\"description\":\"Welding Rod E6013 (5lb)\",\"quantity\":361,\"unit_price\":19.96,\"amount\":7205.56}],\"type_specific\":{\"invoice_number\":\"INV-2021-2480\",\"due_date\":\"2025-07-31\",\"payment_terms\":\"Net 15\",\"tax_amount\":585.9,\"subtotal\":7812.04},\"flags\":[{\"category\":\"format_anomaly\",\"field\":\"line_items\",\"severity\":\"low\",\"description\":\"Duplicate line item detected: 'Welding Rod E6013 (5lb)' appears more than once.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n\nRECEIPT\n----------------------------------------\n Coffee Rocksalt [R] x1 28000.00\n ICED NUTELLA LATTE [R] x1 30000.00\n ICED MOCHA LATTE [R] x1 30000.00\n----------------------------------------\n TOTAL: 100000.00\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"receipt\",\"date\":null,\"issuer\":{\"name\":null,\"address\":null},\"recipient\":null,\"total_amount\":100000.0,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Coffee Rocksalt [R]\",\"quantity\":1.0,\"unit_price\":28000.0,\"amount\":28000.0},{\"description\":\"ICED NUTELLA LATTE [R]\",\"quantity\":1.0,\"unit_price\":30000.0,\"amount\":30000.0},{\"description\":\"ICED MOCHA LATTE [R]\",\"quantity\":1.0,\"unit_price\":30000.0,\"amount\":30000.0}],\"type_specific\":{\"receipt_number\":\"RCP-9BC13E89\",\"payment_method\":\"unknown\",\"store_location\":null,\"cashier\":null,\"subtotal\":null,\"tax_amount\":null},\"flags\":[],\"confidence_score\":0.88}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n***********************************\n Perry's Hardware\n Lake Martin, MN\n***********************************\n\nDate: 10/21/2025 Time: 12:40\nReceipt #: R-735710\nCashier: Christine\n\n Finally Pair -6x $14.67 $88.02\n Discover Pair 5x $48.30 $241.50\n Soldier Pair 9x $43.51 $391.59\n\n Subtotal: $721.11\n Tax (10.0%): $72.11\n ββββββββββββββββββββββββββββββ\n TOTAL: $1453526.67\n\n Paid by: Wire Transfer\n\n Thank you for shopping with us!\n***********************************\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"receipt\",\"date\":\"2025-10-21\",\"issuer\":{\"name\":\"Perry's Hardware\",\"address\":\"Lake Martin, MN\"},\"recipient\":null,\"total_amount\":1453526.67,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Finally Pair\",\"quantity\":-6,\"unit_price\":14.67,\"amount\":88.02},{\"description\":\"Discover Pair\",\"quantity\":5,\"unit_price\":48.3,\"amount\":241.5},{\"description\":\"Soldier Pair\",\"quantity\":9,\"unit_price\":43.51,\"amount\":391.59}],\"type_specific\":{\"receipt_number\":\"R-735710\",\"payment_method\":\"Wire Transfer\",\"store_location\":\"Lake Martin, MN\",\"cashier\":\"Christine\"},\"flags\":[{\"category\":\"business_logic\",\"field\":\"common.total_amount\",\"severity\":\"high\",\"description\":\"Total amount $1,453,526.67 is unusually large and may require additional review.\"},{\"category\":\"format_anomaly\",\"field\":\"line_items[0].quantity\",\"severity\":\"medium\",\"description\":\"Line item 'Finally Pair' has a negative quantity (-6), which is unusual for a standard document.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\nPURCHASE ORDER\n\nPO Number: PO-2021-18812\nDate: 2024-11-22\n\nBuyer: Meyer, Hanson and Silva\nAddress: 994 Cohen Orchard Apt. 545, Joseport, ID 20152\n\nSupplier: Cannon, Yoder and Trujillo\nAddress: 520 Bright Green Suite 660, West Jessica, IN 90458\n\nItems Ordered:\nββββββββββββββββββββββββββββββββββββββββββββββββββ\n 1. Welding Rod E6013 (5lb) - Qty: 197 @ USD 21.19 = USD 4174.43\n 2. Cable Ties 12in (100pk) - Qty: 17 @ USD 5.73 = USD 97.41\n 3. Safety Goggles ANSI Z87 - Qty: 348 @ USD 7.77 = USD 2703.96\n 4. Nitrile Gloves Box (100) - Qty: 190 @ USD 9.46 = USD 1797.40\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nTotal: USD 8773.20\n\nDelivery Date: 2025-01-16\nShip To: 584 Mary Wells Apt. 417, Lake Jonathan, AS 73774\n\nAuthorized By: Catherine Frazier\n\nNote: Payment in AUD\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"purchase_order\",\"date\":\"2024-11-22\",\"issuer\":{\"name\":\"Meyer, Hanson and Silva\",\"address\":\"994 Cohen Orchard Apt. 545, Joseport, ID 20152\"},\"recipient\":{\"name\":\"Cannon, Yoder and Trujillo\",\"address\":\"520 Bright Green Suite 660, West Jessica, IN 90458\"},\"total_amount\":8773.2,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Welding Rod E6013 (5lb)\",\"quantity\":197,\"unit_price\":21.19,\"amount\":4174.43},{\"description\":\"Cable Ties 12in (100pk)\",\"quantity\":17,\"unit_price\":5.73,\"amount\":97.41},{\"description\":\"Safety Goggles ANSI Z87\",\"quantity\":348,\"unit_price\":7.77,\"amount\":2703.96},{\"description\":\"Nitrile Gloves Box (100)\",\"quantity\":190,\"unit_price\":9.46,\"amount\":1797.4}],\"type_specific\":{\"po_number\":\"PO-2021-18812\",\"delivery_date\":\"2025-01-16\",\"shipping_address\":\"584 Mary Wells Apt. 417, Lake Jonathan, AS 73774\",\"referenced_invoice\":null},\"flags\":[{\"category\":\"cross_field\",\"field\":\"currency\",\"severity\":\"high\",\"description\":\"Header shows currency as USD but payment note references AUD. Currency mismatch detected.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n============================================================\n Horizon Financial Corp\n ACCOUNT STATEMENT\n============================================================\n\nAccount Holder: Julie Taylor\nAccount Number: ****4300\nStatement Period: 04/02/2026 - 05/02/2026\n\nOpening Balance: $7,393.99\n\nDate Description Amount Balance\nββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n2026-04-06 ACH Debit - Phillips LLC $-4,233.94 $3,802.63\n2026-04-06 Client Payment $9,489.50 $8,250.68\n2026-04-07 Refund - Harrison PLC $6,950.93 $21,710.95\n2026-04-23 Utility Payment - Electric $-1,515.81 $-1,238.82\n2026-04-27 Interest Payment $6,509.34 $14,760.02\n2026-04-29 Utility Payment - Electric $-3,525.64 $276.99\n2026-05-01 Wire Transfer In $642.58 $8,036.57\nββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n\nClosing Balance: $8,036.57\n\nTotal Debits: $-9,275.39\nTotal Credits: $23,592.35\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"bank_statement\",\"date\":\"2026-05-02\",\"issuer\":{\"name\":\"Horizon Financial Corp\",\"address\":null},\"recipient\":{\"name\":\"Julie Taylor\",\"address\":null},\"total_amount\":8036.57,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"ACH Debit - Phillips LLC\",\"quantity\":1,\"unit_price\":-4233.94,\"amount\":-4233.94},{\"description\":\"Client Payment\",\"quantity\":1,\"unit_price\":9489.5,\"amount\":9489.5},{\"description\":\"Refund - Harrison PLC\",\"quantity\":1,\"unit_price\":6950.93,\"amount\":6950.93},{\"description\":\"Utility Payment - Electric\",\"quantity\":1,\"unit_price\":-1515.81,\"amount\":-1515.81},{\"description\":\"Interest Payment\",\"quantity\":1,\"unit_price\":6509.34,\"amount\":6509.34},{\"description\":\"Utility Payment - Electric\",\"quantity\":1,\"unit_price\":-3525.64,\"amount\":-3525.64},{\"description\":\"Wire Transfer In\",\"quantity\":1,\"unit_price\":642.58,\"amount\":642.58}],\"type_specific\":{\"account_number\":\"****4300\",\"statement_period\":\"2026-04-02 to 2026-05-02\",\"opening_balance\":7393.99,\"closing_balance\":8036.57},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n============================================================\n Pacific Commerce Bank\n ACCOUNT STATEMENT\n============================================================\n\nAccount Holder: Abigail Andrade\nAccount Number: ****5646\nStatement Period: 04/02/2026 - 05/02/2026\n\nOpening Balance: $27,818.22\n\nDate Description Amount Balance\nββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n2026-04-10 Payment to Richardson-Gray $-1,987.74 $45,038.29\n2026-04-11 ACH Credit - Murphy-Gutierrez $4,320.85 $49,359.14\n2026-04-20 Wire Transfer In $8,946.84 $36,765.06\n2026-04-26 Payroll $-91.24 $36,673.82\n2026-05-01 Interest Payment $4,592.36 $41,266.18\n2026-05-01 Deposit $6,563.15 $47,026.03\n2026-05-02 POS Purchase - Larson, Cobb and Thompson $-803.30 $40,462.88\nββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n\nClosing Balance: $40,462.88\n\nTotal Debits: $-2,882.28\nTotal Credits: $24,423.20\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"bank_statement\",\"date\":\"2026-05-02\",\"issuer\":{\"name\":\"Pacific Commerce Bank\",\"address\":null},\"recipient\":{\"name\":\"Abigail Andrade\",\"address\":null},\"total_amount\":40462.88,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Payment to Richardson-Gray\",\"quantity\":1,\"unit_price\":-1987.74,\"amount\":-1987.74},{\"description\":\"ACH Credit - Murphy-Gutierrez\",\"quantity\":1,\"unit_price\":4320.85,\"amount\":4320.85},{\"description\":\"Wire Transfer In\",\"quantity\":1,\"unit_price\":8946.84,\"amount\":8946.84},{\"description\":\"Payroll\",\"quantity\":1,\"unit_price\":-91.24,\"amount\":-91.24},{\"description\":\"Interest Payment\",\"quantity\":1,\"unit_price\":4592.36,\"amount\":4592.36},{\"description\":\"Deposit\",\"quantity\":1,\"unit_price\":6563.15,\"amount\":6563.15},{\"description\":\"POS Purchase - Larson, Cobb and Thompson\",\"quantity\":1,\"unit_price\":-803.3,\"amount\":-803.3}],\"type_specific\":{\"account_number\":null,\"statement_period\":null,\"opening_balance\":27818.22,\"closing_balance\":40462.88},\"flags\":[{\"category\":\"missing_field\",\"field\":\"type_specific.account_number\",\"severity\":\"medium\",\"description\":\"Required field 'Account Number' is missing from the document.\"},{\"category\":\"missing_field\",\"field\":\"type_specific.statement_period\",\"severity\":\"medium\",\"description\":\"Required field 'Statement Period' is missing from the document.\"}],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n============================================================\n Citizens Trust Bank\n ACCOUNT STATEMENT\n============================================================\n\nAccount Holder: Vincent Smith\nAccount Number: ****6770\nStatement Period: 04/02/2026 - 05/02/2026\n\nOpening Balance: $7,857.97\n\nDate Description Amount Balance\nββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n2026-04-02 ACH Debit - Mitchell, Johnson and Greene $-3,538.78 $2,227.43\n2026-04-02 Utility Payment - Electric $-311.65 $1,915.78\n2026-04-02 Wire Transfer - Webb $-544.01 $20,210.43\n2026-04-04 Payroll $-2,829.36 $22,237.98\n2026-04-05 Check #4948 $-3,102.71 $17,107.72\n2026-04-10 Wire Transfer In $7,959.62 $25,067.34\n2026-04-14 Payment to Harris and Sons $-3,187.37 $18,574.04\n2026-04-17 Payment to Morrow Group $-2,624.48 $5,233.49\n2026-04-18 Refund - Phillips, Arroyo and Pratt $9,775.92 $20,754.44\n2026-04-22 Refund - Dunn Ltd $9,131.88 $11,047.66\n2026-04-23 Insurance Premium $-69.14 $10,978.52\n2026-04-24 Payment to Valentine Ltd $-476.57 $21,761.41\n2026-04-28 Client Payment $532.72 $5,766.21\nββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ\n\nClosing Balance: $5,766.21\n\nTotal Debits: $-16,684.07\nTotal Credits: $27,400.14\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"bank_statement\",\"date\":\"2026-05-02\",\"issuer\":{\"name\":\"Citizens Trust Bank\",\"address\":null},\"recipient\":{\"name\":\"Vincent Smith\",\"address\":null},\"total_amount\":5766.21,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"ACH Debit - Mitchell, Johnson and Greene\",\"quantity\":1,\"unit_price\":-3538.78,\"amount\":-3538.78},{\"description\":\"Utility Payment - Electric\",\"quantity\":1,\"unit_price\":-311.65,\"amount\":-311.65},{\"description\":\"Wire Transfer - Webb\",\"quantity\":1,\"unit_price\":-544.01,\"amount\":-544.01},{\"description\":\"Payroll\",\"quantity\":1,\"unit_price\":-2829.36,\"amount\":-2829.36},{\"description\":\"Check #4948\",\"quantity\":1,\"unit_price\":-3102.71,\"amount\":-3102.71},{\"description\":\"Wire Transfer In\",\"quantity\":1,\"unit_price\":7959.62,\"amount\":7959.62},{\"description\":\"Payment to Harris and Sons\",\"quantity\":1,\"unit_price\":-3187.37,\"amount\":-3187.37},{\"description\":\"Payment to Morrow Group\",\"quantity\":1,\"unit_price\":-2624.48,\"amount\":-2624.48},{\"description\":\"Refund - Phillips, Arroyo and Pratt\",\"quantity\":1,\"unit_price\":9775.92,\"amount\":9775.92},{\"description\":\"Refund - Dunn Ltd\",\"quantity\":1,\"unit_price\":9131.88,\"amount\":9131.88},{\"description\":\"Insurance Premium\",\"quantity\":1,\"unit_price\":-69.14,\"amount\":-69.14},{\"description\":\"Payment to Valentine Ltd\",\"quantity\":1,\"unit_price\":-476.57,\"amount\":-476.57},{\"description\":\"Client Payment\",\"quantity\":1,\"unit_price\":532.72,\"amount\":532.72}],\"type_specific\":{\"account_number\":\"****6770\",\"statement_period\":\"2026-04-02 to 2026-05-02\",\"opening_balance\":7857.97,\"closing_balance\":5766.21},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2023-6507\nDate: October 16, 2025\n\nFrom:\n Whitney, Coffey and Ballard\n 48175 Richards View Suite 137, New Nicoleside, MD 80465\n\nBill To:\n Holland Ltd\n 612 Sophia Hollow Suite 113, South Shawn, TN 97905\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nBall Bearing 6205-2RS 465 GBP 6.23 GBP 2896.95\nCopper Wire 14AWG (100ft) 289 GBP 64.72 GBP 18704.08\nHydraulic Cylinder 3in Bore 43 GBP 236.05 GBP 10150.15\nPVC Pipe 2in x 10ft 70 GBP 13.00 GBP 910.00\nThermal Paste 30g 94 GBP 2.40 GBP 225.60\nAir Filter Cartridge 195 GBP 9.16 GBP 1786.20\nSteel Bolts M10x50 233 GBP 3.08 GBP 717.64\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: GBP 35390.62\nTax (8.5%): GBP 3008.20\nTOTAL: GBP 38398.82\n\nPayment Terms: 2/10 Net 30\nDue Date: 2025-11-30\n\nThank you for your business!\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2025-10-16\",\"issuer\":{\"name\":\"Whitney, Coffey and Ballard\",\"address\":\"48175 Richards View Suite 137, New Nicoleside, MD 80465\"},\"recipient\":{\"name\":\"Holland Ltd\",\"address\":\"612 Sophia Hollow Suite 113, South Shawn, TN 97905\"},\"total_amount\":38398.82,\"currency\":\"GBP\"},\"line_items\":[{\"description\":\"Ball Bearing 6205-2RS\",\"quantity\":465,\"unit_price\":6.23,\"amount\":2896.95},{\"description\":\"Copper Wire 14AWG (100ft)\",\"quantity\":289,\"unit_price\":64.72,\"amount\":18704.08},{\"description\":\"Hydraulic Cylinder 3in Bore\",\"quantity\":43,\"unit_price\":236.05,\"amount\":10150.15},{\"description\":\"PVC Pipe 2in x 10ft\",\"quantity\":70,\"unit_price\":13.0,\"amount\":910.0},{\"description\":\"Thermal Paste 30g\",\"quantity\":94,\"unit_price\":2.4,\"amount\":225.6},{\"description\":\"Air Filter Cartridge\",\"quantity\":195,\"unit_price\":9.16,\"amount\":1786.2},{\"description\":\"Steel Bolts M10x50\",\"quantity\":233,\"unit_price\":3.08,\"amount\":717.64}],\"type_specific\":{\"invoice_number\":\"INV-2023-6507\",\"due_date\":\"2025-11-30\",\"payment_terms\":\"2/10 Net 30\",\"tax_amount\":3008.2,\"subtotal\":35390.62},\"flags\":[],\"confidence_score\":0.95}"}]} | |
| {"messages": [{"role": "system", "content": "You are a financial document extraction expert. Your task is to:\n\n1. Identify the document type (invoice, purchase_order, receipt, or bank_statement).\n2. Extract all relevant fields into a structured JSON object following this exact schema:\n - \"common\": document_type, date, issuer (name, address), recipient (name, address), total_amount, currency\n - \"line_items\": array of {description, quantity, unit_price, amount}\n - \"type_specific\": fields specific to the document type\n - \"flags\": array of detected anomalies, each with {category, field, severity, description}\n - \"confidence_score\": your confidence in the extraction (0.0 to 1.0)\n\n3. Analyze the document for anomalies across these categories:\n - arithmetic_error: Mathematical calculations that don't add up\n - missing_field: Required fields that are absent from the document\n - format_anomaly: Inconsistent formats, negative quantities, duplicate entries\n - business_logic: Unusual amounts, suspicious patterns, round-number fraud indicators\n - cross_field: Mismatched references between related fields or documents\n\n4. If no anomalies are found, return an empty \"flags\" array.\n\nOutput ONLY valid JSON. No explanations, no markdown, no code blocks β just the raw JSON object."}, {"role": "user", "content": "Extract structured data from this financial document:\n\n---\n==================================================\nINVOICE\n==================================================\n\nInvoice Number: INV-2024-6439\nDate: 2025-04-22\n\nFrom:\n Jimenez, Glass and Stone\n 69284 Garcia Islands Suite 479, Jacksonmouth, MI 59107\n\nBill To:\n Acosta LLC\n 0165 Stephanie Divide Suite 021, Port Deanna, MN 50682\n\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nItem Qty Price Amount\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nRubber O-Ring Seals 241 USD 1.90 USD 457.90\nAir Filter Cartridge 136 USD 9.32 USD 1267.52\nHydraulic Cylinder 3in Bore 355 USD 178.41 USD 63335.55\nSteel Bolts M10x50 42 USD 7.31 USD 307.02\nCopper Wire 14AWG (100ft) 437 USD 65.57 USD 28654.09\nStainless Steel Sheet 4x8ft 274 USD 99.72 USD 27323.28\nCable Ties 12in (100pk) 163 USD 5.88 USD 958.44\nNitrile Gloves Box (100) 455 USD 14.11 USD 6420.05\nββββββββββββββββββββββββββββββββββββββββββββββββββ\nSubtotal: USD 128723.85\nTax (10.0%): USD 12872.39\nTOTAL: USD 141596.24\n\nPayment Terms: Net 30\nDue Date: 2025-05-22\n\nThank you for your business!\n\nReferenced Purchase Order: PO-2020-93853\n---"}, {"role": "assistant", "content": "{\"common\":{\"document_type\":\"invoice\",\"date\":\"2025-04-22\",\"issuer\":{\"name\":\"Jimenez, Glass and Stone\",\"address\":\"69284 Garcia Islands Suite 479, Jacksonmouth, MI 59107\"},\"recipient\":{\"name\":\"Acosta LLC\",\"address\":\"0165 Stephanie Divide Suite 021, Port Deanna, MN 50682\"},\"total_amount\":141596.24,\"currency\":\"USD\"},\"line_items\":[{\"description\":\"Rubber O-Ring Seals\",\"quantity\":241,\"unit_price\":1.9,\"amount\":457.9},{\"description\":\"Air Filter Cartridge\",\"quantity\":136,\"unit_price\":9.32,\"amount\":1267.52},{\"description\":\"Hydraulic Cylinder 3in Bore\",\"quantity\":355,\"unit_price\":178.41,\"amount\":63335.55},{\"description\":\"Steel Bolts M10x50\",\"quantity\":42,\"unit_price\":7.31,\"amount\":307.02},{\"description\":\"Copper Wire 14AWG (100ft)\",\"quantity\":437,\"unit_price\":65.57,\"amount\":28654.09},{\"description\":\"Stainless Steel Sheet 4x8ft\",\"quantity\":274,\"unit_price\":99.72,\"amount\":27323.28},{\"description\":\"Cable Ties 12in (100pk)\",\"quantity\":163,\"unit_price\":5.88,\"amount\":958.44},{\"description\":\"Nitrile Gloves Box (100)\",\"quantity\":455,\"unit_price\":14.11,\"amount\":6420.05}],\"type_specific\":{\"invoice_number\":\"INV-2024-6439\",\"due_date\":\"2025-05-22\",\"payment_terms\":\"Net 30\",\"tax_amount\":12872.39,\"subtotal\":128723.85},\"flags\":[{\"category\":\"cross_field\",\"field\":\"referenced_po\",\"severity\":\"high\",\"description\":\"Invoice references Purchase Order 'PO-2020-93853' which could not be verified in the system.\"}],\"confidence_score\":0.95}"}]} | |