Ankushbl6 committed on
Commit
3cd4a9e
·
verified ·
1 Parent(s): 40bde2e

Update src/streamlit_app.py

Files changed (1)
  1. src/streamlit_app.py +27 -43
src/streamlit_app.py CHANGED
@@ -381,8 +381,7 @@ def run_inference_vllm(image: Image.Image):
381
 
382
  # Extraction prompt (JSON format)
383
  EXTRACTION_PROMPT = """Please carefully examine this invoice image and extract all the information into the following structured JSON format. Pay close attention to details and ensure accuracy in number formatting and text extraction.
384
-
385
- Extract the data into this exact JSON structure (do not add or remove keys):
386
 
387
  {
388
  "header": {
@@ -408,7 +407,8 @@ Extract the data into this exact JSON structure (do not add or remove keys):
408
  "quantity": "Quantity of items",
409
  "unit_price": "Price per unit",
410
  "amount": "Total amount for this line item",
411
- "tax": "Tax amount for this item",
 
412
  "Line_total": "Total amount including tax for this line"
413
  }
414
  ],
@@ -421,46 +421,30 @@ Extract the data into this exact JSON structure (do not add or remove keys):
421
  }
422
  }
423
 
424
- STRICT POLICY RULES (apply exactly, do not deviate):
425
- 1) Number formatting & types
426
- - Preserve the original number formatting from the invoice (commas, decimal places, currency symbols in text fields if shown).
427
- - In this JSON, output all values as strings. If a field is not present or cannot be determined with high confidence, output "" (empty string). Do not use null, 0, or placeholders.
428
- 2) Currency selection (multi-currency invoices)
429
- - If multiple currencies are shown, ALWAYS choose the recipient/customer currency for all monetary fields in items and summary.
430
- - Do NOT perform FX conversion. Select the column/figures that are explicitly in the recipient's currency.
431
- - For "summary.currency", prefer the printed 3-letter code (e.g., USD, EUR, INR). If only an unambiguous symbol is present, map it (₹→INR, €→EUR, $→USD when clearly USD). If ambiguous, leave "".
432
- 3) Tax handling (no rounding of rates; don't recompute given totals)
433
- - Do NOT round tax percentages. Use the original precision for any calculations; keep the printed formatting for "summary.tax_rate".
434
- - If a TOTAL tax amount is explicitly printed on the invoice (e.g., "Tax", "VAT", "IGST", "Total Tax"), TREAT IT AS AUTHORITATIVE. Do NOT recompute a new total.
435
- a) If per-line tax amounts are printed, copy them directly.
436
- b) If per-line tax amounts are not printed, allocate the printed TOTAL tax proportionally across line items by each line's net amount (quantity * unit_price − discount). Use precise arithmetic; ensure the sum of allocated per-line taxes equals the printed TOTAL tax (adjust the last cent minimally if required).
437
- - If NO total tax amount is printed but a tax rate is printed, compute per-line tax as: tax = (quantity * unit_price − discount) × (exact, unrounded tax rate). Then set "summary.tax_amount" = sum of per-line taxes.
438
- - "items[].amount" is the pre-tax line amount AFTER discount. "items[].Line_total" = amount + tax.
439
- 4) Discounts
440
- - If discounts are present (per-line or overall), compute tax on the discounted base: (quantity * unit_price − discount). Never compute tax on the undiscounted amount.
441
- 5) Due date calculation from payment terms
442
- - Preserve the invoice's original date format for both "invoice_date" and "due_date".
443
- - If explicit due date is printed, use it as "due_date".
444
- - If payment terms specify Net X (e.g., Net 30), set due_date = invoice_date + X days (same format as invoice_date).
445
- - If terms say "upon receipt", "upon publication", or equivalent, due_date = invoice_date.
446
- - If both a printed due date and terms exist and they conflict, prefer the printed due date.
447
- 6) Items array
448
- - Include every visible line item. Preserve multi-line descriptions using literal "\\n" where line breaks exist.
449
- - If SKU is not shown, set "SKU": "".
450
- - Ensure "quantity", "unit_price", "amount", "tax", and "Line_total" are consistent with the rules above.
451
- 7) Summary invariants (when values are available on the invoice)
452
- - "summary.subtotal" = sum of items[].amount.
453
- - "summary.tax_amount" = sum of items[].tax (if you allocated or computed it). If the invoice prints a total tax amount, use that exact value and make per-line taxes sum to it.
454
- - "summary.total_amount" = subtotal + tax_amount.
455
- - If any of these values are not printed and cannot be derived reliably from the printed numbers, leave them as "".
456
- 8) Text extraction fidelity
457
- - Extract text exactly as printed (names, addresses, bank fields, references). Keep special characters and spacing (normalize only obvious OCR artifacts).
458
- - If a bank field is absent (IBAN/SWIFT/routing/etc.), set it to "".
459
-
460
- Output constraints:
461
- - Return ONLY the JSON object described above (no explanations, no code fences, no trailing commas).
462
- - Keep all values as strings.
463
- - Do not add extra keys or sections beyond the given schema."""
464
 
465
  try:
466
  # Resize image if too large (max dimension 2048px to avoid payload size issues)
 
381
 
382
  # Extraction prompt (JSON format)
383
  EXTRACTION_PROMPT = """Please carefully examine this invoice image and extract all the information into the following structured JSON format. Pay close attention to details and ensure accuracy in number formatting and text extraction.
384
+ Extract the data into this exact JSON structure:
 
385
 
386
  {
387
  "header": {
 
407
  "quantity": "Quantity of items",
408
  "unit_price": "Price per unit",
409
  "amount": "Total amount for this line item",
410
+ "t_rate": "tax_rate",
411
+ "tax": "amount*t_rate/100",
412
  "Line_total": "Total amount including tax for this line"
413
  }
414
  ],
 
421
  }
422
  }
423
 
424
+
425
+ IMPORTANT GUIDELINES:
426
+ - Extract only the bank account details matching the invoice currency.
427
+ Example:
428
+ Invoice currency = USD → extract the USD bank account.
429
+ Invoice currency = GBP → extract the GBP bank account.
430
+ - Preserve original number formatting (including commas and decimals). Do not include the currency symbol in amount fields.
431
+ - If multiple line items exist, include all of them in the items array
432
+ - Use empty string "" for any field that is not present or cannot be clearly identified
433
+ - Maintain accuracy in financial figures - double-check all numbers
434
+ - Do not round the tax percentage. For example, if the invoice shows "8.875" or "2.75" (or your calculation yields "8.875" or "2.75"), use 8.875 or 2.75 exactly; do not round it to "8.87", "8.88" or "2.8". Store tax_rate as the numeric string without the percent sign (e.g., "8.875").
435
+ - Extract text exactly as it appears, including special characters and formatting
436
+ - For dates, preserve the original format shown in the invoice
437
+ - If both sender and receiver addresses are in the United States, extract ACH; otherwise extract Wire transfer (WT).
438
+ - If payment terms specify a number of days (e.g., “payment terms 30 days”, “payable within 15 days”, “terms 45 days”, “Net 30”, or any similar phrase), compute: due_date = invoice_date + N days. If the invoice states “due on receipt”, “due upon receipt”, “Immediate”, or any similar phrase meaning immediate payment, then: due_date = invoice_date. Use the same date format as the invoice. Output only the computed due_date.
439
+ - If tax_rate is not given in the invoice but tax_amount is given, calculate the tax_rate using tax_amount and subtotal.
440
+ - Line-item tax must be calculated based ONLY on the tax_rate given in the summary, and the same tax_rate must be used for every line item in that invoice.
441
+ - If currency symbols are present, note them appropriately
442
+ - For amount fields, give only the NUMERIC VALUE; do not include a symbol ($) or currency letters ("EUR", "USD") in the amount fields.
443
+ - If a discount is present, first subtract the discount amount from the item's (or invoice's) actual amount, then calculate tax on the discounted amount. Tax must be computed on the net (post-discount) value.
444
+ - If discount is shown only in the summary (after subtotal), subtract it from subtotal to get the taxable base and then calculate tax; if discounts are line-item, subtract each from its line to get line_total — do NOT apportion summary discounts to line items.
445
+ - If any line item includes a discount, subtract the discount amount from that line item's total price. The resulting value should be recorded as the "Line Total" for that item.
446
+ - If a tax rate is given (for example, "20%") but the invoice explicitly shows the tax amount as zero (for example, "0.00"), do not calculate or infer any tax; keep the tax amount as shown (0.00).
447
+ Return only the JSON object with the extracted information"""
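
Note on the new guidelines: the per-line tax rule ("tax": "amount*t_rate/100"), the post-discount tax base, and the fallback that derives tax_rate from tax_amount and subtotal can all be checked after the model responds. The following is a minimal sketch of such a check, not code from this commit. The helper names (parse_amount, line_tax, derive_tax_rate, check_item) are hypothetical, amounts are assumed to use commas as thousands separators, and "amount" is assumed to already be net of any line discount when check_item is called.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse a numeric string such as "1,250.00" (assumes commas are thousands separators)."""
    return Decimal(text.replace(",", ""))


def line_tax(amount: str, t_rate: str, discount: str = "0") -> Decimal:
    """Tax on the post-discount amount with the unrounded rate: (amount - discount) * t_rate / 100."""
    net = parse_amount(amount) - parse_amount(discount)
    return net * Decimal(t_rate) / Decimal(100)


def derive_tax_rate(tax_amount: str, subtotal: str) -> Decimal:
    """Tax rate in percent when only the tax amount is printed: tax_amount / subtotal * 100."""
    return parse_amount(tax_amount) / parse_amount(subtotal) * Decimal(100)


def check_item(item: dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True when the reported "tax" matches amount * t_rate / 100 within the tolerance."""
    expected = line_tax(item["amount"], item["t_rate"])
    return abs(parse_amount(item["tax"]) - expected) <= tolerance
```

Decimal is used instead of float so that rates such as 8.875 are never silently rounded. The guideline that keeps a printed tax of 0.00 even when a rate is given is not covered here and would need to be handled before calling check_item.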
 
448
 
449
  try:
450
  # Resize image if too large (max dimension 2048px to avoid payload size issues)
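
The resize step itself falls outside this hunk, so the following is only a sketch of how a 2048px cap on the longer side could be computed while keeping the aspect ratio. The function name fit_within and its max_dim parameter are hypothetical and do not come from src/streamlit_app.py.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_dim: int = 2048) -> Tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most max_dim."""
    longest = max(width, height)
    if longest <= max_dim:
        # Already small enough: leave the dimensions unchanged.
        return width, height
    scale = max_dim / longest
    # Keep each side at least 1px so extreme aspect ratios stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow, the result could be passed to Image.resize, for example image.resize(fit_within(*image.size)), assuming image is a PIL Image as in the run_inference_vllm signature.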