Update src/streamlit_app.py
src/streamlit_app.py +27 -43
src/streamlit_app.py
CHANGED
|
@@ -381,8 +381,7 @@ def run_inference_vllm(image: Image.Image):
|
|
| 381 |
|
| 382 |
# Extraction prompt (JSON format)
|
| 383 |
EXTRACTION_PROMPT = """Please carefully examine this invoice image and extract all the information into the following structured JSON format. Pay close attention to details and ensure accuracy in number formatting and text extraction.
|
| 384 |
-
|
| 385 |
-
Extract the data into this exact JSON structure (do not add or remove keys):
|
| 386 |
|
| 387 |
{
|
| 388 |
"header": {
|
|
@@ -408,7 +407,8 @@ Extract the data into this exact JSON structure (do not add or remove keys):
|
|
| 408 |
"quantity": "Quantity of items",
|
| 409 |
"unit_price": "Price per unit",
|
| 410 |
"amount": "Total amount for this line item",
|
| 411 |
-
"
|
|
|
|
| 412 |
"Line_total": "Total amount including tax for this line"
|
| 413 |
}
|
| 414 |
],
|
|
@@ -421,46 +421,30 @@ Extract the data into this exact JSON structure (do not add or remove keys):
|
|
| 421 |
}
|
| 422 |
}
|
| 423 |
|
| 424 |
-
|
| 425 |
-
|
| 426 |
-
|
| 427 |
-
|
| 428 |
-
|
| 429 |
-
|
| 430 |
-
|
| 431 |
-
|
| 432 |
-
|
| 433 |
-
|
| 434 |
-
|
| 435 |
-
|
| 436 |
-
|
| 437 |
-
|
| 438 |
-
|
| 439 |
-
|
| 440 |
-
|
| 441 |
-
|
| 442 |
-
|
| 443 |
-
|
| 444 |
-
|
| 445 |
-
|
| 446 |
-
|
| 447 |
-
|
| 448 |
-
- Include every visible line item. Preserve multi-line descriptions using literal "\\n" where line breaks exist.
|
| 449 |
-
- If SKU is not shown, set "SKU": "".
|
| 450 |
-
- Ensure "quantity", "unit_price", "amount", "tax", and "Line_total" are consistent with the rules above.
|
| 451 |
-
7) Summary invariants (when values are available on the invoice)
|
| 452 |
-
- "summary.subtotal" = sum of items[].amount.
|
| 453 |
-
- "summary.tax_amount" = sum of items[].tax (if you allocated or computed it). If the invoice prints a total tax amount, use that exact value and make per-line taxes sum to it.
|
| 454 |
-
- "summary.total_amount" = subtotal + tax_amount.
|
| 455 |
-
- If any of these values are not printed and cannot be derived reliably from the printed numbers, leave them as "".
|
| 456 |
-
8) Text extraction fidelity
|
| 457 |
-
- Extract text exactly as printed (names, addresses, bank fields, references). Keep special characters and spacing (normalize only obvious OCR artifacts).
|
| 458 |
-
- If a bank field is absent (IBAN/SWIFT/routing/etc.), set it to "".
|
| 459 |
-
|
| 460 |
-
Output constraints:
|
| 461 |
-
- Return ONLY the JSON object described above (no explanations, no code fences, no trailing commas).
|
| 462 |
-
- Keep all values as strings.
|
| 463 |
-
- Do not add extra keys or sections beyond the given schema."""
|
| 464 |
|
| 465 |
try:
|
| 466 |
# Resize image if too large (max dimension 2048px to avoid payload size issues)
|
|
|
|
| 381 |
|
| 382 |
# Extraction prompt (JSON format)
|
| 383 |
EXTRACTION_PROMPT = """Please carefully examine this invoice image and extract all the information into the following structured JSON format. Pay close attention to details and ensure accuracy in number formatting and text extraction.
|
| 384 |
+
Extract the data into this exact JSON structure:
|
|
|
|
| 385 |
|
| 386 |
{
|
| 387 |
"header": {
|
|
|
|
| 407 |
"quantity": "Quantity of items",
|
| 408 |
"unit_price": "Price per unit",
|
| 409 |
"amount": "Total amount for this line item",
|
| 410 |
+
"t_rate": "tax_rate",
|
| 411 |
+
"tax": "amount*t_rate/100",
|
| 412 |
"Line_total": "Total amount including tax for this line"
|
| 413 |
}
|
| 414 |
],
|
|
|
|
| 421 |
}
|
| 422 |
}
|
| 423 |
|
| 424 |
+
|
| 425 |
+
IMPORTANT GUIDELINES:
|
| 426 |
+
- Extract only the bank account details matching the invoice currency.
|
| 427 |
+
Example:
|
| 428 |
+
Invoice currency = USD → extract the USD bank account.
|
| 429 |
+
Invoice currency = GBP → extract the GBP bank account.
|
| 430 |
+
- Preserve original number formatting (including commas and decimals). Do not include the currency symbol in amount fields.
|
| 431 |
+
- If multiple line items exist, include all of them in the items array
|
| 432 |
+
- Use empty string "" for any field that is not present or cannot be clearly identified
|
| 433 |
+
- Maintain accuracy in financial figures - double-check all numbers
|
| 434 |
+
- Do not round the tax percentage. For example, if the invoice shows "8.875" or "2.75" (or your calculation yields "8.875" or "2.75"), use 8.875 or 2.75 exactly; do not round to "8.87", "8.88", or "2.8". Store tax_rate as a numeric string without the percent sign (e.g., "8.875").
|
| 435 |
+
- Extract text exactly as it appears, including special characters and formatting
|
| 436 |
+
- For dates, preserve the original format shown in the invoice
|
| 437 |
+
- If both sender and receiver addresses are in the United States, extract ACH; otherwise extract Wire transfer (WT).
|
| 438 |
+
- If payment terms specify a number of days (e.g., “payment terms 30 days”, “payable within 15 days”, “terms 45 days”, “Net 30”, or any similar phrase), compute: due_date = invoice_date + N days. If the invoice states “due on receipt”, “due upon receipt”, “Immediate”, or any similar phrase meaning immediate payment, then: due_date = invoice_date. Use the same date format as the invoice. Output only the computed due_date.
|
| 439 |
+
- If tax_rate is not given in the invoice but tax_amount is, calculate tax_rate from tax_amount and subtotal.
|
| 440 |
+
- Calculate tax for each line item using ONLY the tax_rate given in the summary, and apply that same tax_rate to every line item in the invoice.
|
| 441 |
+
- If currency symbols are present, note them appropriately
|
| 442 |
+
- For amount fields, give only the NUMERIC VALUE; do not include a currency symbol ("$") or currency code ("EUR", "USD").
|
| 443 |
+
- If a discount is present, first subtract the discount amount from the item's (or invoice's) actual amount, then calculate tax on the discounted amount. Tax must be computed on the net (post-discount) value.
|
| 444 |
+
- If discount is shown only in the summary (after subtotal), subtract it from subtotal to get the taxable base and then calculate tax; if discounts are line-item, subtract each from its line to get line_total — do NOT apportion summary discounts to line items.
|
| 445 |
+
- If any line item includes a discount, subtract the discount amount from that line item's total price. The resulting value should be recorded as the "Line_total" for that item.
|
| 446 |
+
- If a tax rate is given (for example, "20%") but the invoice explicitly shows the tax amount as zero (for example, "0.00"), do not calculate or infer any tax; keep the tax amount as shown (0.00).
|
| 447 |
+
Return only the JSON object with the extracted information"""
|
| 448 |
|
| 449 |
try:
|
| 450 |
# Resize image if too large (max dimension 2048px to avoid payload size issues)
|
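The new guidelines ask the model to do date and tax arithmetic itself, so it may help to re-check those values once the JSON comes back. The following is a minimal sketch of such a check, not code from this commit. The helper names (`parse_amount`, `compute_due_date`, `derive_tax_rate`, `line_tax`), the default date format, and the rounding of line tax to two decimals are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP

# Phrases the prompt treats as "pay immediately" (list is illustrative, not exhaustive).
IMMEDIATE_TERMS = ("due on receipt", "due upon receipt", "immediate")


def parse_amount(text: str) -> Decimal:
    """Parse a numeric string such as "1,000.00"; commas are dropped for arithmetic only."""
    return Decimal(text.replace(",", ""))


def compute_due_date(invoice_date: str, payment_terms: str,
                     date_format: str = "%d/%m/%Y") -> str:
    """Return the due date in the invoice's own format, or "" if the terms are unclear.

    date_format is an assumption; the caller must supply the format the invoice uses.
    """
    issued = datetime.strptime(invoice_date, date_format)
    terms = payment_terms.lower()
    if any(phrase in terms for phrase in IMMEDIATE_TERMS):
        return issued.strftime(date_format)
    # Matches "30 days", "within 15 days", or "Net 30".
    match = re.search(r"(\d+)\s*days?|net\s*(\d+)", terms)
    if match is None:
        return ""
    days = int(match.group(1) or match.group(2))
    return (issued + timedelta(days=days)).strftime(date_format)


def derive_tax_rate(subtotal: str, tax_amount: str, discount: str = "0") -> str:
    """Derive the tax rate (percent) from the post-discount base, without rounding."""
    base = parse_amount(subtotal) - parse_amount(discount)
    rate = parse_amount(tax_amount) / base * 100
    # normalize() strips trailing zeros; "f" avoids scientific notation such as 2E+1.
    return format(rate.normalize(), "f")


def line_tax(amount: str, tax_rate: str) -> str:
    """Compute amount * tax_rate / 100, rounded to two decimals (rounding is an assumption)."""
    tax = parse_amount(amount) * Decimal(tax_rate) / 100
    return str(tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
```

A caller could compare these results against the model's "due_date", "t_rate", and "tax" fields and flag mismatches for review. Using `Decimal` rather than `float` keeps rates such as 8.875 exact, which matches the guideline against rounding the tax percentage.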