Update utils.py
utils.py
CHANGED
@@ -42,32 +42,32 @@ def process_local_pdf(pdf_bytes: bytes):
 prompt: The prompt template to use (should contain {page_num} if needed)
 api_key: Your Google AI Studio API key
 """
-
-prompt = """Please analyze the provided images of the real estate document set and perform the following actions:
-
-1. **Identify Parties**: Determine and list all present parties involved in the transaction. Always identify and include **Seller 1** and **Buyer 1** if they are present in the documents. Additionally, include **Seller 2** and **Buyer 2** only if they are explicitly mentioned.
-
-2. **Identify Missing Items**: For each identified party, including at minimum **Seller 1** and **Buyer 1**, check all pages for any missing signatures or initials. Only check for **Seller 2** or **Buyer 2** if they were identified in step 1.
-
-3. **Identify Checked Boxes**: Locate and list all checkboxes that have been marked or checked.
-
-4. **Generate Secondary Questions**: For checkboxes that indicate significant waivers (e.g., home warranty, inspection rights, lead paint assessment), specific conditions (e.g., cash sale, contingency status), potential conflicts, or reference other documents, formulate a relevant 'Secondary Question' designed to prompt confirmation or clarification from the user/parties involved.
-
-5. **Check for Required Paperwork**: Based only on the checkboxes identified in step 3 that explicitly state or strongly imply a specific addendum or disclosure document should be attached (e.g., "Lead Based Paint Disclosure Addendum attached", "See Counter Offer Addendum", "Seller's Disclosure...Addendum attached", "Retainer Addendum attached", etc.), check if a document matching that description appears to be present within the provided image set. Note whether this implied paperwork is 'Found', 'Missing', or 'Potentially Missing/Ambiguous'.
-
-6. **Identify Conflicts**: Specifically look for and note any directly contradictory information or conflicting checked boxes (like the conflicting inspection clauses found previously).
-
-7. **Provide Location**: For every identified item (missing signature/initial, checked box, required paperwork status, party identification, conflict), specify the approximate line number(s) or clear location on the page (e.g., Bottom Right Initials, Seller Signature Block).
-
-8. **Format Output**: Present all findings in CSV format with the following columns:
-
-
-
-
-
-
-
-"""
+# Configure Gemini
+prompt = """Please analyze the provided images of the real estate document set and perform the following actions:
+
+1. **Identify Parties**: Determine and list all present parties involved in the transaction. Always identify and include **Seller 1** and **Buyer 1** if they are present in the documents. Additionally, include **Seller 2** and **Buyer 2** only if they are explicitly mentioned.
+
+2. **Identify Missing Items**: For each identified party, including at minimum **Seller 1** and **Buyer 1**, check all pages for any missing signatures or initials. Only check for **Seller 2** or **Buyer 2** if they were identified in step 1.
+
+3. **Identify Checked Boxes**: Locate and list all checkboxes that have been marked or checked.
+
+4. **Generate Secondary Questions**: For checkboxes that indicate significant waivers (e.g., home warranty, inspection rights, lead paint assessment), specific conditions (e.g., cash sale, contingency status), potential conflicts, or reference other documents, formulate a relevant 'Secondary Question' designed to prompt confirmation or clarification from the user/parties involved.
+
+5. **Check for Required Paperwork**: Based only on the checkboxes identified in step 3 that explicitly state or strongly imply a specific addendum or disclosure document should be attached (e.g., "Lead Based Paint Disclosure Addendum attached", "See Counter Offer Addendum", "Seller's Disclosure...Addendum attached", "Retainer Addendum attached", etc.), check if a document matching that description appears to be present within the provided image set. Note whether this implied paperwork is 'Found', 'Missing', or 'Potentially Missing/Ambiguous'.
+
+6. **Identify Conflicts**: Specifically look for and note any directly contradictory information or conflicting checked boxes (like the conflicting inspection clauses found previously).
+
+7. **Provide Location**: For every identified item (missing signature/initial, checked box, required paperwork status, party identification, conflict), specify the approximate line number(s) or clear location on the page (e.g., Bottom Right Initials, Seller Signature Block).
+
+8. **Format Output**: Present all findings in CSV format with the following columns:
+- **Category**: (e.g., Parties, Missing Item, Checked Box, Required Paperwork, Conflict)
+- **Location**: (e.g., Sale Contract (Image 8 Pg 1))
+- **Line Item(s)**: (e.g., 4)
+- **Item Type**: (e.g., Seller 1, Buyer 1, Seller Signature, Seller Initials)
+- **Status**: (e.g., Identified, Missing, Checked, Found, Potentially Missing, Conflict)
+- **Details**: (e.g., "Seller signature line (top line) is empty.", "Two initial boxes for Seller (approx line 106-107 area) are empty.")
+- **Secondary Question** (if applicable): (e.g., "Is the Buyer aware they are waiving the home warranty?", "Has the Buyer received and reviewed the Seller's Disclosure?")
+"""

 # Convert to images
 images = pdf_to_images(pdf_bytes)
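Since the prompt asks for CSV with seven named columns, a caller will probably want to turn the reply back into structured rows. The sketch below shows one way to do that with the standard library. It is not part of this commit: `parse_findings` and `EXPECTED_COLUMNS` are hypothetical names, and the handling of Markdown fences, a possible header row, and an omitted last column are assumptions about how the model might format its reply.

```python
import csv
import io

# Column names copied from step 8 of the prompt.
EXPECTED_COLUMNS = [
    "Category",
    "Location",
    "Line Item(s)",
    "Item Type",
    "Status",
    "Details",
    "Secondary Question",
]

# Built at runtime so this example does not contain a literal fence marker.
FENCE = "`" * 3


def parse_findings(response_text: str) -> list[dict[str, str]]:
    """Parse a CSV reply into one dict per finding (hypothetical helper)."""
    # Assumption: the model may wrap its CSV in Markdown fences; drop those lines.
    lines = [
        line
        for line in response_text.strip().splitlines()
        if not line.strip().startswith(FENCE)
    ]
    reader = csv.reader(io.StringIO("\n".join(lines)))
    # Ignore rows that are entirely empty.
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if not rows:
        return []

    # Assumption: the model may emit a header row, possibly with bold markers.
    first = [cell.strip().strip("*") for cell in rows[0]]
    if first[:2] == EXPECTED_COLUMNS[:2]:
        rows = rows[1:]

    findings = []
    for row in rows:
        # Pad short rows, since "Secondary Question" is only filled if applicable.
        padded = row + [""] * (len(EXPECTED_COLUMNS) - len(row))
        findings.append(
            dict(zip(EXPECTED_COLUMNS, (cell.strip() for cell in padded)))
        )
    return findings
```

With rows in this shape, filtering for unsigned pages is a one-liner such as `[f for f in parse_findings(text) if f["Status"] == "Missing"]`. Quoted fields matter here because the `Details` examples in the prompt contain commas and parentheses, which is why the sketch uses `csv.reader` rather than splitting on commas.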