Add 3 files
- README.md +7 -5
- index.html +654 -19
- prompts.txt +3 -0
README.md
CHANGED
|
@@ -1,10 +1,12 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
sdk: static
|
| 7 |
pinned: false
|
| 8 |
---
|
| 9 |
|
| 10 |
-
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
| 1 |
---
|
| 2 |
+
title: doccompare
|
| 3 |
+
emoji: 🐳
|
| 4 |
+
colorFrom: gray
|
| 5 |
+
colorTo: blue
|
| 6 |
sdk: static
|
| 7 |
pinned: false
|
| 8 |
+
tags:
|
| 9 |
+
- deepsite
|
| 10 |
---
|
| 11 |
|
| 12 |
+
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
index.html
CHANGED
|
@@ -1,19 +1,654 @@
|
|
| 1 |
-
<!
|
| 2 |
-
<html>
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="en">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>DocuCompare - Intelligent PDF Comparison with Qwen 3</title>
|
| 7 |
+
<script src="https://cdn.tailwindcss.com"></script>
|
| 8 |
+
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet">
|
| 9 |
+
<script src="https://unpkg.com/feather-icons"></script>
|
| 10 |
+
<style>
|
| 11 |
+
:root {
|
| 12 |
+
--primary: #6e56cf;
|
| 13 |
+
--primary-light: #8b7adb;
|
| 14 |
+
--primary-dark: #4a3698;
|
| 15 |
+
--secondary: #2fb344;
|
| 16 |
+
--dark: #1e293b;
|
| 17 |
+
--light: #f8fafc;
|
| 18 |
+
}
|
| 19 |
+
|
| 20 |
+
body {
|
| 21 |
+
font-family: 'Inter', sans-serif;
|
| 22 |
+
background-color: #f1f5f9;
|
| 23 |
+
}
|
| 24 |
+
|
| 25 |
+
.gradient-bg {
|
| 26 |
+
background: linear-gradient(135deg, var(--primary) 0%, var(--primary-light) 100%);
|
| 27 |
+
}
|
| 28 |
+
|
| 29 |
+
.dropzone {
|
| 30 |
+
border: 2px dashed #cbd5e1;
|
| 31 |
+
transition: all 0.3s ease;
|
| 32 |
+
}
|
| 33 |
+
|
| 34 |
+
.dropzone.active {
|
| 35 |
+
border-color: var(--primary);
|
| 36 |
+
background-color: rgba(110, 86, 207, 0.05);
|
| 37 |
+
}
|
| 38 |
+
|
| 39 |
+
.progress-bar {
|
| 40 |
+
height: 6px;
|
| 41 |
+
border-radius: 3px;
|
| 42 |
+
background-color: #e2e8f0;
|
| 43 |
+
overflow: hidden;
|
| 44 |
+
}
|
| 45 |
+
|
| 46 |
+
.progress-fill {
|
| 47 |
+
height: 100%;
|
| 48 |
+
background-color: var(--primary);
|
| 49 |
+
transition: width 0.3s ease;
|
| 50 |
+
}
|
| 51 |
+
|
| 52 |
+
.result-card {
|
| 53 |
+
box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06);
|
| 54 |
+
transition: transform 0.3s ease, box-shadow 0.3s ease;
|
| 55 |
+
}
|
| 56 |
+
|
| 57 |
+
.result-card:hover {
|
| 58 |
+
transform: translateY(-2px);
|
| 59 |
+
box-shadow: 0 10px 15px -3px rgba(0, 0, 0, 0.1), 0 4px 6px -2px rgba(0, 0, 0, 0.05);
|
| 60 |
+
}
|
| 61 |
+
|
| 62 |
+
.file-chip {
|
| 63 |
+
background-color: #f0ebff;
|
| 64 |
+
color: var(--primary-dark);
|
| 65 |
+
}
|
| 66 |
+
|
| 67 |
+
.rotate-icon {
|
| 68 |
+
transform: rotate(0deg);
|
| 69 |
+
transition: transform 0.3s ease;
|
| 70 |
+
}
|
| 71 |
+
|
| 72 |
+
.rotate-icon.open {
|
| 73 |
+
transform: rotate(180deg);
|
| 74 |
+
}
|
| 75 |
+
|
| 76 |
+
pre {
|
| 77 |
+
white-space: pre-wrap;
|
| 78 |
+
word-wrap: break-word;
|
| 79 |
+
background-color: #f8fafc;
|
| 80 |
+
border: 1px solid #e2e8f0;
|
| 81 |
+
border-radius: 0.375rem;
|
| 82 |
+
padding: 1rem;
|
| 83 |
+
font-family: 'Courier New', Courier, monospace;
|
| 84 |
+
}
|
| 85 |
+
|
| 86 |
+
.loader {
|
| 87 |
+
width: 24px;
|
| 88 |
+
height: 24px;
|
| 89 |
+
border: 3px solid rgba(110, 86, 207, 0.3);
|
| 90 |
+
border-radius: 50%;
|
| 91 |
+
border-top-color: var(--primary);
|
| 92 |
+
animation: spin 1s ease-in-out infinite;
|
| 93 |
+
}
|
| 94 |
+
|
| 95 |
+
@keyframes spin {
|
| 96 |
+
to { transform: rotate(360deg); }
|
| 97 |
+
}
|
| 98 |
+
</style>
|
| 99 |
+
</head>
|
| 100 |
+
<body>
|
| 101 |
+
<div class="min-h-screen flex flex-col">
|
| 102 |
+
<!-- Header -->
|
| 103 |
+
<header class="gradient-bg text-white shadow-lg">
|
| 104 |
+
<div class="container mx-auto px-4 py-6">
|
| 105 |
+
<div class="flex items-center justify-between">
|
| 106 |
+
<div class="flex items-center space-x-3">
|
| 107 |
+
<svg xmlns="http://www.w3.org/2000/svg" class="h-10 w-10" fill="none" viewBox="0 0 24 24" stroke="currentColor">
|
| 108 |
+
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 17v-2m3 2v-4m3 4v-6m2 10H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
|
| 109 |
+
</svg>
|
| 110 |
+
<h1 class="text-2xl font-bold">DocuCompare AI</h1>
|
| 111 |
+
</div>
|
| 112 |
+
<div class="hidden md:flex items-center space-x-4">
|
| 113 |
+
<span class="text-white text-sm bg-white bg-opacity-20 px-3 py-1 rounded-full">Qwen 3 on HF</span>
|
| 114 |
+
<a href="#" class="text-white hover:text-gray-200 transition">API Key</a>
|
| 115 |
+
</div>
|
| 116 |
+
</div>
|
| 117 |
+
</div>
|
| 118 |
+
</header>
|
| 119 |
+
|
| 120 |
+
<!-- Main Content -->
|
| 121 |
+
<main class="flex-grow container mx-auto px-4 py-8">
|
| 122 |
+
<div class="max-w-5xl mx-auto">
|
| 123 |
+
<!-- API Key Input -->
|
| 124 |
+
<div class="bg-white rounded-xl shadow-md overflow-hidden mb-8">
|
| 125 |
+
<div class="p-6">
|
| 126 |
+
<h3 class="text-xl font-semibold text-slate-800 mb-2">Hugging Face API Token</h3>
|
| 127 |
+
<div class="flex items-center space-x-2">
|
| 128 |
+
<input type="password" id="apiKeyInput" placeholder="Enter your HF API token (hf_...)" class="flex-grow px-4 py-2 border border-slate-300 rounded-md focus:outline-none focus:ring-2 focus:ring-indigo-500">
|
| 129 |
+
<button id="saveApiKey" class="px-4 py-2 bg-indigo-600 text-white rounded-md hover:bg-indigo-700 transition">Save</button>
|
| 130 |
+
</div>
|
| 131 |
+
<p class="text-sm text-slate-500 mt-2">Your token is stored locally in your browser only.</p>
|
| 132 |
+
</div>
|
| 133 |
+
</div>
|
| 134 |
+
|
| 135 |
+
<!-- Upload Section -->
|
| 136 |
+
<div class="bg-white rounded-xl shadow-md overflow-hidden mb-8">
|
| 137 |
+
<div class="p-6 border-b border-slate-100">
|
| 138 |
+
<h3 class="text-xl font-semibold text-slate-800">1. Upload Document Files</h3>
|
| 139 |
+
<p class="text-slate-500 mt-1">Upload 2 or more documents (PDF/TXT/DOCX) for Qwen 3 to compare</p>
|
| 140 |
+
</div>
|
| 141 |
+
|
| 142 |
+
<div class="p-6">
|
| 143 |
+
<div id="dropzone" class="dropzone rounded-lg p-8 text-center cursor-pointer">
|
| 144 |
+
<div class="flex flex-col items-center justify-center space-y-3">
|
| 145 |
+
<div class="p-4 bg-indigo-50 rounded-full">
|
| 146 |
+
<i data-feather="upload-cloud" class="w-8 h-8 text-indigo-500"></i>
|
| 147 |
+
</div>
|
| 148 |
+
<h4 class="font-medium text-slate-700">Drag &amp; drop your files here</h4>
|
| 149 |
+
<p class="text-sm text-slate-500">or click to browse files</p>
|
| 150 |
+
<input type="file" id="fileInput" class="hidden" accept=".pdf,.txt,.docx" multiple>
|
| 151 |
+
<button id="browseBtn" class="mt-4 px-4 py-2 bg-indigo-600 text-white rounded-md hover:bg-indigo-700 transition">
|
| 152 |
+
Select Files
|
| 153 |
+
</button>
|
| 154 |
+
</div>
|
| 155 |
+
</div>
|
| 156 |
+
|
| 157 |
+
<div id="selectedFiles" class="mt-4 hidden">
|
| 158 |
+
<h5 class="font-medium text-slate-700 mb-2">Selected Files:</h5>
|
| 159 |
+
<ul id="fileList" class="space-y-2"></ul>
|
| 160 |
+
</div>
|
| 161 |
+
</div>
|
| 162 |
+
</div>
|
| 163 |
+
|
| 164 |
+
<!-- Processing Section -->
|
| 165 |
+
<div class="bg-white rounded-xl shadow-md overflow-hidden hidden" id="processingSection">
|
| 166 |
+
<div class="p-6 border-b border-slate-100">
|
| 167 |
+
<h3 class="text-xl font-semibold text-slate-800">2. Processing Documents with Qwen 3</h3>
|
| 168 |
+
<p class="text-slate-500 mt-1">Qwen 3 235B is analyzing your documents</p>
|
| 169 |
+
</div>
|
| 170 |
+
|
| 171 |
+
<div class="p-6">
|
| 172 |
+
<div class="mb-6">
|
| 173 |
+
<div class="flex justify-between items-center mb-1">
|
| 174 |
+
<span class="text-sm font-medium text-slate-700">Progress</span>
|
| 175 |
+
<span class="text-sm font-medium text-slate-500" id="progressText">0%</span>
|
| 176 |
+
</div>
|
| 177 |
+
<div class="progress-bar">
|
| 178 |
+
<div id="progressFill" class="progress-fill" style="width: 0%"></div>
|
| 179 |
+
</div>
|
| 180 |
+
</div>
|
| 181 |
+
|
| 182 |
+
<div id="statusLog" class="bg-slate-50 p-4 rounded-lg max-h-40 overflow-y-auto text-sm text-slate-600">
|
| 183 |
+
<div class="status-item flex items-start space-x-2 py-1">
|
| 184 |
+
<i data-feather="info" class="w-4 h-4 text-blue-500 mt-0.5"></i>
|
| 185 |
+
<span>Waiting to start processing...</span>
|
| 186 |
+
</div>
|
| 187 |
+
</div>
|
| 188 |
+
</div>
|
| 189 |
+
</div>
|
| 190 |
+
|
| 191 |
+
<!-- Results Section -->
|
| 192 |
+
<div class="hidden" id="resultsSection">
|
| 193 |
+
<h3 class="text-xl font-semibold text-slate-800 mb-4">3. Qwen 3 Analysis Results</h3>
|
| 194 |
+
|
| 195 |
+
<div class="grid md:grid-cols-2 gap-6 mb-6">
|
| 196 |
+
<div class="result-card bg-white rounded-lg p-5 border border-slate-200 flex flex-col items-center text-center">
|
| 197 |
+
<div class="p-3 bg-green-50 rounded-full mb-3">
|
| 198 |
+
<i data-feather="columns" class="w-6 h-6 text-green-500"></i>
|
| 199 |
+
</div>
|
| 200 |
+
<h4 class="font-medium text-slate-800 mb-1">Comparison Report</h4>
|
| 201 |
+
<p class="text-sm text-slate-500 mb-3">Detailed differences between documents</p>
|
| 202 |
+
<button id="showComparisonBtn" class="px-3 py-1 text-sm bg-green-600 text-white rounded hover:bg-green-700 transition">
|
| 203 |
+
View
|
| 204 |
+
</button>
|
| 205 |
+
</div>
|
| 206 |
+
|
| 207 |
+
<div class="result-card bg-white rounded-lg p-5 border border-slate-200 flex flex-col items-center text-center">
|
| 208 |
+
<div class="p-3 bg-purple-50 rounded-full mb-3">
|
| 209 |
+
<i data-feather="check-circle" class="w-6 h-6 text-purple-500"></i>
|
| 210 |
+
</div>
|
| 211 |
+
<h4 class="font-medium text-slate-800 mb-1">Validation Insights</h4>
|
| 212 |
+
<p class="text-sm text-slate-500 mb-3">Accuracy and validation analysis</p>
|
| 213 |
+
<button id="showValidationBtn" class="px-3 py-1 text-sm bg-purple-600 text-white rounded hover:bg-purple-700 transition">
|
| 214 |
+
View
|
| 215 |
+
</button>
|
| 216 |
+
</div>
|
| 217 |
+
</div>
|
| 218 |
+
|
| 219 |
+
<div class="bg-white rounded-xl shadow-md overflow-hidden mb-8">
|
| 220 |
+
<div id="comparisonResults" class="hidden p-6">
|
| 221 |
+
<div class="flex justify-between items-center mb-4">
|
| 222 |
+
<h4 class="text-lg font-semibold text-slate-800">Qwen 3 Comparison Report</h4>
|
| 223 |
+
<div class="flex space-x-2">
|
| 224 |
+
<button id="copyComparisonBtn" class="flex items-center space-x-1 px-3 py-1 text-sm bg-slate-100 text-slate-600 rounded hover:bg-slate-200 transition">
|
| 225 |
+
<i data-feather="copy" class="w-4 h-4"></i>
|
| 226 |
+
<span>Copy</span>
|
| 227 |
+
</button>
|
| 228 |
+
<button id="downloadComparisonBtn" class="flex items-center space-x-1 px-3 py-1 text-sm bg-slate-100 text-slate-600 rounded hover:bg-slate-200 transition">
|
| 229 |
+
<i data-feather="download" class="w-4 h-4"></i>
|
| 230 |
+
<span>Download</span>
|
| 231 |
+
</button>
|
| 232 |
+
</div>
|
| 233 |
+
</div>
|
| 234 |
+
<div id="comparisonOutput" class="whitespace-pre-wrap text-sm text-slate-700"></div>
|
| 235 |
+
</div>
|
| 236 |
+
|
| 237 |
+
<div id="validationResults" class="hidden p-6">
|
| 238 |
+
<div class="flex justify-between items-center mb-4">
|
| 239 |
+
<h4 class="text-lg font-semibold text-slate-800">Qwen 3 Validation Insights</h4>
|
| 240 |
+
<div class="flex space-x-2">
|
| 241 |
+
<button id="copyValidationBtn" class="flex items-center space-x-1 px-3 py-1 text-sm bg-slate-100 text-slate-600 rounded hover:bg-slate-200 transition">
|
| 242 |
+
<i data-feather="copy" class="w-4 h-4"></i>
|
| 243 |
+
<span>Copy</span>
|
| 244 |
+
</button>
|
| 245 |
+
<button id="downloadValidationBtn" class="flex items-center space-x-1 px-3 py-1 text-sm bg-slate-100 text-slate-600 rounded hover:bg-slate-200 transition">
|
| 246 |
+
<i data-feather="download" class="w-4 h-4"></i>
|
| 247 |
+
<span>Download</span>
|
| 248 |
+
</button>
|
| 249 |
+
</div>
|
| 250 |
+
</div>
|
| 251 |
+
<div id="validationOutput" class="whitespace-pre-wrap text-sm text-slate-700"></div>
|
| 252 |
+
</div>
|
| 253 |
+
</div>
|
| 254 |
+
</div>
|
| 255 |
+
</div>
|
| 256 |
+
</main>
|
| 257 |
+
|
| 258 |
+
<!-- Footer -->
|
| 259 |
+
<footer class="bg-white border-t border-slate-200 py-6">
|
| 260 |
+
<div class="container mx-auto px-4">
|
| 261 |
+
<div class="flex flex-col md:flex-row justify-between items-center">
|
| 262 |
+
<div class="flex items-center space-x-2 mb-4 md:mb-0">
|
| 263 |
+
<svg xmlns="http://www.w3.org/2000/svg" class="h-6 w-6 text-indigo-600" fill="none" viewBox="0 0 24 24" stroke="currentColor">
|
| 264 |
+
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 17v-2m3 2v-4m3 4v-6m2 10H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
|
| 265 |
+
</svg>
|
| 266 |
+
<span class="font-medium text-slate-800">DocuCompare AI</span>
|
| 267 |
+
</div>
|
| 268 |
+
<div class="text-sm text-slate-500 text-center md:text-right">
|
| 269 |
+
<p>Powered by Qwen 3 via Hugging Face Endpoints</p>
|
| 270 |
+
</div>
|
| 271 |
+
</div>
|
| 272 |
+
</div>
|
| 273 |
+
</footer>
|
| 274 |
+
</div>
|
| 275 |
+
|
| 276 |
+
<script>
|
| 277 |
+
// Initialize feather icons
|
| 278 |
+
document.addEventListener('DOMContentLoaded', function() {
|
| 279 |
+
feather.replace();
|
| 280 |
+
|
| 281 |
+
// System prompts for Qwen 3
|
| 282 |
+
const EXTRACTION_PROMPT = `You are a professional document analyst. Extract the following from the provided documents:
|
| 283 |
+
1. Document metadata (parties, dates, references)
|
| 284 |
+
2. Quantitative data (line items, amounts, dates)
|
| 285 |
+
3. Key terms and conditions
|
| 286 |
+
4. Any special provisions`;
|
| 287 |
+
|
| 288 |
+
const COMPARISON_PROMPT = `Compare these document extracts and highlight:
|
| 289 |
+
1. Major differences in terms and pricing
|
| 290 |
+
2. Variances in scope/specifications
|
| 291 |
+
3. Advantages/disadvantages of each document
|
| 292 |
+
4. Any red flags or concerns
|
| 293 |
+
|
| 294 |
+
Format as detailed markdown with tables where appropriate`;
|
| 295 |
+
|
| 296 |
+
const VALIDATION_PROMPT = `Validate this comparison report:
|
| 297 |
+
1. Check for accuracy against original docs
|
| 298 |
+
2. Identify missing comparisons
|
| 299 |
+
3. Note any possible misinterpretations
|
| 300 |
+
4. Suggest additional areas that need review`;
|
| 301 |
+
|
| 302 |
+
// DOM Elements
|
| 303 |
+
const apiKeyInput = document.getElementById('apiKeyInput');
|
| 304 |
+
const saveApiKey = document.getElementById('saveApiKey');
|
| 305 |
+
const dropzone = document.getElementById('dropzone');
|
| 306 |
+
const fileInput = document.getElementById('fileInput');
|
| 307 |
+
const browseBtn = document.getElementById('browseBtn');
|
| 308 |
+
const selectedFiles = document.getElementById('selectedFiles');
|
| 309 |
+
const fileList = document.getElementById('fileList');
|
| 310 |
+
const processingSection = document.getElementById('processingSection');
|
| 311 |
+
const resultsSection = document.getElementById('resultsSection');
|
| 312 |
+
const progressText = document.getElementById('progressText');
|
| 313 |
+
const progressFill = document.getElementById('progressFill');
|
| 314 |
+
const statusLog = document.getElementById('statusLog');
|
| 315 |
+
|
| 316 |
+
const comparisonResults = document.getElementById('comparisonResults');
|
| 317 |
+
const validationResults = document.getElementById('validationResults');
|
| 318 |
+
const comparisonOutput = document.getElementById('comparisonOutput');
|
| 319 |
+
const validationOutput = document.getElementById('validationOutput');
|
| 320 |
+
|
| 321 |
+
// Check for saved API key
|
| 322 |
+
const savedApiKey = localStorage.getItem('hfApiToken');
|
| 323 |
+
if (savedApiKey) {
|
| 324 |
+
apiKeyInput.value = savedApiKey;
|
| 325 |
+
}
|
| 326 |
+
|
| 327 |
+
// Event Listeners
|
| 328 |
+
saveApiKey.addEventListener('click', () => {
|
| 329 |
+
const key = apiKeyInput.value.trim();
|
| 330 |
+
if (key && key.startsWith('hf_')) {
|
| 331 |
+
localStorage.setItem('hfApiToken', key);
|
| 332 |
+
addStatusLog("HF API token saved locally in your browser.", "success");
|
| 333 |
+
} else {
|
| 334 |
+
addStatusLog("Please enter a valid HF token (starting with hf_)", "error");
|
| 335 |
+
apiKeyInput.focus();
|
| 336 |
+
}
|
| 337 |
+
});
|
| 338 |
+
|
| 339 |
+
browseBtn.addEventListener('click', () => fileInput.click());
|
| 340 |
+
|
| 341 |
+
fileInput.addEventListener('change', handleFileSelect);
|
| 342 |
+
|
| 343 |
+
// Drag and drop handlers
|
| 344 |
+
dropzone.addEventListener('dragover', (e) => {
|
| 345 |
+
e.preventDefault();
|
| 346 |
+
dropzone.classList.add('active');
|
| 347 |
+
});
|
| 348 |
+
|
| 349 |
+
dropzone.addEventListener('dragleave', () => {
|
| 350 |
+
dropzone.classList.remove('active');
|
| 351 |
+
});
|
| 352 |
+
|
| 353 |
+
dropzone.addEventListener('drop', (e) => {
|
| 354 |
+
e.preventDefault();
|
| 355 |
+
dropzone.classList.remove('active');
|
| 356 |
+
fileInput.files = e.dataTransfer.files;
|
| 357 |
+
handleFileSelect();
|
| 358 |
+
});
|
| 359 |
+
|
| 360 |
+
// Result navigation buttons
|
| 361 |
+
document.getElementById('showComparisonBtn').addEventListener('click', () => {
|
| 362 |
+
comparisonResults.classList.remove('hidden');
|
| 363 |
+
validationResults.classList.add('hidden');
|
| 364 |
+
});
|
| 365 |
+
|
| 366 |
+
document.getElementById('showValidationBtn').addEventListener('click', () => {
|
| 367 |
+
comparisonResults.classList.add('hidden');
|
| 368 |
+
validationResults.classList.remove('hidden');
|
| 369 |
+
});
|
| 370 |
+
|
| 371 |
+
// Copy/download buttons
|
| 372 |
+
document.getElementById('copyComparisonBtn').addEventListener('click', copyComparison);
|
| 373 |
+
document.getElementById('copyValidationBtn').addEventListener('click', copyValidation);
|
| 374 |
+
document.getElementById('downloadComparisonBtn').addEventListener('click', downloadComparison);
|
| 375 |
+
document.getElementById('downloadValidationBtn').addEventListener('click', downloadValidation);
|
| 376 |
+
|
| 377 |
+
// Functions
|
| 378 |
+
function handleFileSelect() {
|
| 379 |
+
const files = fileInput.files;
|
| 380 |
+
if (files.length < 2) {
|
| 381 |
+
addStatusLog('Please select at least 2 files for comparison.', 'error');
|
| 382 |
+
return;
|
| 383 |
+
}
|
| 384 |
+
|
| 385 |
+
// Show selected files
|
| 386 |
+
fileList.innerHTML = '';
|
| 387 |
+
Array.from(files).forEach(file => {
|
| 388 |
+
const li = document.createElement('li');
|
| 389 |
+
li.className = 'flex items-center justify-between bg-slate-50 px-3 py-2 rounded';
|
| 390 |
+
li.innerHTML = `
|
| 391 |
+
<div class="flex items-center space-x-3">
|
| 392 |
+
<i data-feather="file" class="w-4 h-4 text-slate-500"></i>
|
| 393 |
+
<span class="text-sm text-slate-700 truncate max-w-xs">${file.name}</span>
|
| 394 |
+
</div>
|
| 395 |
+
<span class="text-xs text-slate-500">${formatFileSize(file.size)}</span>
|
| 396 |
+
`;
|
| 397 |
+
fileList.appendChild(li);
|
| 398 |
+
});
|
| 399 |
+
|
| 400 |
+
selectedFiles.classList.remove('hidden');
|
| 401 |
+
|
| 402 |
+
// Start processing after a slight delay for UI to update
|
| 403 |
+
setTimeout(startProcessing, 1000);
|
| 404 |
+
|
| 405 |
+
feather.replace();
|
| 406 |
+
}
|
| 407 |
+
|
| 408 |
+
function formatFileSize(bytes) {
|
| 409 |
+
if (bytes === 0) return '0 Bytes';
|
| 410 |
+
const k = 1024;
|
| 411 |
+
const sizes = ['Bytes', 'KB', 'MB', 'GB'];
|
| 412 |
+
const i = Math.min(Math.floor(Math.log(bytes) / Math.log(k)), sizes.length - 1);
|
| 413 |
+
return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
|
| 414 |
+
}
|
| 415 |
+
|
| 416 |
+
function startProcessing() {
|
| 417 |
+
const apiKey = localStorage.getItem('hfApiToken');
|
| 418 |
+
if (!apiKey) {
|
| 419 |
+
addStatusLog('Please save your Hugging Face API token first.', 'error');
|
| 420 |
+
return;
|
| 421 |
+
}
|
| 422 |
+
|
| 423 |
+
processingSection.classList.remove('hidden');
|
| 424 |
+
addStatusLog('Starting document processing with Qwen 3...');
|
| 425 |
+
|
| 426 |
+
// Simulate processing with progress updates
|
| 427 |
+
simulateProcessing();
|
| 428 |
+
}
|
| 429 |
+
|
| 430 |
+
function simulateHFQwen3Call(prompt, content, callback) {
|
| 431 |
+
// In a real implementation, this would call the HF Inference API:
|
| 432 |
+
/*
|
| 433 |
+
fetch('https://router.huggingface.co/fireworks-ai/inference/v1/chat/completions', {
|
| 434 |
+
method: 'POST',
|
| 435 |
+
headers: {
|
| 436 |
+
'Authorization': `Bearer ${apiKey}`,
|
| 437 |
+
'Content-Type': 'application/json'
|
| 438 |
+
},
|
| 439 |
+
body: JSON.stringify({
|
| 440 |
+
messages: [{
|
| 441 |
+
role: "user",
|
| 442 |
+
content: `${prompt}\n\n${content}`
|
| 443 |
+
}],
|
| 444 |
+
model: "accounts/fireworks/models/qwen3-235b-a22b",
|
| 445 |
+
stream: false
|
| 446 |
+
})
|
| 447 |
+
})
|
| 448 |
+
.then(response => response.json())
|
| 449 |
+
.then(data => callback(data.choices[0].message.content))
|
| 450 |
+
.catch(error => callback(`Error: ${error.message}`));
|
| 451 |
+
*/
|
| 452 |
+
|
| 453 |
+
// For demo purposes, simulate the API response
|
| 454 |
+
setTimeout(() => {
|
| 455 |
+
if (prompt === COMPARISON_PROMPT) {
|
| 456 |
+
callback(`## Comparison Report (Simulated)
|
| 457 |
+
|
| 458 |
+
### Key Findings
|
| 459 |
+
1. **Price Variation**: Document A shows consistently higher pricing (+18.75% to +22.67% per line item) than Document B
|
| 460 |
+
2. **Scope Differences**: Document B includes additional services (maintenance, support) not in A
|
| 461 |
+
3. **Timeline**: Document A proposes a 6-month timeline vs. Document B's 4-month estimate
|
| 462 |
+
|
| 463 |
+
| Category | Document A | Document B | Difference |
|
| 464 |
+
|----------------|------------|------------|------------|
|
| 465 |
+
| Development | $28,500 | $24,000 | +18.75% |
|
| 466 |
+
| Testing | $9,200 | $7,500 | +22.67% |
|
| 467 |
+
| Maintenance | - | $3,000 | N/A |
|
| 468 |
+
|
| 469 |
+
**Recommendation**: Consider negotiating with Provider A to match Provider B's pricing or switch to Provider B for lower costs and additional services.`);
|
| 470 |
+
} else if (prompt === VALIDATION_PROMPT) {
|
| 471 |
+
callback(`## Validation Report (Simulated)
|
| 472 |
+
|
| 473 |
+
### Verification Points
|
| 474 |
+
✅ Pricing differences confirmed across all line items
|
| 475 |
+
✅ Document B does indeed include maintenance services
|
| 476 |
+
⚠️ Timeline estimates should be validated with both providers
|
| 477 |
+
|
| 478 |
+
### Additional Findings
|
| 479 |
+
1. Payment terms weren't compared (50% upfront in both)
|
| 480 |
+
2. Penalty clauses differ (5% vs 10% for delays)
|
| 481 |
+
3. Document A includes better IP protections
|
| 482 |
+
|
| 483 |
+
**Action Items**:
|
| 484 |
+
1. Verify actual delivery capacity for timeline claims
|
| 485 |
+
2. Compare quality guarantees between providers`);
|
| 486 |
+
} else {
|
| 487 |
+
callback(`Error: Unsupported prompt type`);
|
| 488 |
+
}
|
| 489 |
+
}, 2000);
|
| 490 |
+
}
|
| 491 |
+
|
| 492 |
+
function simulateProcessing() {
|
| 493 |
+
let progress = 0;
|
| 494 |
+
const steps = [
|
| 495 |
+
{text: "Uploading documents...", increment: 15},
|
| 496 |
+
{text: "Extracting content from files...", increment: 25},
|
| 497 |
+
{text: "Sending to Qwen 3 via HF endpoint...", increment: 30},
|
| 498 |
+
{text: "Analyzing document differences...", increment: 15},
|
| 499 |
+
{text: "Generating validation report...", increment: 10},
|
| 500 |
+
{text: "Finalizing results...", increment: 5}
|
| 501 |
+
];
|
| 502 |
+
|
| 503 |
+
let currentStep = 0;
|
| 504 |
+
|
| 505 |
+
const processInterval = setInterval(() => {
|
| 506 |
+
if (currentStep < steps.length) {
|
| 507 |
+
const step = steps[currentStep];
|
| 508 |
+
|
| 509 |
+
// Update progress
|
| 510 |
+
progress = Math.min(progress + step.increment, 100);
|
| 511 |
+
progressText.textContent = `${progress}%`;
|
| 512 |
+
progressFill.style.width = `${progress}%`;
|
| 513 |
+
|
| 514 |
+
// Add status log entry
|
| 515 |
+
addStatusLog(step.text);
|
| 516 |
+
|
| 517 |
+
// Simulate API calls during processing
|
| 518 |
+
if (currentStep === 2) {
|
| 519 |
+
const mockContent = `
|
| 520 |
+
Document A: Vendor Alpha Proposal
|
| 521 |
+
- Development: $28,500
|
| 522 |
+
- Testing: $9,200
|
| 523 |
+
- Timeline: 6 months
|
| 524 |
+
- Payment: 50% upfront
|
| 525 |
+
|
| 526 |
+
Document B: Supplier Beta Offer
|
| 527 |
+
- Development: $24,000
|
| 528 |
+
- Testing: $7,500
|
| 529 |
+
- Maintenance: $3,000
|
| 530 |
+
- Timeline: 4 months
|
| 531 |
+
- Payment: 50% upfront`;
|
| 532 |
+
|
| 533 |
+
simulateHFQwen3Call(COMPARISON_PROMPT, mockContent, (result) => {
|
| 534 |
+
comparisonOutput.textContent = result;
|
| 535 |
+
addStatusLog('Comparison report generated successfully', 'success');
|
| 536 |
+
});
|
| 537 |
+
|
| 538 |
+
simulateHFQwen3Call(VALIDATION_PROMPT, mockContent, (result) => {
|
| 539 |
+
validationOutput.textContent = result;
|
| 540 |
+
addStatusLog('Validation analysis complete', 'success');
|
| 541 |
+
});
|
| 542 |
+
}
|
| 543 |
+
|
| 544 |
+
currentStep++;
|
| 545 |
+
} else {
|
| 546 |
+
clearInterval(processInterval);
|
| 547 |
+
|
| 548 |
+
// Add completion status
|
| 549 |
+
addStatusLog("Analysis complete! View results below.", "success");
|
| 550 |
+
|
| 551 |
+
// Show results after a delay
|
| 552 |
+
setTimeout(showResults, 1000);
|
| 553 |
+
}
|
| 554 |
+
}, 1500);
|
| 555 |
+
}
|
| 556 |
+
|
| 557 |
+
function addStatusLog(message, type = "info") {
|
| 558 |
+
const statusItem = document.createElement('div');
|
| 559 |
+
statusItem.className = 'status-item flex items-start space-x-2 py-1';
|
| 560 |
+
|
| 561 |
+
let icon;
|
| 562 |
+
let textColor;
|
| 563 |
+
|
| 564 |
+
switch(type) {
|
| 565 |
+
case "success":
|
| 566 |
+
icon = 'check-circle';
|
| 567 |
+
textColor = 'text-green-500';
|
| 568 |
+
break;
|
| 569 |
+
case "error":
|
| 570 |
+
icon = 'alert-circle';
|
| 571 |
+
textColor = 'text-red-500';
|
| 572 |
+
break;
|
| 573 |
+
case "warning":
|
| 574 |
+
icon = 'alert-triangle';
|
| 575 |
+
textColor = 'text-yellow-500';
|
| 576 |
+
break;
|
| 577 |
+
default:
|
| 578 |
+
icon = 'info';
|
| 579 |
+
textColor = 'text-blue-500';
|
| 580 |
+
}
|
| 581 |
+
|
| 582 |
+
statusItem.innerHTML = `
|
| 583 |
+
<i data-feather="${icon}" class="w-4 h-4 ${textColor} mt-0.5"></i>
|
| 584 |
+
<span>${message}</span>
|
| 585 |
+
`;
|
| 586 |
+
|
| 587 |
+
statusLog.appendChild(statusItem);
|
| 588 |
+
statusLog.scrollTop = statusLog.scrollHeight;
|
| 589 |
+
|
| 590 |
+
feather.replace();
|
| 591 |
+
}
|
| 592 |
+
|
| 593 |
+
function showResults() {
|
| 594 |
+
processingSection.classList.add('hidden');
|
| 595 |
+
resultsSection.classList.remove('hidden');
|
| 596 |
+
|
| 597 |
+
// Show comparison by default
|
| 598 |
+
comparisonResults.classList.remove('hidden');
|
| 599 |
+
validationResults.classList.add('hidden');
|
| 600 |
+
}
|
| 601 |
+
|
| 602 |
+
// Clipboard functions
|
| 603 |
+
function copyComparison() {
|
| 604 |
+
navigator.clipboard.writeText(comparisonOutput.textContent);
|
| 605 |
+
showCopyFeedback('copyComparisonBtn');
|
| 606 |
+
}
|
| 607 |
+
|
| 608 |
+
function copyValidation() {
|
| 609 |
+
navigator.clipboard.writeText(validationOutput.textContent);
|
| 610 |
+
showCopyFeedback('copyValidationBtn');
|
| 611 |
+
}
|
| 612 |
+
|
| 613 |
+
function showCopyFeedback(buttonId) {
|
| 614 |
+
const button = document.getElementById(buttonId);
|
| 615 |
+
const text = button.querySelector('span');
|
| 616 |
+
|
| 617 |
+
// feather.replace() swaps each <i> for an <svg>, so look the icon up again on every update
|
| 618 |
+
const setIcon = (name) => {
|
| 619 |
+
button.querySelector('i, svg').outerHTML = `<i data-feather="${name}" class="w-4 h-4"></i>`;
|
| 620 |
+
feather.replace();
|
| 621 |
+
};
|
| 622 |
+
setIcon('check');
|
| 623 |
+
text.textContent = 'Copied';
|
| 624 |
+
// Reset after 2 seconds
|
| 625 |
+
setTimeout(() => {
|
| 626 |
+
setIcon('copy');
|
| 627 |
+
text.textContent = 'Copy';
|
| 628 |
+
}, 2000);
|
| 629 |
+
}

    // Download functions
    function downloadComparison() {
        downloadFile('Qwen3_Comparison_Report.md', comparisonOutput.textContent);
    }

    function downloadValidation() {
        downloadFile('Qwen3_Validation_Report.md', validationOutput.textContent);
    }

    function downloadFile(filename, content) {
        const blob = new Blob([content], {type: 'text/markdown'});
        const url = URL.createObjectURL(blob);
        const a = document.createElement('a');
        a.href = url;
        a.download = filename;
        document.body.appendChild(a);
        a.click();
        document.body.removeChild(a);
        URL.revokeObjectURL(url);
    }
});
</script>
<p style="border-radius: 8px; text-align: center; font-size: 12px; color: #fff; margin-top: 16px;position: fixed; left: 8px; bottom: 8px; z-index: 10; background: rgba(0, 0, 0, 0.8); padding: 4px 8px;">Made with <img src="https://enzostvs-deepsite.hf.space/logo.svg" alt="DeepSite Logo" style="width: 16px; height: 16px; vertical-align: middle;display:inline-block;margin-right:3px;filter:brightness(0) invert(1);"><a href="https://enzostvs-deepsite.hf.space" style="color: #fff;text-decoration: underline;" target="_blank" >DeepSite</a> - 🧬 <a href="https://enzostvs-deepsite.hf.space?remix=Ultronprime/doccompare" style="color: #fff;text-decoration: underline;" target="_blank" >Remix</a></p></body>
</html>
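
The `switch` in the status-logging code of index.html maps a message type to a Feather icon name and a Tailwind text-colour class. As a minimal sketch, the same mapping could be written as a lookup table. `STATUS_STYLES`, `DEFAULT_STATUS_STYLE`, and `statusStyleFor` are hypothetical names that do not appear in the committed file; the icon names and classes are copied from the original cases.

```javascript
// Hypothetical lookup-table version of the status switch (not part of the commit).
// Icon names and colour classes are copied from the original case branches.
const STATUS_STYLES = {
    success: { icon: 'check-circle', textColor: 'text-green-500' },
    error: { icon: 'alert-circle', textColor: 'text-red-500' },
    warning: { icon: 'alert-triangle', textColor: 'text-yellow-500' },
};

// Fallback for any type not listed above, mirroring the switch's default case.
const DEFAULT_STATUS_STYLE = { icon: 'info', textColor: 'text-blue-500' };

function statusStyleFor(type) {
    // hasOwnProperty guards against inherited keys such as "toString".
    return Object.prototype.hasOwnProperty.call(STATUS_STYLES, type)
        ? STATUS_STYLES[type]
        : DEFAULT_STATUS_STYLE;
}
```

With a helper like this, the status logger could destructure `const { icon, textColor } = statusStyleFor(type);` instead of declaring the two variables and assigning them in each branch.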
prompts.txt
ADDED
@@ -0,0 +1,3 @@
Okay, I will construct a single Python file (app.py) for a Gradio application that performs the requested multi-step PDF comparison and validation process. This application will be suitable for deployment on Google Cloud Run.

Key Features:
PDF Upload: Users can upload multiple PDF files.
Extraction (PDF to JSON): Each PDF will be processed by gemini-2.5-flash-preview-04-17 to extract structured data into JSON format using the provided "Document Extraction Specialist" system instruction.
Comparison (JSONs to Report): The generated JSONs will be compared by gemini-2.5-pro-preview-05-06 to produce a Markdown report using the "Quotation Comparison Analyst" system instruction.
Validation (Report + PDFs to Validation): The comparison report and the original PDFs will be processed by gemini-2.5-pro-preview-05-06 to generate a validation report, using a new "Validation Specialist" system instruction.

Before you run this code:
Set Environment Variables or Hardcode: You MUST set GCP_PROJECT_ID and GCP_LOCATION. The code includes placeholders. For Cloud Run, setting these as environment variables is recommended.
Permissions: The service account running this Cloud Run application will need the "Vertex AI User" role (or more specific permissions for aiplatform.googleapis.com/projects.locations.publishers.models:predict) on your GCP project.
Dependencies: You'll need to install google-cloud-aiplatform, google-generativeai (though vertexai SDK is primary), pypdf, and gradio. A requirements.txt will be provided.

Here's the app.py file:

# app.py
import os
import json
import time
import logging
import asyncio
import io
import traceback

import gradio as gr

# --- Vertex AI Initialization ---
# IMPORTANT: Set these for your environment!
# For Cloud Run, prefer to set these as environment variables.
GCP_PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "YOUR_GCP_PROJECT_ID")
GCP_LOCATION = os.environ.get("GCP_LOCATION", "YOUR_GCP_LOCATION")

if GCP_PROJECT_ID == "YOUR_GCP_PROJECT_ID" or GCP_LOCATION == "YOUR_GCP_LOCATION":
    print("WARNING: GCP_PROJECT_ID or GCP_LOCATION are not set. Using placeholder values.")
    print("Please set them as environment variables or directly in the code.")

try:
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part, GenerationConfig, SafetySetting, HarmCategory
    from google.api_core import exceptions as api_exceptions
    import google.auth
    vertexai.init(project=GCP_PROJECT_ID, location=GCP_LOCATION)
    google_libs_imported = True
    print(f"Vertex AI SDK Initialized for project={GCP_PROJECT_ID}, location={GCP_LOCATION}")
except ImportError as e:
    print(f"ERROR: Failed to import Google libraries: {e}. Ensure google-cloud-aiplatform is installed.")
    google_libs_imported = False
except google.auth.exceptions.DefaultCredentialsError as e:
    print(f"ERROR: Could not find default credentials for Vertex AI: {e}")
    print("Ensure your environment is authenticated (e.g., gcloud auth application-default login locally, or service account on Cloud Run).")
    google_libs_imported = False
except Exception as e:
    # print() has no exc_info parameter, so include the traceback explicitly.
    print(f"Error initializing Vertex AI SDK: {e}\n{traceback.format_exc()}")
    google_libs_imported = False

# PDF Library
try:
    from pypdf import PdfReader  # No need for PdfWriter in this Gradio app for now
    pypdf_imported = True
    print("pypdf imported successfully.")
except ImportError as e:
    print(f"ERROR: Failed to import pypdf: {e}. Ensure pypdf is installed.")
    pypdf_imported = False

# --- Model Configuration ---
MODEL_PDF_EXTRACTION = "gemini-2.5-flash-preview-04-17"
MODEL_COMPARISON = "gemini-2.5-pro-preview-05-06"  # As used in notebook's comparison cell
MODEL_VALIDATION = "gemini-2.5-pro-preview-05-06"
TEMPERATURE = 0.2
MAX_OUTPUT_TOKENS_EXTRACTION = 8192  # Flash has smaller token limits for output
MAX_OUTPUT_TOKENS_REPORTING = 65535  # For Pro models generating reports

# --- System Instructions ---
SI_PDF_EXTRACTION = """# Document Extraction Specialist You are an expert Document Extraction Specialist with advanced capabilities for extracting project scope details, LPO (Local Purchase Order) information, quotation details, and quantitative data from construction and project documents. Your primary function is to analyze documents with exceptional thoroughness and precision, extracting key document metadata, scope terms, and detailed Bill of Quantities (BOQ) data, including calculated and potentially discounted rates. ## Core Function Your purpose is to extract: 1. **Document Metadata**: Including LPO parties, references, dates, and quotation information (from both LPO and Quotation documents if available). 2. **Detailed Scope Information**: Including objectives, deliverables, activities, inclusions/exclusions, etc., primarily from the LPO or governing contract document. 3. **Detailed Quantitative Line Item Data**: **Prioritizing the linked Quotation document** for item breakdowns, calculating missing rates, identifying overall discounts by comparing totals, and calculating discounted rates per item. ## Extraction Methodology ### For Document Metadata (LPO and Quotation Information) 1. **Identify Documents**: Recognize both LPO and Quotation documents, often linked by reference numbers. 2.
**Extract Key Fields**: For both LPO and Quotation (where applicable), extract: * First Party (Client/Purchaser, typically consistent) * Second Party (Vendor/Supplier, typically consistent) * Document Reference (LPO Ref, Quotation Ref) * Document Date (LPO Date, Quotation Date) 3. **Store Appropriately**: Populate the `document_metadata` fields. Use LPO details as the primary source if fields conflict, but capture both reference numbers and dates. ### For Scope Information (Text-Based Content) 1. **Source Priority**: Primarily extract scope details (objectives, deliverables, activities, inclusions, exclusions, assumptions, success criteria) from the **LPO** or the main contractual document, as this usually reflects the final agreed scope. 2. **Comprehensive Analysis & Extraction**: Follow the original methodology for extracting semantic scope elements. ### For BOQ/Quantitative Information (Structured Data - **PRIORITIZE QUOTATION**) 1. **Link Documents**: Identify the Quotation document referenced in the LPO (using `quotation_reference`). 2. **Prioritize Quotation for Line Items**: Extract detailed line items (description, quantity, unit, rate, amount) **directly from the referenced Quotation document**. The LPO might only contain a lump sum which shouldn't be broken down unless the LPO *itself* contains the breakdown. 3. **Extract Quotation Line Item Data**: For each line item identified *in the Quotation*: * **Subcategory**: Section/heading if present. * **Item Description**: Full text. * **Quantity**: Numerical quantity. * **Amount**: Total amount for the line item *as stated in the Quotation*. * **Rate (Explicit)**: Extract the rate *only if it is explicitly stated* in the Quotation's rate column for that item. Store this in the `rate` field. If missing, leave `rate` as null initially. 4. 
**Calculate Missing Rates**: If the `rate` field is null (was missing in the Quotation) BUT `quantity` (must be non-zero) and `amount` are present for that line item *in the Quotation*, calculate the rate: `calculated_rate = amount / quantity`. Store this in the `calculated_rate` field. If rate was present or cannot be calculated, leave `calculated_rate` as null. 5. **Discount Identification**: * Identify the **Grand Total** (or a comparable sub-total, aiming for pre-VAT if possible) from the **Quotation**. * Identify the corresponding **Sub-Total** (or comparable pre-VAT total) from the **LPO** for the same scope of work. * **Compare Totals**: Check if the LPO Sub-Total is less than the Quotation Grand Total. Also, look for any handwritten notes or explicit mentions of a discount modifying the Quotation total. * **Calculate Discount Factor**: If a discount is identified (LPO total < Quotation total), calculate `discount_factor = LPO_SubTotal / Quotation_GrandTotal`. Ensure the totals used are comparable (e.g., both before VAT, or both including VAT consistently). If no discount is evident, the factor is 1.0. 6. **Calculate Discounted Rates**: * If the `discount_factor` is less than 1.0: * For each line item from the Quotation, determine the base rate to use: prioritize the explicitly stated `rate` if available; otherwise, use the `calculated_rate`. * If a base rate exists, calculate `discounted_rate = base_rate * discount_factor`. Store this in the `discounted_rate` field. * If the `discount_factor` is 1.0 or cannot be determined, leave `discounted_rate` as null for all items. ## Extraction Principles * **Accuracy & Source Priority**: Extract exactly as presented, prioritizing the Quotation for line item details and the LPO for final totals and scope statements. * **Calculation**: Perform calculations for missing rates and discounted rates only as instructed. * **Completeness**: Capture all available data points according to the schema. 
* **Numerical Precision**: Maintain exact numerical values. Remove currency symbols. * **Null Handling**: Use null for missing explicit data or when calculations are not applicable/possible. You will automatically analyze and extract from any document provided, correlating LPO and Quotation information as described, delivering a complete output following the specified JSON schema. The output MUST be a single valid JSON object. Do not include any explanatory text before or after the JSON object itself. Do not use markdown ```json ... ``` fences. """ SI_JSON_COMPARISON = """Quotation Comparison Analyst (Enhanced for Vertical Comparison & Deeper Insights) Role and Objective You are a highly specialized Quotation Comparison Analyst. Your expertise lies in dissecting construction, renovation, and procurement documents (quotations/BOQs) to provide insightful, actionable comparisons. Your primary task is to meticulously analyze multiple JSON-formatted quotations, transforming them into clear, vertical comparison tables. You must identify and articulate scope differences, quantitative variations, gaps, term discrepancies, and their potential impact, enabling users to make informed decisions, negotiate effectively, and mitigate risks. Input Processing & Setup 1. **Document Ingestion**: You will receive multiple JSON-formatted quotation documents. 2. **Quotation Identification**: For your comparison tables, each distinct quotation (derived from an uploaded JSON file) will form a **separate column**. Use the source filename (or a shortened, logical version of it if too long) as the header for each quotation's column (e.g., "Quote_ContractorA.json" becomes "Contractor A", "Vendor_B_Proposal.json" becomes "Vendor B"). The first column in your tables will typically be the "Item/Aspect" being compared. 3. 
**Intelligent Standardization**: * Proactively identify and map variations in terminology for similar construction items or activities across quotes (e.g., "Concrete Slab Grade 30" vs. "C30 Concrete Floor" vs. "Foundation Concrete Work"). State your assumed equivalencies if necessary. * Recognize if units of measure differ for comparable items and flag this, noting the units used by each quote. * Normalize structural differences in JSONs where possible to compare like-for-like items. Core Analysis Framework (Focus on Vertical Comparison & Impact) 1. **Scope Coverage & Detail:** * For each line item, activity, or deliverable, list its presence and details across all quotations in their respective columns. * Clearly mark items present in one quote but missing in others (❌ in the relevant quote's column). * If terminology differs but likely refers to the same work, use a symbol (🔄) and provide a note explaining the assumed equivalence. * Analyze and compare the stated quality, specifications, and standards of materials and workmanship for each item across quotes. * **Crucially, highlight potential scope gaps or ambiguities that could lead to change orders or unexpected costs.** 2. **Quantitative Analysis (Item by Item):** * For each comparable line item, present the quantities, units, and any dimensional data side-by-side in the respective quotation columns. * Focus on identifying and flagging significant quantity discrepancies (⚠️). * **Analyze the implications of these quantity differences.** For instance, if one quote has significantly less quantity for a critical item, state the potential risk (e.g., "Risk of underestimation, may require additional orders/cost"). 3. **Terms & Conditions Deep Dive:** * Compare key contractual terms: assumptions, exclusions, inclusions, payment schedules (upfront, milestones, retention), warranty periods, liquidated damages, insurance requirements, validity periods, etc. 
* For each term, present the specifics from each quotation in its column. * **Analyze and clearly state the implications of differing terms.** For example: "Quote A: 1-year warranty; Quote B: 3-year warranty. Implication: Quote B offers significantly better long-term protection, potentially reducing future repair costs." * Identify and highlight any hidden charges, unusual clauses, or terms that heavily favor one party. 4. **Comprehensiveness & Clarity Assessment:** * Evaluate the overall detail and clarity of each quotation. * In your summary or notes, comment on which quotations provide more robust and unambiguous scope descriptions. * Verify if ancillary costs (e.g., permits, site access, waste disposal, temporary facilities, professional fees) are explicitly included or excluded by each quote. 5. **Integrated Risk Assessment:** * For major discrepancies found in scope, quantity, or terms, explicitly assess and state the potential risks (e.g., "Risk of cost overrun due to exclusion of X in Quote A," "Schedule risk due to unclear mobilization timeline in Quote B"). * Categorize the severity of these risks where obvious (e.g., high, medium, low). Output Format (Vertical Tables are Key) 1. **Executive Summary:** * A concise, high-level overview of the most critical findings. * Directly state which quotation appears most comprehensive, best value (considering scope and terms, not just price), or which ones require urgent clarification/revision. * Summarize the most significant risks, opportunities, or hidden costs identified. 2. **Detailed Scope & Quantitative Comparison (Vertical Table):** * **Columns:** "Item/Aspect Description", "[Quote A Name]", "[Quote B Name]", "[Quote C Name]", ..., "Analysis & Discrepancy Notes" * **Rows:** Individual line items, sub-categories, materials, activities. * In the "[Quote X Name]" columns, show quantity, units, key specs, and status (✅, ❌, 🔄, ⚠️). 
* The "Analysis & Discrepancy Notes" column should provide your interpretation of differences, highlight gaps, explain symbols, and note implications.

*Example Structure:*

| Item/Aspect Description | Contractor Alpha | Builder Beta | Construct Co. | Analysis & Discrepancy Notes |
|---|---|---|---|---|
| Concrete Foundation (m³) | 100 (Grade 25) ✅ | 95 (Grade 30) ✅ | - ❌ | Construct Co. missing. Builder Beta uses higher grade concrete but slightly less volume. |
| Excavation for Foundation | Included ✅ | Included ✅ | Not Stated 🔄 | Assumed included in Construct Co. if foundation is added, but needs clarification. |
| Site Clearing | 5000 m² ⚠️ | 3000 m² ✅ | 3200 m² ✅ | Contractor Alpha has significantly higher quantity for site clearing; verify scope boundaries. |
| *... (other items) ...* | | | | |

3. **Detailed Terms & Conditions Comparison (Vertical Table):**
* **Columns:** "Term/Condition", "[Quote A Name]", "[Quote B Name]", "[Quote C Name]", ..., "Implication & Risk Highlight"
* **Rows:** Payment Terms, Warranty, Exclusions, Assumptions, Mobilization, etc.
* The "Implication & Risk Highlight" column is crucial for explaining the impact of differences.

*Example Structure:*

| Term/Condition | Contractor Alpha | Builder Beta | Construct Co. | Implication & Risk Highlight |
|---|---|---|---|---|
| Payment Terms | 30% Upfront | 20% Upfront | 50% Upfront | Construct Co.'s high upfront payment (50%) increases client financial risk and cash flow strain. |
| Warranty Period | 1 Year | 2 Years | 1 Year | Builder Beta offers a better warranty, providing greater long-term assurance. |
| Price Escalation Clause | Not Specified | Included (CPI) | Not Specified | Builder Beta includes price escalation; others may be fixed price or silent (risk if project delays). |
| *... (other terms) ...* | | | | |

4. **Critical Issues Summary Table (Vertical):**
* **Columns:** "Critical Issue/Question", "[Quote A Name]", "[Quote B Name]", "[Quote C Name]", ..., "Notes/Impact"
* **Rows:** Key discrepancies, major scope gaps, high-risk terms.
* Succinctly show how each quote addresses (or fails to address) critical points.

*Example Structure:*

| Critical Issue/Question | Contractor Alpha | Builder Beta | Construct Co. | Notes/Impact |
|---|---|---|---|---|
| Foundation Work Scope | Included | Included | **MISSING** | Major scope gap in Construct Co. Must be added. |
| Waste Disposal Responsibility | Client | Contractor | Client | Builder Beta includes disposal; others add cost/responsibility to client. |
| Liquidated Damages for Delay | 0.5%/week | Not Stated | 1%/week | Builder Beta lacks LDs (risk for client). Construct Co. has higher penalty. |
| *... (other critical issues) ...* | | | | |

5. **Recommended Actions & Queries:**
* Provide a clear, prioritized list of specific questions to ask each contractor.
* Suggest items to request for addition/clarification in quotes.
* Recommend negotiation points based on the comparison.

Analysis Approach & Principles
* **Prioritize Clarity and Actionability**: Your output must be easy to understand and directly support decision-making.
* **Deep Reasoning**: Go beyond surface-level matching. Infer relationships, understand construction context, and cross-reference information within each quote and across quotes.
* **Impact-Focused**: For every significant difference, explain "So what?". What does this mean for the user in terms of cost, time, quality, or risk?
* **Objectivity**: Present an unbiased comparison. Clearly state assumptions made during analysis. * **Vigilance for Ambiguity**: Actively seek out and flag vague language, unspecified items, or potential misunderstandings. Suggest these be clarified. Limitations * You cannot make final decisions for the user. * Your analysis is based solely on the provided JSON data. You cannot infer vendor reputation or factors outside the documents. * Highly technical or novel items might require domain expert review beyond your analysis. Key Considerations for AI: * Use standard construction terminology consistently. * Group similar work items logically, even if presented differently in source JSONs. * Be meticulous in identifying scope gaps, exclusions, and assumptions that could lead to change orders or disputes. * Always aim to compare "apples to apples" as much as possible, and clearly state when this is not feasible due to data limitations. Output must be in Markdown format. """ SI_COMPARISON_VALIDATION = """# Validation Specialist for Quotation Comparison Reports You are a meticulous Validation Specialist. Your task is to critically review a previously generated "Quotation Comparison Report" (provided in Markdown format) by cross-referencing it against the original source PDF quotation documents. Your goal is to identify: 1. **Accuracy:** Are the facts, figures, quantities, terms, and conclusions stated in the comparison report accurately extracted and represented from the source PDFs? Highlight any discrepancies. Note if numbers, dates, names, or specific clauses are correctly transcribed. 2. **Completeness:** Did the comparison report miss any significant items, terms, conditions, or scope details present in the PDFs that would materially affect the comparison? For example, if a PDF mentions a specific material grade and the report doesn't capture this nuance, it's an omission. 3. 
**Consistency:** Is the interpretation and comparison consistent across all documents and within the report itself? Are there any contradictions or misinterpretations of what's stated in the PDFs? For example, if "preliminaries" are interpreted differently for different quotes without justification. 4. **Objectivity:** Does the comparison report maintain an objective tone, or does it introduce biases not supported by the PDF content? 5. **Clarity and Support of Issues Raised:** Are the issues, risks, and discrepancies highlighted in the comparison report well-supported by direct evidence found in the PDFs? Can you pinpoint the section in the PDF that supports (or refutes) the report's claim? 6. **Coverage of Key Sections:** Confirm if major sections of the PDFs (like Scope of Work, Bill of Quantities, Payment Terms, Warranty, Exclusions, Assumptions) were adequately considered in the comparison. **Input:** 1. The "Quotation Comparison Report" (Markdown). 2. The original PDF quotation documents it was based on (you will receive one or more PDF documents). Each PDF's filename will be provided. **Output Format:** Provide your validation findings in a structured Markdown report: ## Validation Report for Quotation Comparison **Overall Assessment:** * A brief summary of the comparison report's accuracy, completeness, and reliability based on your validation against the source PDFs. State your level of confidence in the report (e.g., High, Medium with caveats, Low - requires significant revision). **Key Validation Findings:** * Use bullet points or a table to list specific findings. 
For each finding, state: * **Finding ID:** (A unique number for easy reference, e.g., V1, V2) * **Location in Comparison Report:** (e.g., "Executive Summary, Key Finding 2" or "Detailed Scope Table, Item: Concrete Foundation, Column: Contractor A") * **Source PDF(s) Referenced & Location:** (e.g., "Contractor_A.pdf, Page 3, Section 2.1" or "All PDFs, General Terms section") * **Issue Type:** (e.g., Accuracy Error, Omission, Misinterpretation, Inconsistency, Confirmation of Accuracy, Unsupported Claim, Lack of Detail) * **Description of Finding:** (Provide specific details. E.g., "Comparison report states 100m³ for Concrete Foundation from Contractor_A.pdf, but PDF page 3, item 2.1.1 shows 120m³." or "The report correctly identified the warranty difference between Vendor X (PDF: Vendor_X.pdf, pg 5) and Vendor Y (PDF: Vendor_Y.pdf, pg 7).") * **Impact/Recommendation:** (e.g., "This discrepancy understates Contractor A's concrete quantity by 20m³. The comparison report needs correction." or "The omission of payment terms from Vendor Z's PDF (Vendor_Z.pdf, pg 6) in the comparison is critical and should be added.") **Specific Checks Performed (Summary):** * Briefly confirm that you have reviewed aspects like: Document Metadata, Line Item Data (Quantities, Units, Descriptions, Rates, Amounts from BOQs), Scope Statements (Inclusions, Exclusions, Deliverables), Terms and Conditions (Payment, Warranty, Validity, Liabilities), and the support for any risks or issues highlighted in the comparison report. **Conclusion & Recommendations:** * Reiterate overall confidence in the comparison report. * Summarize the most critical validation findings. * State whether the comparison report is fit for use as-is, or if it requires minor/major revisions based on your validation. * If revisions are needed, list the key areas from your findings that must be addressed. Output must be in Markdown format. 
"""

# --- Logging ---
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# --- Helper Function for Gemini API Call ---
async def generate_content_async(
    model_name: str,
    system_instruction: str,
    parts: list,  # List of Part objects or strings
    max_output_tokens: int,
    is_pdf_extraction: bool = False
):
    if not google_libs_imported:
        return "Error: Google libraries not imported. Cannot call API."
    try:
        model = GenerativeModel(model_name)
        generation_config = GenerationConfig(
            temperature=TEMPERATURE,
            max_output_tokens=max_output_tokens,
        )
        safety_settings = {
            HarmCategory.HARM_CATEGORY_HATE_SPEECH: SafetySetting.HarmBlockThreshold.BLOCK_NONE,
            HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: SafetySetting.HarmBlockThreshold.BLOCK_NONE,
            HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: SafetySetting.HarmBlockThreshold.BLOCK_NONE,
            HarmCategory.HARM_CATEGORY_HARASSMENT: SafetySetting.HarmBlockThreshold.BLOCK_NONE,
        }
        contents_payload = [system_instruction] + parts
        logger.info(f"Sending request to {model_name}. Payload items: {len(contents_payload)}")

        # Using asyncio.to_thread for blocking calls in async Gradio
        response = await asyncio.to_thread(
            model.generate_content,
            contents=contents_payload,
            generation_config=generation_config,
            safety_settings=safety_settings,
            # request_options={"timeout": 600}  # Generous timeout for potentially large inputs
        )

        if response.candidates and response.candidates[0].content.parts:
            raw_output = response.candidates[0].content.parts[0].text
            if is_pdf_extraction:  # Special handling for JSON extraction
                # Strip markdown fences if present
                processed_output = raw_output.strip()
                if processed_output.startswith("```json"):
                    processed_output = processed_output[len("```json"):].strip()
                elif processed_output.startswith("```"):
                    processed_output = processed_output[len("```"):].strip()
                if processed_output.endswith("```"):
                    processed_output = processed_output[:-len("```")].strip()

                # Attempt to ensure it is enclosed in {}.
                first_brace = processed_output.find('{')
                last_brace = processed_output.rfind('}')
                if first_brace != -1 and last_brace != -1 and last_brace > first_brace:
                    processed_output = processed_output[first_brace : last_brace + 1]

                try:
                    json.loads(processed_output)  # Validate JSON
                    logger.info(f"Successfully extracted valid JSON from {model_name}.")
                    return processed_output
                except json.JSONDecodeError as json_err:
                    logger.error(f"Invalid JSON from {model_name} after stripping. Error: {json_err}")
                    logger.error(f"Raw output snippet: {raw_output[:200]}...")
                    return f"Error: Model returned invalid JSON. {json_err}\nRaw output: {raw_output[:500]}"
            else:  # For Markdown reports
                logger.info(f"Successfully received report from {model_name}.")
                return raw_output
        else:
            logger.error(f"Unexpected API response structure from {model_name}: {response}")
            try:
                finish_reason = response.candidates[0].finish_reason.name if response.candidates else "UNKNOWN"
                if finish_reason == 'SAFETY':
                    safety_ratings = response.candidates[0].safety_ratings
                    return f"Error: Content generation blocked due to SAFETY. Ratings: {safety_ratings}"
                return f"Error: No content parts in response from {model_name}. Finish Reason: {finish_reason}"
            except Exception:
                return f"Error: Unexpected API response structure from {model_name}: {str(response)[:500]}"
    except api_exceptions.ResourceExhausted as quota_err:
        logger.error(f"Quota exceeded for {model_name}: {quota_err}")
        return f"Error: Quota exceeded. Please try again later. ({quota_err})"
    except api_exceptions.InvalidArgument as invalid_arg_err:
        logger.error(f"Invalid argument for {model_name}: {invalid_arg_err}")
        return f"Error: Invalid argument, possibly request size too large. ({invalid_arg_err})"
    except Exception as e:
        logger.error(f"Unexpected error during API call to {model_name}: {e}\n{traceback.format_exc()}")
        return f"Error: An unexpected error occurred. {e}"

# --- Core Logic Functions ---
async def process_uploaded_pdfs_to_json(pdf_files, progress=gr.Progress()):
    """ Processes uploaded PDF files and returns a list of JSON strings. """
    if not pdf_files:
        return [], "No PDF files uploaded."
    if not google_libs_imported or not pypdf_imported:
        return [], "Error: Required libraries (Google Cloud or PyPDF) not imported."

    extracted_jsons = []
    status_updates = []
    total_files = len(pdf_files)
    progress(0, desc="Starting PDF processing...")

    for i, pdf_file_obj in enumerate(pdf_files):
        filename = pdf_file_obj.name
        progress((i) / total_files, desc=f"Processing PDF: {os.path.basename(filename)} ({i+1}/{total_files})")
        logger.info(f"Processing PDF: {filename}")
        try:
            # Read PDF bytes directly from the uploaded file object
            # Gradio's File object has a 'name' attribute which is the temp path
            with open(pdf_file_obj.name, "rb") as f:
                pdf_bytes = f.read()
            if not pdf_bytes:
                logger.warning(f"PDF {filename} is empty.")
                extracted_jsons.append({"filename": os.path.basename(filename), "json_data": None, "error": "PDF file is empty."})
                status_updates.append(f"Skipped empty PDF: {os.path.basename(filename)}")
                continue

            pdf_part = Part.from_data(data=pdf_bytes, mime_type="application/pdf")
            json_str = await generate_content_async(
                model_name=MODEL_PDF_EXTRACTION,
                system_instruction=SI_PDF_EXTRACTION,
                parts=[pdf_part],
                max_output_tokens=MAX_OUTPUT_TOKENS_EXTRACTION,
                is_pdf_extraction=True
            )
            if json_str and not json_str.startswith("Error:"):
                extracted_jsons.append({"filename": os.path.basename(filename), "json_data": json_str, "error": None})
                status_updates.append(f"Successfully extracted JSON from: {os.path.basename(filename)}")
            else:
                extracted_jsons.append({"filename": os.path.basename(filename), "json_data": None, "error": json_str})
                status_updates.append(f"Failed to extract JSON from: {os.path.basename(filename)}. Reason: {json_str}")
        except Exception as e:
            logger.error(f"Error processing PDF {filename}: {e}\n{traceback.format_exc()}")
            extracted_jsons.append({"filename": os.path.basename(filename), "json_data": None, "error": str(e)})
            status_updates.append(f"Error processing PDF {os.path.basename(filename)}: {e}")

    progress(1, desc="PDF processing complete.")
    return extracted_jsons, "\n".join(status_updates)

async def generate_comparison_from_jsons(json_data_list, progress=gr.Progress()):
    """ Generates a comparison report from a list of JSON data. """
    if not json_data_list:
        return "No JSON data provided for comparison.", "No JSON data to compare."
    if not google_libs_imported:
        return "Error: Google libraries not imported.", "Setup error."

    progress(0, desc="Starting JSON comparison...")

    # Filter out items with errors
    valid_jsons = [item for item in json_data_list if item["json_data"] and not item["error"]]
    if not valid_jsons:
        return "No valid JSON data available from PDF extraction for comparison.", "Comparison skipped due to extraction errors."

    input_text_parts = ["Here are the JSON-formatted quotation documents to compare:\n"]
    for item in valid_jsons:
        input_text_parts.append(f"\n--- Quotation from file: {item['filename']} ---\n")
        input_text_parts.append("```json")
        input_text_parts.append(item['json_data'])
        input_text_parts.append("```")
    user_input_prompt_for_model = "\n".join(input_text_parts)
    parts_for_api = [Part.from_text(user_input_prompt_for_model)]

    progress(0.5, desc="Sending JSONs for comparison analysis...")
    comparison_report_md = await generate_content_async(
        model_name=MODEL_COMPARISON,
        system_instruction=SI_JSON_COMPARISON,
        parts=parts_for_api,
        max_output_tokens=MAX_OUTPUT_TOKENS_REPORTING
    )
    status = "Comparison report generated." if not comparison_report_md.startswith("Error:") else f"Comparison failed: {comparison_report_md}"
    progress(1, desc=status)
    return comparison_report_md, status

async def validate_comparison_report(comparison_report_md, pdf_files, progress=gr.Progress()):
    """ Validates the comparison report against the original PDF files. """
    if not comparison_report_md or comparison_report_md.startswith("Error:"):
        return "Comparison report is missing or invalid. Cannot perform validation.", "Validation skipped."
    if not pdf_files:
        return "Original PDF files not available for validation.", "Validation skipped."
    if not google_libs_imported or not pypdf_imported:
        return "Error: Required libraries not imported for validation.", "Setup error."

    progress(0, desc="Starting comparison validation...")
    parts_for_api = [Part.from_text("## Quotation Comparison Report to Validate:\n\n" + comparison_report_md)]
    status_updates = ["Preparing PDFs for validation..."]

    for i, pdf_file_obj in enumerate(pdf_files):
        filename = os.path.basename(pdf_file_obj.name)
        progress(i / (len(pdf_files) * 2), desc=f"Loading PDF for validation: {filename}")  # *2 for loading and then API call
        try:
            with open(pdf_file_obj.name, "rb") as f:
                pdf_bytes = f.read()
            if pdf_bytes:
                parts_for_api.append(Part.from_text(f"\n\n--- Original PDF Document: {filename} ---\n"))
                parts_for_api.append(Part.from_data(data=pdf_bytes, mime_type="application/pdf"))
                status_updates.append(f"Added PDF {filename} to validation context.")
            else:
                status_updates.append(f"Skipped empty PDF {filename} for validation.")
        except Exception as e:
            logger.error(f"Error reading PDF {filename} for validation: {e}")
            status_updates.append(f"Error reading PDF {filename} for validation: {e}")
            # Optionally, decide if validation can proceed without this PDF

    if len(parts_for_api) <= 1:  # Only the comparison report part
        return "No valid PDFs could be loaded for validation.", "\n".join(status_updates)

    progress(0.5, desc="Sending report and PDFs for validation analysis...")
    validation_report_md = await generate_content_async(
        model_name=MODEL_VALIDATION,
        system_instruction=SI_COMPARISON_VALIDATION,
        parts=parts_for_api,
        max_output_tokens=MAX_OUTPUT_TOKENS_REPORTING
    )
    status = "Validation report generated." if not validation_report_md.startswith("Error:") else f"Validation failed: {validation_report_md}"
    status_updates.append(status)
    progress(1, desc=status)
    return validation_report_md, "\n".join(status_updates)

# --- Gradio Interface ---
async def run_full_process(pdf_files_list, progress=gr.Progress(track_tqdm=True)):
    if not pdf_files_list:
        return "Please upload PDF files.", "", "", "No files uploaded."

    # Step 1: Process PDFs to JSON
    progress(0, desc="Step 1: Extracting JSON from PDFs...")
    extracted_jsons_data, s1_status = await process_uploaded_pdfs_to_json(pdf_files_list, progress)

    # Prepare JSON output for display
    json_display_list = []
    has_successful_extraction = False
    for item in extracted_jsons_data:
        if item["json_data"]:
            json_display_list.append({item["filename"]: json.loads(item["json_data"])})  # Display parsed JSON
            has_successful_extraction = True
        else:
            json_display_list.append({item["filename"]: {"error": item["error"]}})
    json_output_str = json.dumps(json_display_list, indent=2) if json_display_list else "No JSONs extracted or all failed."

    if not has_successful_extraction:
        final_status = f"Step 1 Status:\n{s1_status}\n\nNo valid JSONs extracted. Cannot proceed to comparison or validation."
        return json_output_str, "No valid JSONs for comparison.", "No comparison to validate.", final_status

    # Step 2: Generate Comparison Report from JSONs
    progress(0, desc="Step 2: Generating Comparison Report...")  # Reset progress for new step
    comparison_report, s2_status = await generate_comparison_from_jsons(extracted_jsons_data, progress)
    if comparison_report.startswith("Error:"):
        final_status = f"Step 1 Status:\n{s1_status}\n\nStep 2 Status (Comparison):\n{s2_status}\n\nComparison failed. Cannot proceed to validation."
return json_output_str, comparison_report, "Comparison failed.", final_status # Step 3: Validate Comparison Report progress(0, desc="Step 3: Validating Comparison Report...") # Reset progress for new step validation_report, s3_status = await validate_comparison_report(comparison_report, pdf_files_list, progress) final_status = f"Step 1 Status (PDF to JSON):\n{s1_status}\n\nStep 2 Status (JSON Comparison):\n{s2_status}\n\nStep 3 Status (Validation):\n{s3_status}" return json_output_str, comparison_report, validation_report, final_status # Check if essential libraries were imported correctly initialization_error_message = "" if not google_libs_imported: initialization_error_message += "FATAL ERROR: Google Cloud/Vertex AI libraries failed to initialize. Check logs.\n" if not pypdf_imported: initialization_error_message += "FATAL ERROR: PyPDF library failed to import. Check installation.\n" if GCP_PROJECT_ID == "YOUR_GCP_PROJECT_ID" or GCP_LOCATION == "YOUR_GCP_LOCATION": initialization_error_message += "WARNING: GCP_PROJECT_ID and/or GCP_LOCATION are not configured. The application will likely fail.\n" with gr.Blocks(theme=gr.themes.Soft()) as demo: gr.Markdown(f""" # PDF Quotation Comparison and Validation App Upload multiple PDF quotation files to extract data, compare them, and validate the comparison. 
**GCP Project:** `{GCP_PROJECT_ID}` | **Location:** `{GCP_LOCATION}` **PDF Extraction Model:** `{MODEL_PDF_EXTRACTION}` **Comparison & Validation Model:** `{MODEL_COMPARISON}` """) if initialization_error_message: gr.Markdown(f"<h3 style='color:red;'>Initialization Issues:</h3><pre>{initialization_error_message}</pre>") with gr.Row(): pdf_upload = gr.File( label="Upload PDF Quotation Files", file_count="multiple", file_types=[".pdf"] ) process_button = gr.Button("Process and Compare Quotations", variant="primary") with gr.Accordion("Processing Status & Logs", open=False): status_output = gr.Textbox(label="Process Log / Status", lines=10, interactive=False) gr.Markdown("## Results") with gr.Tabs(): with gr.TabItem("Extracted JSON Data"): json_results_output = gr.Code(label="JSON Data from PDFs", language="json", interactive=False) with gr.TabItem("Comparison Report"): comparison_report_output = gr.Markdown(label="Quotation Comparison Report") with gr.TabItem("Validation Report"): validation_report_output = gr.Markdown(label="Comparison Validation Report") process_button.click( fn=run_full_process, inputs=[pdf_upload], outputs=[json_results_output, comparison_report_output, validation_report_output, status_output] ) gr.Examples( examples=[], # You can add paths to example PDFs here if running locally with files inputs=[pdf_upload], label="Example PDF Sets (requires local files - not directly usable on shared Spaces)" ) gr.Markdown("Note: Processing can take some time depending on the number and size of PDFs and model response times.") if __name__ == "__main__": if initialization_error_message and ("FATAL ERROR" in initialization_error_message): print("Application will not launch due to fatal initialization errors.") print(initialization_error_message) else: print("Launching Gradio app...") # For Cloud Run, server_name="0.0.0.0" is crucial. # server_port is typically 8080 for Cloud Run default. 
demo.launch(server_name="0.0.0.0", server_port=int(os.environ.get("PORT", 8080))) requirements.txt: google-cloud-aiplatform>=1.40.0 pypdf>=3.0.0 gradio>=4.0.0 # google-generativeai might be pulled in by aiplatform, but explicitly: # google-generativeai IGNORE_WHEN_COPYING_START content_copy download Use code with caution. IGNORE_WHEN_COPYING_END Dockerfile: # Use an official Python runtime as a parent image FROM python:3.10-slim # Set the working directory in the container WORKDIR /app # Copy the requirements file into the container at /app COPY requirements.txt . # Install any needed packages specified in requirements.txt RUN pip install --no-cache-dir -r requirements.txt # Copy the current directory contents into the container at /app COPY app.py . # If you have other assets like system_instructions.py, copy them too # COPY system_instructions.py . # Make port 8080 available to the world outside this container # Cloud Run expects the app to listen on the port specified by the PORT env var (default 8080) EXPOSE 8080 # Define environment variable for the port (Cloud Run will set this) ENV PORT=8080 ENV GCP_PROJECT_ID="YOUR_GCP_PROJECT_ID" ENV GCP_LOCATION="YOUR_GCP_LOCATION" # It's better to set GCP_PROJECT_ID and GCP_LOCATION during Cloud Run deployment # or ensure the service account has access and the app can infer them. # For local testing, you can set them here or in your shell. # Run app.py when the container launches CMD ["python", "app.py"] IGNORE_WHEN_COPYING_START content_copy download Use code with caution. Dockerfile IGNORE_WHEN_COPYING_END To Deploy to Cloud Run: Replace Placeholders: In app.py, ensure GCP_PROJECT_ID and GCP_LOCATION are correctly set (preferably via environment variables during deployment rather than hardcoding if you share the code). In Dockerfile, you can update the ENV for GCP_PROJECT_ID and GCP_LOCATION if you want to build them in, but it's more flexible to set these when deploying the Cloud Run service. 
Enable APIs: Ensure "Cloud Run API", "Vertex AI API", and "Artifact Registry API" (if using it to store your container image) are enabled in your GCP project. Authenticate Docker: gcloud auth configure-docker (e.g., gcloud auth configure-docker YOUR_REGION-docker.pkg.dev) Build the Docker Image: export GCP_PROJECT="YOUR_GCP_PROJECT_ID" # Your actual project ID export REGION="YOUR_REGION" # e.g., us-central1 export IMAGE_NAME="quotation-processor-gradio" export IMAGE_TAG="latest" export IMAGE_URI="${REGION}-docker.pkg.dev/${GCP_PROJECT}/cloud-run-source-deploy/${IMAGE_NAME}:${IMAGE_TAG}" # Or your preferred Artifact Registry path docker build -t ${IMAGE_URI} . IGNORE_WHEN_COPYING_START content_copy download Use code with caution. Bash IGNORE_WHEN_COPYING_END Push the Docker Image: docker push ${IMAGE_URI} IGNORE_WHEN_COPYING_START content_copy download Use code with caution. Bash IGNORE_WHEN_COPYING_END Deploy to Cloud Run: gcloud run deploy quotation-processor-gradio-app \ --image=${IMAGE_URI} \ --platform=managed \ --region=${REGION} \ --allow-unauthenticated \ --set-env-vars="GCP_PROJECT_ID=${GCP_PROJECT}" \ --set-env-vars="GCP_LOCATION=${REGION}" \ --memory=2Gi \ --cpu=1 \ --timeout=900 \ --concurrency=10 # Adjust based on expected load and model processing time # Ensure the service account used by Cloud Run has Vertex AI User permissions IGNORE_WHEN_COPYING_START content_copy download Use code with caution. Bash IGNORE_WHEN_COPYING_END --allow-unauthenticated makes it public. Adjust as needed. --timeout=900 (15 minutes) is a generous timeout for the AI processing. Adjust as necessary. --memory and --cpu might need adjustment. PDF processing can be memory-intensive. --concurrency: Since each request involves multiple LLM calls, keep concurrency relatively low to avoid overwhelming the backend or hitting API rate limits quickly, unless you have high quotas. This setup provides a comprehensive solution. 
Remember that processing multiple PDFs and making several LLM calls can take time, so the Gradio interface might appear busy. The progress bars and status updates aim to provide feedback.
|
| 2 |
+
make it use qwen 3 to do all the things without backend api
"curl https://router.huggingface.co/fireworks-ai/inference/v1/chat/completions \
    -H 'Authorization: Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxx' \
    -H 'Content-Type: application/json' \
    -d '{
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ],
        "model": "accounts/fireworks/models/qwen3-235b-a22b",
        "stream": false
    }'"
|
| 3 |
+
not fireworks api instead let it use hf token
|
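For reference, the curl request quoted in the second prompt could be expressed in Python roughly as follows. This is a minimal sketch, not the code this commit generates: the URL and model id are copied from the prompt, while the helper names `build_chat_request` and `ask`, the `HF_TOKEN` environment variable, and the OpenAI-style response shape (`choices[0].message.content`) are assumptions made for illustration.

```python
import json
import os
import urllib.request

# URL and model id are copied verbatim from the curl command in the prompt.
ROUTER_URL = "https://router.huggingface.co/fireworks-ai/inference/v1/chat/completions"
MODEL_ID = "accounts/fireworks/models/qwen3-235b-a22b"


def build_chat_request(prompt: str, hf_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper: the name and signature are illustrative only.
    """
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "model": MODEL_ID,
        "stream": False,
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {hf_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the reply text.

    Assumes the token is stored in an HF_TOKEN environment variable and that
    the endpoint answers with an OpenAI-style chat completion body.
    """
    request = build_chat_request(prompt, os.environ["HF_TOKEN"])
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("What is the capital of France?")` would send the same payload as the curl command. Keeping request construction separate from sending makes the payload easy to inspect without network access.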