github-actions[bot] committed on
Commit
a571c24
·
1 Parent(s): 6d9fd48

Sync from GitHub: 81ad14fcc4be611ad6ac0e65b151cdf9225c7ee9

Files changed (2)
  1. .gitignore +2 -1
  2. README_git.md +280 -0
.gitignore CHANGED
@@ -28,8 +28,9 @@ htmlcov/
28
  .ipynb_checkpoints/
29
 
30
  *.md
31
- !README_HF.md
32
  !README.md
33
  test*
34
  executable.py
35
  client_example.py
 
 
28
  .ipynb_checkpoints/
29
 
30
  *.md
31
+ !README_git.md
32
  !README.md
33
  test*
34
  executable.py
35
  client_example.py
36
+ Docs
README_git.md ADDED
@@ -0,0 +1,280 @@
1
+ # Invoice Information Extractor API
2
+
3
+ Extract structured information from Indian tractor invoices using an AI-powered REST API.
4
+
5
+ ## What It Does
6
+
7
+ Combines **YOLO** (signature/stamp detection) with **Qwen2.5-VL** (text extraction) to extract:
8
+ - Dealer name
9
+ - Model name
10
+ - Horse power
11
+ - Asset cost
12
+ - Signature presence & location
13
+ - Stamp presence & location
14
+
15
+ ## Architecture
16
+
17
+ ### Production (Hugging Face Deployment)
18
+ - **FastAPI server** with REST endpoints
19
+ - **Models loaded on startup** and cached in memory
20
+ - **YOLO model** stored locally in `utils/models/best.pt`
21
+ - **Qwen2.5-VL** downloaded from Hugging Face on first run (not stored locally)
22
+
23
+ ### Key Components
24
+ - `app.py` - FastAPI server with endpoints
25
+ - `model_manager.py` - Handles model loading and caching
26
+ - `inference.py` - Processing pipeline and validation
27
+ - `config.py` - Configuration settings
28
+ - `executable.py` - Legacy CLI interface (deprecated)
29
+
30
+ ## Installation
31
+
32
+ ```bash
33
+ pip install -r requirements.txt
34
+ ```
35
+
36
+ **Requirements:** Python 3.10+, CUDA GPU (8GB+ VRAM)
37
+
38
+ ## Running the Server
39
+
40
+ ### Local Development
41
+ ```bash
42
+ python app.py
43
+ ```
44
+
45
+ Server runs on `http://localhost:7860`
46
+
47
+ ### Production (Hugging Face Spaces)
48
+ ```bash
49
+ uvicorn app:app --host 0.0.0.0 --port 7860
50
+ ```
51
+
52
+ ## API Endpoints
53
+
54
+ ### 1. Health Check
55
+ ```bash
56
+ GET /health
57
+ ```
58
+
59
+ **Response:**
60
+ ```json
61
+ {
62
+   "status": "healthy",
63
+   "models_loaded": true
64
+ }
65
+ ```
66
+
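Because the models are loaded on startup, a client may want to wait for `models_loaded` to become `true` before sending invoices. A minimal sketch of such a readiness poll follows. The helper names `fetch_health` and `wait_until_ready`, the retry count, and the delay are illustrative assumptions, not part of this project.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:7860/health"  # local development address from this README


def fetch_health(url: str = HEALTH_URL) -> dict:
    """GET the health endpoint and return its parsed JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(fetch: Callable[[], dict] = fetch_health,
                     attempts: int = 30,
                     delay_sec: float = 2.0) -> bool:
    """Return True once the server reports healthy with models loaded.

    `attempts` and `delay_sec` are hypothetical defaults; tune them to the
    actual model load time.
    """
    for attempt in range(attempts):
        try:
            body = fetch()
            if body.get("status") == "healthy" and body.get("models_loaded") is True:
                return True
        except OSError:
            pass  # connection refused or timed out (URLError subclasses OSError)
        if attempt < attempts - 1:
            time.sleep(delay_sec)
    return False
```

Passing the fetch function as an argument keeps the polling logic testable without a running server.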
67
+ ### 2. Extract Single Invoice
68
+ ```bash
69
+ POST /extract
70
+ ```
71
+
72
+ **Parameters:**
73
+ - `file` (required): Image file (JPG, PNG, JPEG)
74
+ - `doc_id` (optional): Document identifier
75
+
76
+ **Example (cURL):**
77
+ ```bash
78
+ curl -X POST "http://localhost:7860/extract" \
79
+   -F "file=@invoice_001.png" \
80
+   -F "doc_id=invoice_001"
81
+ ```
82
+
83
+ **Example (Python):**
84
+ ```python
85
+ import requests
86
+
87
+ url = "http://localhost:7860/extract"
88
+ with open("invoice_001.png", "rb") as f:  # close the file once the upload finishes
89
+     response = requests.post(url, files={"file": f}, data={"doc_id": "invoice_001"})
90
+
91
+ response.raise_for_status()  # fail loudly on HTTP errors
92
+ print(response.json())
93
+ ```
94
+
95
+ **Response:**
96
+ ```json
97
+ {
98
+   "doc_id": "invoice_001",
99
+   "fields": {
100
+     "dealer_name": "ABC Tractors Pvt Ltd",
101
+     "model_name": "Mahindra 575 DI",
102
+     "horse_power": 50,
103
+     "asset_cost": 525000,
104
+     "signature": {"present": true, "bbox": [100, 200, 300, 250]},
105
+     "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
106
+   },
107
+   "confidence": 0.89,
108
+   "processing_time_sec": 3.8,
109
+   "cost_estimate_usd": 0.000528,
110
+   "warnings": null
111
+ }
112
+ ```
113
+
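A client will usually reduce this response to the few values it needs. The sketch below parses the sample body shown above and flags results for manual review. The `[x_min, y_min, x_max, y_max]` reading of `bbox`, the `0.8` review threshold, and the helper names are assumptions made for illustration; the README does not specify them.

```python
import json

# Sample body copied from the /extract response documented in this README.
SAMPLE = """
{
  "doc_id": "invoice_001",
  "fields": {
    "dealer_name": "ABC Tractors Pvt Ltd",
    "model_name": "Mahindra 575 DI",
    "horse_power": 50,
    "asset_cost": 525000,
    "signature": {"present": true, "bbox": [100, 200, 300, 250]},
    "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
  },
  "confidence": 0.89,
  "processing_time_sec": 3.8,
  "cost_estimate_usd": 0.000528,
  "warnings": null
}
"""


def bbox_size(bbox):
    """Return (width, height), ASSUMING bbox is [x_min, y_min, x_max, y_max]."""
    x_min, y_min, x_max, y_max = bbox
    return x_max - x_min, y_max - y_min


def summarize(result, min_confidence=0.8):
    """Pick out key fields; min_confidence is a hypothetical review threshold."""
    fields = result["fields"]
    summary = {
        "doc_id": result["doc_id"],
        "dealer_name": fields["dealer_name"],
        "asset_cost": fields["asset_cost"],
        "needs_review": result["confidence"] < min_confidence
                        or bool(result.get("warnings")),
    }
    for key in ("signature", "stamp"):
        detection = fields[key]
        # Only read the box when the detector reported the element as present.
        summary[f"{key}_size"] = bbox_size(detection["bbox"]) if detection["present"] else None
    return summary


summary = summarize(json.loads(SAMPLE))
print(summary)
```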
114
+ ### 3. Extract Multiple Invoices (Batch)
115
+ ```bash
116
+ POST /extract_batch
117
+ ```
118
+
119
+ **Parameters:**
120
+ - `files` (required): Array of image files
121
+
122
+ **Example (Python):**
123
+ ```python
124
+ import requests
125
+
126
+ url = "http://localhost:7860/extract_batch"
127
+ files = [
128
+     ("files", open("invoice_001.png", "rb")),
129
+     ("files", open("invoice_002.png", "rb"))
130
+ ]
131
+
132
+ response = requests.post(url, files=files)
133
+ print(response.json())
134
+ ```
135
+
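For larger batches it helps to close every file handle after the upload. One possible approach, sketched here with `contextlib.ExitStack`, builds the repeated `files` parts from a list of paths. The helper names `open_batch` and `extract_batch` and the 300-second timeout are assumptions for illustration; only the endpoint and the `files` field name come from this README.

```python
from contextlib import ExitStack
from pathlib import Path

BATCH_URL = "http://localhost:7860/extract_batch"


def open_batch(stack, paths):
    """Open each image and return the multipart parts list that requests accepts.

    Every handle is registered on `stack`, so all of them are closed when the
    enclosing ExitStack exits, even if the upload raises.
    """
    return [
        ("files", (Path(p).name, stack.enter_context(open(p, "rb"))))
        for p in paths
    ]


def extract_batch(paths, url=BATCH_URL, timeout_sec=300):
    """POST all images in one request and return the parsed JSON response."""
    import requests  # third-party; imported here so open_batch stays stdlib-only

    with ExitStack() as stack:
        response = requests.post(url, files=open_batch(stack, paths), timeout=timeout_sec)
    response.raise_for_status()
    return response.json()
```

Usage would look like `extract_batch(["invoice_001.png", "invoice_002.png"])` against a running server.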
136
+ ### 4. Interactive Documentation
137
+ ```bash
138
+ GET /docs
139
+ ```
140
+
141
+ Visit `http://localhost:7860/docs` for interactive API documentation (Swagger UI).
142
+
143
+ ## Output Format
144
+
145
+ Results saved to `sample_output/result.json`:
146
+
147
+ ```json
148
+ {
149
+   "doc_id": "invoice_001",
150
+   "fields": {
151
+     "dealer_name": "ABC Tractors Pvt Ltd",
152
+     "model_name": "Mahindra 575 DI",
153
+     "horse_power": 50,
154
+     "asset_cost": 525000,
155
+     "signature": {"present": true, "bbox": [100, 200, 300, 250]},
156
+     "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
157
+   },
158
+   "confidence": 0.89,
159
+   "processing_time_sec": 3.8,
160
+   "cost_estimate_usd": 0.000528
161
+ }
162
+ ```
163
+
164
+ ## Confidence Calculation
165
+
166
+ Overall confidence is the **average** of:
167
+ 1. **Field validation confidence** - From dealer_name, model_name, horse_power, asset_cost validation
168
+ 2. **Signature detection confidence** - YOLO confidence score (if signature present)
169
+ 3. **Stamp detection confidence** - YOLO confidence score (if stamp present)
170
+
171
+ **Formula:**
172
+ ```
173
+ confidence = (field_conf + signature_conf + stamp_conf) / 3
174
+ ```
175
+
176
+ Range: 0.0 to 1.0 (higher is better)
177
+
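The formula above can be sketched as a small function. The README does not say how a missing signature or stamp score enters the average, so this sketch exposes both readings: by default a missing score counts as 0.0 and the divisor stays at 3 (the literal formula), while `skip_missing=True` averages only the scores that exist. Both behaviours and the function name are assumptions, not the project's confirmed implementation.

```python
from typing import Optional


def overall_confidence(field_conf: float,
                       signature_conf: Optional[float] = None,
                       stamp_conf: Optional[float] = None,
                       skip_missing: bool = False) -> float:
    """Average the three component scores into one value in [0.0, 1.0].

    None means "not detected". With skip_missing=False (an ASSUMPTION), a
    missing score counts as 0.0 so the divisor stays at 3, as in the formula.
    With skip_missing=True, only the scores that exist are averaged.
    """
    scores = [field_conf, signature_conf, stamp_conf]
    if skip_missing:
        present = [s for s in scores if s is not None]
        return sum(present) / len(present)
    return sum(0.0 if s is None else s for s in scores) / 3


print(round(overall_confidence(0.9, 0.8, 0.7), 4))                     # 0.8
print(round(overall_confidence(0.9, 0.8, None), 4))                    # 0.5667
print(round(overall_confidence(0.9, 0.8, None, skip_missing=True), 4)) # 0.85
```

The two readings diverge noticeably when a stamp or signature is absent, so it is worth checking `inference.py` to see which one the service applies.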
178
+ ## Cost Calculation
179
+
180
+ **Formula:**
181
+ ```
182
+ cost_usd = (0.5 * processing_time_sec) / 3600
183
+ ```
184
+
185
+ Assumes **$0.50 per GPU hour**
186
+
187
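+ **Typical costs** (assuming ~15 seconds of GPU time per invoice):
188
+ - Per invoice: ~$0.002
189
+ - 100 invoices: ~$0.20
190
+ - Faster runs cost proportionally less: the ~3-5 seconds per invoice listed under Performance works out to ~$0.0005 per invoice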
191
+
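As a worked example of the cost formula, the sketch below reproduces the `cost_estimate_usd` value from the sample response (3.8 seconds) and the 15-second case. The function name is an illustrative assumption; the rate and formula are the ones stated above.

```python
GPU_USD_PER_HOUR = 0.50  # rate stated in this README


def estimate_cost_usd(processing_time_sec: float,
                      gpu_usd_per_hour: float = GPU_USD_PER_HOUR) -> float:
    """cost_usd = (rate * processing_time_sec) / 3600"""
    if processing_time_sec < 0:
        raise ValueError("processing_time_sec must be non-negative")
    return gpu_usd_per_hour * processing_time_sec / 3600


for seconds in (3.8, 15):
    print(f"{seconds:>4} s -> ${estimate_cost_usd(seconds):.6f}")
# Prints:
#  3.8 s -> $0.000528
#   15 s -> $0.002083
```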
192
+ ## Models
193
+
194
+ - **YOLO:** Signature/stamp detection (`best.pt`)
195
+ - **Qwen2.5-VL-7B:** Text extraction (4-bit quantized)
196
+
197
+ ## GPU Requirements
198
+
199
+ - **Minimum:** 8GB VRAM
200
+
201
+ ## Troubleshooting
202
+
203
+ **Debug mode:** Use the `--debug` flag to print the raw VLM output and the parsed JSON.
204
+
205
+ ## Project Structure
206
+
207
+ ```
208
+ INVOICE_INFO_EXTRACTOR/
209
+ ├── app.py                # FastAPI server (main entry point)
210
+ ├── model_manager.py      # Model loading and caching
211
+ ├── inference.py          # Processing pipeline and validation
212
+ ├── config.py             # Configuration settings
213
+ ├── requirements.txt
214
+ ├── README.md
215
+ ├── executable.py         # Legacy CLI (deprecated)
216
+ ├── utils/
217
+ │   └── models/
218
+ │       └── best.pt       # YOLO model (stored locally)
219
+ └── sample_output/
220
+     └── result.json       # Sample output
221
+ ```
222
+
223
+ ## Deployment on Hugging Face Spaces
224
+
225
+ ### 1. Create `Dockerfile` (optional)
226
+ ```dockerfile
227
+ FROM python:3.10-slim
228
+
229
+ WORKDIR /app
230
+
231
+ # Install system dependencies
232
+ RUN apt-get update && apt-get install -y \
233
+     git \
234
+     libgl1-mesa-glx \
235
+     libglib2.0-0 \
236
+     && rm -rf /var/lib/apt/lists/*
237
+
238
+ # Copy requirements and install
239
+ COPY requirements.txt .
240
+ RUN pip install --no-cache-dir -r requirements.txt
241
+
242
+ # Copy application files
243
+ COPY . .
244
+
245
+ # Expose port
246
+ EXPOSE 7860
247
+
248
+ # Run the application
249
+ CMD ["python", "app.py"]
250
+ ```
251
+
252
+ ### 2. Create `.gitignore`
253
+ ```
254
+ __pycache__/
255
+ *.pyc
256
+ .env
257
+ sample_output/
258
+ *.pt.backup
259
+ venv/
260
+ .vscode/
261
+ ```
262
+
263
+ ### 3. Upload to Hugging Face
264
+ 1. Create a new Space on Hugging Face
265
+ 2. Select the "Docker" or "Gradio" SDK
266
+ 3. Upload the application files: `app.py`, `model_manager.py`, `inference.py`, `config.py`, `requirements.txt`
267
+ 4. Upload the YOLO model: `utils/models/best.pt`
268
+ 5. Set the hardware to a GPU (T4 or better)
269
+
270
+ ### 4. Environment Variables (if needed)
271
+ ```
272
+ HF_TOKEN=your_token_here
273
+ ```
274
+
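If the token is needed, the application would read it from the environment at startup. A minimal sketch, assuming a hypothetical helper named `hf_token` (how this project actually consumes the variable is not shown in the README):

```python
import os


def hf_token(env=os.environ):
    """Return HF_TOKEN with surrounding whitespace stripped, or None when unset or empty."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if hf_token() is None:
    print("HF_TOKEN not set; only public models can be downloaded")
```

Reading the value through one helper keeps the token out of source files and makes a missing secret obvious in the logs.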
275
+ ## Performance
276
+
277
+ - **Processing time:** ~3-5 seconds per invoice
278
+ - **Cost per invoice:** ~$0.0005 (GPU time)
279
+ - **Batch processing:** Supported via `/extract_batch`
280
+ - **GPU Memory:** 8GB minimum