github-actions[bot] committed on
Commit
aeb681f
·
1 Parent(s): 4dbceee

Sync from GitHub: 67c5ee67bf66d7b77be5e2ffbfaa22681c2e0ebf

Files changed (2)
  1. README.md +129 -111
  2. README_git.md +4 -105
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- title: Invoice Information Extractor
3
  emoji: 📄
4
  colorFrom: blue
5
  colorTo: green
@@ -9,164 +9,182 @@ license: mit
9
  app_port: 7860
10
  ---
11
 
12
- # Invoice Information Extractor 🧾
13
 
14
- A complete full-stack application for extracting information from invoices using AI. Features a modern React frontend and powerful FastAPI backend with YOLOv8 and PaddleOCR.
15
 
16
- ## ✨ Features
17
 
18
- ### Frontend (React + Vite)
19
- 📁 **Drag-and-drop file upload** with multi-file support
20
- 📄 **PDF to Image conversion** (automatically converts multi-page PDFs)
21
- 🔄 **Batch processing** with real-time progress tracking
22
- 🎨 **Visual detection** of signatures and stamps with bounding boxes
23
- 📱 **Responsive design** that works on all devices
24
- ⚡ **Fast and modern** UI with Tailwind CSS
25
 
26
- ### Backend (FastAPI + AI Models)
27
- 🤖 **YOLOv8 object detection** for signatures and stamps
28
- 📝 **PaddleOCR** for text extraction
29
- 🚀 **High-performance API** with async support
30
- 📊 **Batch processing** capabilities
31
- 🔒 **CORS enabled** for frontend integration
32
- 📚 **Interactive API docs** at /docs
33
 
34
- ## 🎯 Quick Start
 
 
 
 
35
 
36
- ### Option 1: Automated Setup (Windows)
 
 
 
 
 
37
 
38
- Run the automated setup script:
39
- ```bash
40
- setup.bat
41
- ```
42
 
43
- Then start both servers:
44
  ```bash
45
- start.bat
46
  ```
47
 
48
- ### Option 2: Manual Setup
49
 
50
- #### Backend Setup
51
- ```bash
52
- # Install dependencies
53
- pip install -r requirements.txt
54
 
55
- # Start the server
 
56
  python app.py
57
  ```
58
- Backend runs on: http://localhost:7860
59
 
60
- #### Frontend Setup
61
- ```bash
62
- # Navigate to frontend
63
- cd frontend
64
 
65
- # Install dependencies
66
- npm install
 
 
67
 
68
- # Create environment file
69
- cp .env.example .env
70
 
71
- # Start development server
72
- npm run dev
 
73
  ```
74
- Frontend runs on: http://localhost:3000
75
-
76
- ## 🖥️ Usage
77
 
78
- 1. **Open the application** at http://localhost:3000
79
- 2. **Upload files** by dragging and dropping or clicking to browse
80
- - Supports: JPEG, PNG, GIF, WEBP images and PDF files
81
- 3. **Click "Process"** to start extraction
82
- 4. **View results** with:
83
- - Extracted text from the invoice
84
- - Visual detection of signatures (red boxes)
85
- - Visual detection of stamps (blue boxes)
86
- - Coordinates and metadata
87
 
88
- ## 🔌 API Endpoints
 
 
 
89
 
90
- ### POST /process-invoice
91
- Process a single invoice image
 
92
 
93
- **Request:**
94
  ```bash
95
- curl -X POST "http://localhost:7860/process-invoice" \
96
- -F "file=@invoice.jpg"
 
97
  ```
98
 
 
 
99
  **Response:**
100
  ```json
101
  {
102
- "extracted_text": "Invoice details...",
103
- "signature_coords": [[x1, y1, x2, y2]],
104
- "stamp_coords": [[x1, y1, x2, y2]],
105
- "doc_id": "invoice",
106
- "processing_time": 2.5
 
 
 
 
 
 
 
 
107
  }
108
  ```
109
 
110
- ### GET /docs
111
- Interactive API documentation (Swagger UI)
112
-
113
- ### GET /health
114
- Health check endpoint
115
 
116
- ## 🛠️ Technology Stack
 
117
 
118
- ### Frontend
119
- - **React 18** - Modern UI library
120
- - **Vite** - Lightning-fast build tool
121
- - **Tailwind CSS** - Utility-first CSS
122
- - **PDF.js** - PDF rendering
123
- - **Axios** - HTTP client
124
 
125
- ### Backend
126
- - **FastAPI** - Modern Python web framework
127
- - **YOLOv8** - State-of-the-art object detection
128
- - **PaddleOCR** - Multilingual OCR
129
- - **Uvicorn** - ASGI server
130
 
131
- ## 📖 Documentation
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
132
 
133
- - [Setup Guide](SETUP_GUIDE.md) - Detailed setup instructions
134
- - [Frontend Architecture](frontend/ARCHITECTURE.md) - Frontend technical details
135
- - [API Documentation](http://localhost:7860/docs) - Interactive API docs (when running)
136
 
137
- ## 🚀 Deployment
138
 
139
- ### Hugging Face Spaces
140
 
141
- The application is fully configured for deployment to Hugging Face Spaces:
 
 
 
142
 
143
- 1. **Build frontend**:
144
- ```bash
145
- build-frontend.bat
146
- ```
147
 
148
- 2. **Push to GitHub** (auto-deploys via GitHub Actions):
149
- ```bash
150
- git add .
151
- git commit -m "Deploy to HF"
152
- git push origin main
153
- ```
154
 
155
- See [HF_DEPLOYMENT_READY.md](HF_DEPLOYMENT_READY.md) and [DEPLOYMENT.md](DEPLOYMENT.md) for detailed instructions.
156
 
157
- ### Docker
 
158
 
159
- ```bash
160
- docker build -t invoice-extractor .
161
- docker run -p 7860:7860 invoice-extractor
162
- ```
163
 
164
- The Dockerfile automatically builds the frontend during the build process.
165
 
166
- ## 📄 License
167
 
168
- This project is licensed under the MIT License.
169
 
170
- ---
171
 
172
- **Made with ❤️ using AI and modern web technologies**
 
 
 
1
  ---
2
+ title: Tractor Invoice Information Extractor
3
  emoji: 📄
4
  colorFrom: blue
5
  colorTo: green
 
9
  app_port: 7860
10
  ---
11
 
12
+ # Invoice Information Extractor API
13
 
14
+ Extract structured information from Indian tractor invoices through an AI-powered REST API.
15
 
16
+ ## What It Does
17
 
18
+ Combines **YOLO** (signature/stamp detection) + **Qwen2.5-VL** (text extraction) to extract:
19
+ - Dealer name
20
+ - Model name
21
+ - Horse power
22
+ - Asset cost
23
+ - Signature presence & location
24
+ - Stamp presence & location
25
 
26
+ ## Architecture
 
 
 
 
 
 
27
 
28
+ ### Production (Hugging Face Deployment)
29
+ - **FastAPI server** with REST endpoints
30
+ - **Models loaded on startup** and cached in memory
31
+ - **YOLO model** stored locally in `utils/models/best.pt`
32
+ - **Qwen2.5-VL** downloaded from Hugging Face on first run (not stored locally)
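The "loaded on startup and cached in memory" behaviour described above can be sketched roughly as follows. This is only an illustration: `ModelCache` and the two stand-in loader functions are hypothetical names, not the contents of the repository's `model_manager.py`, and the loaders return placeholder strings instead of real YOLO or Qwen2.5-VL objects.

```python
from typing import Any, Callable, Dict, List


class ModelCache:
    """Loads each registered model once and keeps it in memory (hypothetical helper)."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._models: Dict[str, Any] = {}

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        # Store the loader only; nothing is loaded until load_all() or get().
        self._loaders[name] = loader

    def load_all(self) -> None:
        # Intended to be called once when the server starts.
        for name in self._loaders:
            self.get(name)

    def get(self, name: str) -> Any:
        # Run the loader on first access, then reuse the cached object.
        if name not in self._models:
            self._models[name] = self._loaders[name]()
        return self._models[name]

    def is_ready(self) -> bool:
        # True once every registered model has been loaded.
        return set(self._models) == set(self._loaders)


load_calls: List[str] = []


def load_yolo_stub() -> str:
    # Assumption: stands in for reading utils/models/best.pt.
    load_calls.append("yolo")
    return "yolo-model"


def load_vlm_stub() -> str:
    # Assumption: stands in for downloading Qwen2.5-VL on first run.
    load_calls.append("vlm")
    return "vlm-model"


cache = ModelCache()
cache.register("yolo", load_yolo_stub)
cache.register("vlm", load_vlm_stub)
cache.load_all()
```

A readiness flag such as `cache.is_ready()` is one plausible source for a `models_loaded` value in a health response.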
33
 
34
+ ### Key Components
35
+ - `app.py` - FastAPI server with endpoints
36
+ - `model_manager.py` - Handles model loading and caching
37
+ - `inference.py` - Processing pipeline and validation
38
+ - `config.py` - Configuration settings
39
+ - `executable.py` - Legacy CLI interface (deprecated)
40
 
41
+ ## Installation
 
 
 
42
 
 
43
  ```bash
44
+ pip install -r requirements.txt
45
  ```
46
 
47
+ **Requirements:** Python 3.10+, CUDA GPU (8GB+ VRAM)
48
 
49
+ ## Running the Server
 
 
 
50
 
51
+ ### Local Development
52
+ ```bash
53
  python app.py
54
  ```
 
55
 
56
+ Server runs on `http://localhost:7860`
 
 
 
57
 
58
+ ### Production (Hugging Face Spaces)
59
+ ```bash
60
+ uvicorn app:app --host 0.0.0.0 --port 7860
61
+ ```
62
 
63
+ ## API Endpoints
 
64
 
65
+ ### 1. Health Check
66
+ ```bash
67
+ GET /health
68
  ```
 
 
 
69
 
70
+ **Response:**
71
+ ```json
72
+ {
73
+ "status": "healthy",
74
+ "models_loaded": true
75
+ }
76
+ ```
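Because the models take time to load, a client may want to wait for `models_loaded` before sending invoices. The sketch below polls `/health` using only the standard library; the retry count, delay, and the helper names `is_ready` and `wait_until_ready` are assumptions made for illustration.

```python
import json
import time
import urllib.error
import urllib.request


def is_ready(payload: dict) -> bool:
    """Return True when a /health payload reports a healthy server with models loaded."""
    return payload.get("status") == "healthy" and payload.get("models_loaded") is True


def wait_until_ready(base_url: str = "http://localhost:7860",
                     attempts: int = 30,
                     delay_sec: float = 2.0) -> bool:
    """Poll GET /health until the server is ready or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if is_ready(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            # Server not reachable yet, or the body was not valid JSON; retry.
            pass
        time.sleep(delay_sec)
    return False
```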
 
 
77
 
78
+ ### 2. Extract Single Invoice
79
+ ```bash
80
+ POST /extract
81
+ ```
82
 
83
+ **Parameters:**
84
+ - `file` (required): Image file (JPG, PNG, JPEG)
85
+ - `doc_id` (optional): Document identifier
86
 
87
+ **Example (cURL):**
88
  ```bash
89
+ curl -X POST "http://localhost:7860/extract" \
90
+ -F "file=@invoice_001.png" \
91
+ -F "doc_id=invoice_001"
92
  ```
93
 
94
+
95
+
96
  **Response:**
97
  ```json
98
  {
99
+ "doc_id": "invoice_001",
100
+ "fields": {
101
+ "dealer_name": "ABC Tractors Pvt Ltd",
102
+ "model_name": "Mahindra 575 DI",
103
+ "horse_power": 50,
104
+ "asset_cost": 525000,
105
+ "signature": {"present": true, "bbox": [100, 200, 300, 250]},
106
+ "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
107
+ },
108
+ "confidence": 0.89,
109
+ "processing_time_sec": 3.8,
110
+ "cost_estimate_usd": 0.000528,
111
+ "warnings": null
112
  }
113
  ```
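A client can turn the response above into typed objects before using it. The following is a minimal sketch, assuming the field names shown in the sample response; the class names are hypothetical, and the bounding box is assumed to be `[x1, y1, x2, y2]` in pixels, which the README does not state explicitly.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    present: bool
    bbox: Optional[List[int]]  # assumed [x1, y1, x2, y2]; None when nothing was detected


@dataclass
class InvoiceResult:
    doc_id: str
    dealer_name: Optional[str]
    model_name: Optional[str]
    horse_power: Optional[int]
    asset_cost: Optional[int]
    signature: Detection
    stamp: Detection
    confidence: float


def parse_detection(raw: dict) -> Detection:
    return Detection(present=bool(raw.get("present")), bbox=raw.get("bbox"))


def parse_result(raw_json: str) -> InvoiceResult:
    """Parse the JSON body of an /extract response into an InvoiceResult."""
    data = json.loads(raw_json)
    fields = data["fields"]
    return InvoiceResult(
        doc_id=data["doc_id"],
        dealer_name=fields.get("dealer_name"),
        model_name=fields.get("model_name"),
        horse_power=fields.get("horse_power"),
        asset_cost=fields.get("asset_cost"),
        signature=parse_detection(fields.get("signature", {})),
        stamp=parse_detection(fields.get("stamp", {})),
        confidence=data["confidence"],
    )


# The sample response from this README, used as input.
SAMPLE = """
{
  "doc_id": "invoice_001",
  "fields": {
    "dealer_name": "ABC Tractors Pvt Ltd",
    "model_name": "Mahindra 575 DI",
    "horse_power": 50,
    "asset_cost": 525000,
    "signature": {"present": true, "bbox": [100, 200, 300, 250]},
    "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
  },
  "confidence": 0.89,
  "processing_time_sec": 3.8,
  "cost_estimate_usd": 0.000528,
  "warnings": null
}
"""

result = parse_result(SAMPLE)
```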
114
 
115
+ ### 3. Extract Multiple Invoices (Batch)
116
+ ```bash
117
+ POST /extract_batch
118
+ ```
 
119
 
120
+ **Parameters:**
121
+ - `files` (required): Array of image files
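A batch upload could look like the sketch below, which sends every image under the repeated form field `files`. It assumes the third-party `requests` package is installed and that the endpoint returns JSON; the helper names and the 120-second timeout are illustrative choices, and the shape of the batch response is not documented here, so the function simply returns whatever JSON comes back.

```python
import mimetypes
from contextlib import ExitStack
from pathlib import Path
from typing import Any, BinaryIO, List, Sequence, Tuple

FilePart = Tuple[str, Tuple[str, BinaryIO, str]]


def build_file_parts(paths: Sequence[str], stack: ExitStack) -> List[FilePart]:
    """Open each image and describe it as a multipart part named 'files'."""
    parts: List[FilePart] = []
    for raw_path in paths:
        path = Path(raw_path)
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        # The ExitStack closes every handle when the caller's with-block ends.
        handle = stack.enter_context(open(path, "rb"))
        parts.append(("files", (path.name, handle, content_type)))
    return parts


def extract_batch(paths: Sequence[str], base_url: str = "http://localhost:7860") -> Any:
    """POST several invoices to /extract_batch and return the decoded JSON."""
    import requests  # third-party dependency, imported lazily

    with ExitStack() as stack:
        response = requests.post(
            f"{base_url}/extract_batch",
            files=build_file_parts(paths, stack),
            timeout=120,
        )
    response.raise_for_status()
    return response.json()
```

Calling `extract_batch(["invoice_001.png", "invoice_002.png"])` against a running server would upload both files in one request.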
122
 
123
+ ## Output Format
 
 
 
 
 
124
 
125
+ Results saved to `sample_output/result.json`:
 
 
 
 
126
 
127
+ ```json
128
+ {
129
+ "doc_id": "invoice_001",
130
+ "fields": {
131
+ "dealer_name": "ABC Tractors Pvt Ltd",
132
+ "model_name": "Mahindra 575 DI",
133
+ "horse_power": 50,
134
+ "asset_cost": 525000,
135
+ "signature": {"present": true, "bbox": [100, 200, 300, 250]},
136
+ "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
137
+ },
138
+ "confidence": 0.89,
139
+ "processing_time_sec": 3.8,
140
+ "cost_estimate_usd": 0.000528
141
+ }
142
+ ```
143
 
 
 
 
144
 
145
+ Range: 0.0 to 1.0 (higher is better)
146
 
147
+ ## Cost Calculation
148
 
149
+ **Formula:**
150
+ ```
151
+ cost_usd = (0.5 * processing_time_sec) / 3600
152
+ ```
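The formula can be written as a small function. This sketch keeps the formula's coefficient of 0.5 as the default hourly rate and exposes it as a parameter so a different GPU price can be substituted; the function name is an assumption.

```python
def estimate_cost_usd(processing_time_sec: float, rate_per_gpu_hour: float = 0.5) -> float:
    """Convert GPU processing time into a dollar estimate at an hourly rate."""
    if processing_time_sec < 0:
        raise ValueError("processing_time_sec must be non-negative")
    # Rate is per hour, time is in seconds, so divide by 3600 seconds per hour.
    return rate_per_gpu_hour * processing_time_sec / 3600


# 3.8 seconds at the default rate matches the sample response's cost_estimate_usd.
print(round(estimate_cost_usd(3.8), 6))  # 0.000528
```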
153
 
154
+ Assumes **$0.60 per GPU hour**
 
 
 
155
 
156
+ **Typical costs:**
157
+ - Per invoice: ~$0.002
 
 
 
 
158
 
159
+ ## Models
160
 
161
+ - **YOLO:** Signature/stamp detection (`best.pt`)
162
+ - **Qwen2.5-VL-7B:** Text extraction (4-bit quantized)
163
 
164
+ ## GPU Requirements
 
 
 
165
 
166
+ - **Minimum:** 10 GB VRAM
167
 
168
+ ## Project Structure
169
 
170
+ ```
171
+ INVOICE_INFO_EXTRACTOR/
172
+ ├── app.py # FastAPI server (main entry point)
173
+ ├── model_manager.py # Model loading and caching
174
+ ├── inference.py # Processing pipeline and validation
175
+ ├── config.py # Configuration settings
176
+ ├── requirements.txt
177
+ ├── README.md
178
+ ├── executable.py # Legacy CLI (deprecated)
179
+ ├── utils/
180
+ │   └── models/
181
+ │       └── best.pt # YOLO model (stored locally)
182
+ └── sample_output/
183
+     └── result.json # Sample output
184
+ ```
185
 
186
+ ## Performance
187
 
188
+ - **Processing time:** ~8 seconds per invoice
189
+ - **Cost per invoice:** ~$0.002 (GPU time)
190
+ - **GPU Memory:** 8GB minimum
README_git.md CHANGED
@@ -80,17 +80,7 @@ curl -X POST "http://localhost:7860/extract" \
80
  -F "doc_id=invoice_001"
81
  ```
82
 
83
- **Example (Python):**
84
- ```python
85
- import requests
86
 
87
- url = "http://localhost:7860/extract"
88
- files = {"file": open("invoice_001.png", "rb")}
89
- data = {"doc_id": "invoice_001"}
90
-
91
- response = requests.post(url, files=files, data=data)
92
- print(response.json())
93
- ```
94
 
95
  **Response:**
96
  ```json
@@ -119,27 +109,6 @@ POST /extract_batch
119
  **Parameters:**
120
  - `files` (required): Array of image files
121
 
122
- **Example (Python):**
123
- ```python
124
- import requests
125
-
126
- url = "http://localhost:7860/extract_batch"
127
- files = [
128
- ("files", open("invoice_001.png", "rb")),
129
- ("files", open("invoice_002.png", "rb"))
130
- ]
131
-
132
- response = requests.post(url, files=files)
133
- print(response.json())
134
- ```
135
-
136
- ### 4. Interactive Documentation
137
- ```bash
138
- GET /docs
139
- ```
140
-
141
- Visit `http://localhost:7860/docs` for interactive API documentation (Swagger UI).
142
-
143
  ## Output Format
144
 
145
  Results saved to `sample_output/result.json`:
@@ -161,17 +130,6 @@ Results saved to `sample_output/result.json`:
161
  }
162
  ```
163
 
164
- ## Confidence Calculation
165
-
166
- Overall confidence is the **average** of:
167
- 1. **Field validation confidence** - From dealer_name, model_name, horse_power, asset_cost validation
168
- 2. **Signature detection confidence** - YOLO confidence score (if signature present)
169
- 3. **Stamp detection confidence** - YOLO confidence score (if stamp present)
170
-
171
- **Formula:**
172
- ```
173
- confidence = (field_conf + signature_conf + stamp_conf) / 3
174
- ```
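The averaging formula from this removed section can be expressed directly. The sketch below is only an illustration of that three-way mean: the function name is hypothetical, and since the formula does not say which score applies when a signature or stamp is absent, the caller is expected to supply all three values.

```python
def overall_confidence(field_conf: float, signature_conf: float, stamp_conf: float) -> float:
    """Average the three component confidences, each expected in [0.0, 1.0]."""
    scores = (field_conf, signature_conf, stamp_conf)
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"confidence out of range: {score}")
    return sum(scores) / 3
```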
175
 
176
  Range: 0.0 to 1.0 (higher is better)
177
 
@@ -182,12 +140,10 @@ Range: 0.0 to 1.0 (higher is better)
182
  cost_usd = (0.5 * processing_time_sec) / 3600
183
  ```
184
 
185
- Assumes **$0.50 per GPU hour**
186
 
187
  **Typical costs:**
188
  - Per invoice: ~$0.002
189
- - 100 invoices: ~$0.2
190
- - Processing time: ~15 seconds
191
 
192
  ## Models
193
 
@@ -196,11 +152,7 @@ Assumes **$0.50 per GPU hour**
196
 
197
  ## GPU Requirements
198
 
199
- - **Minimum:** 8GB VRAM
200
-
201
- ## Troubleshooting
202
-
203
- **Debug mode:** Use `--debug` flag to see raw VLM output and parsed JSON
204
 
205
  ## Project Structure
206
 
@@ -220,61 +172,8 @@ INVOICE_INFO_EXTRACTOR/
220
      └── result.json # Sample output
221
  ```
222
 
223
- ## Deployment on Hugging Face Spaces
224
-
225
- ### 1. Create `Dockerfile` (optional)
226
- ```dockerfile
227
- FROM python:3.10-slim
228
-
229
- WORKDIR /app
230
-
231
- # Install system dependencies
232
- RUN apt-get update && apt-get install -y \
233
- git \
234
- libgl1-mesa-glx \
235
- libglib2.0-0 \
236
- && rm -rf /var/lib/apt/lists/*
237
-
238
- # Copy requirements and install
239
- COPY requirements.txt .
240
- RUN pip install --no-cache-dir -r requirements.txt
241
-
242
- # Copy application files
243
- COPY . .
244
-
245
- # Expose port
246
- EXPOSE 7860
247
-
248
- # Run the application
249
- CMD ["python", "app.py"]
250
- ```
251
-
252
- ### 2. Create `.gitignore`
253
- ```
254
- __pycache__/
255
- *.pyc
256
- .env
257
- sample_output/
258
- *.pt.backup
259
- venv/
260
- .vscode/
261
- ```
262
-
263
- ### 3. Upload to Hugging Face
264
- 1. Create new Space on Hugging Face
265
- 2. Select "Docker" or "Gradio" SDK
266
- 3. Upload files: `app.py`, `model_manager.py`, `inference.py`, `config.py`, `requirements.txt`
267
- 4. Upload YOLO model: `utils/models/best.pt`
268
- 5. Set hardware: GPU (T4 or better)
269
-
270
- ### 4. Environment Variables (if needed)
271
- ```
272
- HF_TOKEN=your_token_here
273
- ```
274
-
275
  ## Performance
276
 
277
- - **Processing time:** ~3-5 seconds per invoice
278
- - **Cost per invoice:** ~$0.0005 (GPU time)
279
- - **Batch processing:** Supported via `/extract_batch`
280
  - **GPU Memory:** 8GB minimum
 
80
  -F "doc_id=invoice_001"
81
  ```
82
 
 
 
 
83
 
 
 
 
 
 
 
 
84
 
85
  **Response:**
86
  ```json
 
109
  **Parameters:**
110
  - `files` (required): Array of image files
111
 
112
  ## Output Format
113
 
114
  Results saved to `sample_output/result.json`:
 
130
  }
131
  ```
132
 
133
 
134
  Range: 0.0 to 1.0 (higher is better)
135
 
 
140
  cost_usd = (0.5 * processing_time_sec) / 3600
141
  ```
142
 
143
+ Assumes **$0.60 per GPU hour**
144
 
145
  **Typical costs:**
146
  - Per invoice: ~$0.002
 
 
147
 
148
  ## Models
149
 
 
152
 
153
  ## GPU Requirements
154
 
155
+ - **Minimum:** 10 GB VRAM
 
 
 
 
156
 
157
  ## Project Structure
158
 
 
172
      └── result.json # Sample output
173
  ```
174
 
175
  ## Performance
176
 
177
+ - **Processing time:** ~8 seconds per invoice
178
+ - **Cost per invoice:** ~$0.002 (GPU time)
 
179
  - **GPU Memory:** 8GB minimum