Spaces:

quantumbit
/

invoice_extractor

Paused

App Files Files Community

github-actions[bot] commited on Feb 10

Commit

aeb681f

1 Parent(s): 4dbceee

Sync from GitHub: 67c5ee67bf66d7b77be5e2ffbfaa22681c2e0ebf

Browse files

Files changed (2) hide show

README.md +129 -111
README_git.md +4 -105

README.md CHANGED Viewed

@@ -1,5 +1,5 @@
 ---
-title: Invoice Information Extractor
 emoji: 📄
 colorFrom: blue
 colorTo: green
@@ -9,164 +9,182 @@ license: mit
 app_port: 7860
 ---
-# Invoice Information Extractor 🧾
-A complete full-stack application for extracting information from invoices using AI. Features a modern React frontend and powerful FastAPI backend with YOLOv8 and PaddleOCR.
-## ✨ Features
-### Frontend (React + Vite)
-- 📁 **Drag-and-drop file upload** with multi-file support
-- 📄 **PDF to Image conversion** (automatically converts multi-page PDFs)
-- 🔄 **Batch processing** with real-time progress tracking
-- 🎨 **Visual detection** of signatures and stamps with bounding boxes
-- 📱 **Responsive design** that works on all devices
-- ⚡ **Fast and modern** UI with Tailwind CSS
-### Backend (FastAPI + AI Models)
-- 🤖 **YOLOv8 object detection** for signatures and stamps
-- 📝 **PaddleOCR** for text extraction
-- 🚀 **High-performance API** with async support
-- 📊 **Batch processing** capabilities
-- 🔒 **CORS enabled** for frontend integration
-- 📚 **Interactive API docs** at /docs
-## 🎯 Quick Start
-### Option 1: Automated Setup (Windows)
-Run the automated setup script:
-```bash
-setup.bat
-```
-Then start both servers:
 ```bash
-start.bat
 ```
-### Option 2: Manual Setup
-#### Backend Setup
-```bash
-# Install dependencies
-pip install -r requirements.txt
-# Start the server
 python app.py
 ```
-Backend runs on: http://localhost:7860
-#### Frontend Setup
-```bash
-# Navigate to frontend
-cd frontend
-# Install dependencies
-npm install
-# Create environment file
-cp .env.example .env
-# Start development server
-npm run dev
 ```
-Frontend runs on: http://localhost:3000
-## 🖥️ Usage
-1. **Open the application** at http://localhost:3000
-2. **Upload files** by dragging and dropping or clicking to browse
-   - Supports: JPEG, PNG, GIF, WEBP images and PDF files
-3. **Click "Process"** to start extraction
-4. **View results** with:
-   - Extracted text from the invoice
-   - Visual detection of signatures (red boxes)
-   - Visual detection of stamps (blue boxes)
-   - Coordinates and metadata
-## 🔌 API Endpoints
-### POST /process-invoice
-Process a single invoice image
-**Request:**
 ```bash
-curl -X POST "http://localhost:7860/process-invoice" \
-  -F "file=@invoice.jpg"
 ```
 **Response:**
 ```json
 {
-  "extracted_text": "Invoice details...",
-  "signature_coords": [[x1, y1, x2, y2]],
-  "stamp_coords": [[x1, y1, x2, y2]],
-  "doc_id": "invoice",
-  "processing_time": 2.5
 }
 ```
-### GET /docs
-Interactive API documentation (Swagger UI)
-### GET /health
-Health check endpoint
-## 🛠️ Technology Stack
-### Frontend
-- **React 18** - Modern UI library
-- **Vite** - Lightning-fast build tool
-- **Tailwind CSS** - Utility-first CSS
-- **PDF.js** - PDF rendering
-- **Axios** - HTTP client
-### Backend
-- **FastAPI** - Modern Python web framework
-- **YOLOv8** - State-of-the-art object detection
-- **PaddleOCR** - Multilingual OCR
-- **Uvicorn** - ASGI server
-## 📖 Documentation
-- [Setup Guide](SETUP_GUIDE.md) - Detailed setup instructions
-- [Frontend Architecture](frontend/ARCHITECTURE.md) - Frontend technical details
-- [API Documentation](http://localhost:7860/docs) - Interactive API docs (when running)
-## 🚀 Deployment
-### Hugging Face Spaces
-The application is fully configured for deployment to Hugging Face Spaces:
-1. **Build frontend**:
-   ```bash
-   build-frontend.bat
-   ```
-2. **Push to GitHub** (auto-deploys via GitHub Actions):
-   ```bash
-   git add .
-   git commit -m "Deploy to HF"
-   git push origin main
-   ```
-See [HF_DEPLOYMENT_READY.md](HF_DEPLOYMENT_READY.md) and [DEPLOYMENT.md](DEPLOYMENT.md) for detailed instructions.
-### Docker
-```bash
-docker build -t invoice-extractor .
-docker run -p 7860:7860 invoice-extractor
-```
-The Dockerfile automatically builds the frontend during the build process.
-## 📄 License
-This project is licensed under the MIT License.
----
-**Made with ❤️ using AI and modern web technologies**

 ---
+title: Tractor Invoice Information Extractor
 emoji: 📄
 colorFrom: blue
 colorTo: green
 app_port: 7860
 ---
+# Invoice Information Extractor API
+Extract structured information from Indian tractor invoices using AI-powered REST API.
+## What It Does
+Combines **YOLO** (signature/stamp detection) + **Qwen2.5-VL** (text extraction) to extract:
+- Dealer name
+- Model name
+- Horse power
+- Asset cost
+- Signature presence & location
+- Stamp presence & location
+## Architecture
+### Production (Hugging Face Deployment)
+- **FastAPI server** with REST endpoints
+- **Models loaded on startup** and cached in memory
+- **YOLO model** stored locally in `utils/models/best.pt`
+- **Qwen2.5-VL** downloaded from Hugging Face on first run (not stored locally)
+### Key Components
+- `app.py` - FastAPI server with endpoints
+- `model_manager.py` - Handles model loading and caching
+- `inference.py` - Processing pipeline and validation
+- `config.py` - Configuration settings
+- `executable.py` - Legacy CLI interface (deprecated)
+## Installation
 ```bash
+pip install -r requirements.txt
 ```
+**Requirements:** Python 3.10+, CUDA GPU (8GB+ VRAM)
+## Running the Server
+### Local Development
+```bash
 python app.py
 ```
+Server runs on `http://localhost:7860`
+### Production (Hugging Face Spaces)
+```bash
+uvicorn app:app --host 0.0.0.0 --port 7860
+```
+## API Endpoints
+### 1. Health Check
+```bash
+GET /health
 ```
+**Response:**
+```json
+{
+  "status": "healthy",
+  "models_loaded": true
+}
+```
+### 2. Extract Single Invoice
+```bash
+POST /extract
+```
+**Parameters:**
+- `file` (required): Image file (JPG, PNG, JPEG)
+- `doc_id` (optional): Document identifier
+**Example (cURL):**
 ```bash
+curl -X POST "http://localhost:7860/extract" \
+  -F "file=@invoice_001.png" \
+  -F "doc_id=invoice_001"
 ```
 **Response:**
 ```json
 {
+  "doc_id": "invoice_001",
+  "fields": {
+    "dealer_name": "ABC Tractors Pvt Ltd",
+    "model_name": "Mahindra 575 DI",
+    "horse_power": 50,
+    "asset_cost": 525000,
+    "signature": {"present": true, "bbox": [100, 200, 300, 250]},
+    "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
+  },
+  "confidence": 0.89,
+  "processing_time_sec": 3.8,
+  "cost_estimate_usd": 0.000528,
+  "warnings": null
 }
 ```
+### 3. Extract Multiple Invoices (Batch)
+```bash
+POST /extract_batch
+```
+**Parameters:**
+- `files` (required): Array of image files
+## Output Format
+Results saved to `sample_output/result.json`:
+```json
+{
+  "doc_id": "invoice_001",
+  "fields": {
+    "dealer_name": "ABC Tractors Pvt Ltd",
+    "model_name": "Mahindra 575 DI",
+    "horse_power": 50,
+    "asset_cost": 525000,
+    "signature": {"present": true, "bbox": [100, 200, 300, 250]},
+    "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
+  },
+  "confidence": 0.89,
+  "processing_time_sec": 3.8,
+  "cost_estimate_usd": 0.000528
+}
+```
+Range: 0.0 to 1.0 (higher is better)
+## Cost Calculation
+**Formula:**
+```
+cost_usd = (0.5 * processing_time_sec) / 3600
+```
+Assumes **$0.60 per GPU hour**
+**Typical costs:**
+- Per invoice: ~$0.002
+## Models
+- **YOLO:** Signature/stamp detection (`best.pt`)
+- **Qwen2.5-VL-7B:** Text extraction (4-bit quantized)
+## GPU Requirements
+- **Minimum:** 10 GB VRAM
+## Project Structure
+```
+INVOICE_INFO_EXTRACTOR/
+├── app.py                 # FastAPI server (main entry point)
+├── model_manager.py       # Model loading and caching
+├── inference.py           # Processing pipeline and validation
+├── config.py              # Configuration settings
+├── requirements.txt
+├── README.md
+├── executable.py          # Legacy CLI (deprecated)
+├── utils/
+│   └── models/
+│       └── best.pt        # YOLO model (stored locally)
+└── sample_output/
+    └── result.json        # Sample output
+```
+## Performance
+- **Processing time:** ~8 seconds per invoice
+- **Cost per invoice:** ~$0.002 (GPU time)
+- **GPU Memory:** 8GB minimum

README_git.md CHANGED Viewed

@@ -80,17 +80,7 @@ curl -X POST "http://localhost:7860/extract" \
   -F "doc_id=invoice_001"
 ```
-**Example (Python):**
-```python
-import requests
-url = "http://localhost:7860/extract"
-files = {"file": open("invoice_001.png", "rb")}
-data = {"doc_id": "invoice_001"}
-response = requests.post(url, files=files, data=data)
-print(response.json())
-```
 **Response:**
 ```json
@@ -119,27 +109,6 @@ POST /extract_batch
 **Parameters:**
 - `files` (required): Array of image files
-**Example (Python):**
-```python
-import requests
-url = "http://localhost:7860/extract_batch"
-files = [
-    ("files", open("invoice_001.png", "rb")),
-    ("files", open("invoice_002.png", "rb"))
-]
-response = requests.post(url, files=files)
-print(response.json())
-```
-### 4. Interactive Documentation
-```bash
-GET /docs
-```
-Visit `http://localhost:7860/docs` for interactive API documentation (Swagger UI).
 ## Output Format
 Results saved to `sample_output/result.json`:
@@ -161,17 +130,6 @@ Results saved to `sample_output/result.json`:
 }
 ```
-## Confidence Calculation
-Overall confidence is the **average** of:
-1. **Field validation confidence** - From dealer_name, model_name, horse_power, asset_cost validation
-2. **Signature detection confidence** - YOLO confidence score (if signature present)
-3. **Stamp detection confidence** - YOLO confidence score (if stamp present)
-**Formula:**
-```
-confidence = (field_conf + signature_conf + stamp_conf) / 3
-```
 Range: 0.0 to 1.0 (higher is better)
@@ -182,12 +140,10 @@ Range: 0.0 to 1.0 (higher is better)
 cost_usd = (0.5 * processing_time_sec) / 3600
 ```
-Assumes **$0.50 per GPU hour**
 **Typical costs:**
 - Per invoice: ~$0.002
-- 100 invoices: ~$0.2
-- Processing time: ~15 seconds
 ## Models
@@ -196,11 +152,7 @@ Assumes **$0.50 per GPU hour**
 ## GPU Requirements
-- **Minimum:** 8GB VRAM
-## Troubleshooting
-**Debug mode:** Use `--debug` flag to see raw VLM output and parsed JSON
 ## Project Structure
@@ -220,61 +172,8 @@ INVOICE_INFO_EXTRACTOR/
     └── result.json        # Sample output
 ```
-## Deployment on Hugging Face Spaces
-### 1. Create `Dockerfile` (optional)
-```dockerfile
-FROM python:3.10-slim
-WORKDIR /app
-# Install system dependencies
-RUN apt-get update && apt-get install -y \
-    git \
-    libgl1-mesa-glx \
-    libglib2.0-0 \
-    && rm -rf /var/lib/apt/lists/*
-# Copy requirements and install
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-# Copy application files
-COPY . .
-# Expose port
-EXPOSE 7860
-# Run the application
-CMD ["python", "app.py"]
-```
-### 2. Create `.gitignore`
-```
-__pycache__/
-*.pyc
-.env
-sample_output/
-*.pt.backup
-venv/
-.vscode/
-```
-### 3. Upload to Hugging Face
-1. Create new Space on Hugging Face
-2. Select "Docker" or "Gradio" SDK
-3. Upload files: `app.py`, `model_manager.py`, `inference.py`, `config.py`, `requirements.txt`
-4. Upload YOLO model: `utils/models/best.pt`
-5. Set hardware: GPU (T4 or better)
-### 4. Environment Variables (if needed)
-```
-HF_TOKEN=your_token_here
-```
 ## Performance
-- **Processing time:** ~3-5 seconds per invoice
-- **Cost per invoice:** ~$0.0005 (GPU time)
-- **Batch processing:** Supported via `/extract_batch`
 - **GPU Memory:** 8GB minimum

   -F "doc_id=invoice_001"
 ```
 **Response:**
 ```json
 **Parameters:**
 - `files` (required): Array of image files
 ## Output Format
 Results saved to `sample_output/result.json`:
 }
 ```
 Range: 0.0 to 1.0 (higher is better)
 cost_usd = (0.5 * processing_time_sec) / 3600
 ```
+Assumes **$0.60 per GPU hour**
 **Typical costs:**
 - Per invoice: ~$0.002
 ## Models
 ## GPU Requirements
+- **Minimum:** 10 GB VRAM
 ## Project Structure
     └── result.json        # Sample output
 ```
 ## Performance
+- **Processing time:** ~8 seconds per invoice
+- **Cost per invoice:** ~$0.002 (GPU time)
 - **GPU Memory:** 8GB minimum