Spaces: quantumbit/invoice_extractor

github-actions[bot] committed on Feb 4

Commit a571c24 (1 parent: 6d9fd48)

Sync from GitHub: 81ad14fcc4be611ad6ac0e65b151cdf9225c7ee9

Files changed (2)

.gitignore +2 -1
README_git.md +280 -0

.gitignore CHANGED

@@ -28,8 +28,9 @@ htmlcov/
 .ipynb_checkpoints/
 *.md
-!README_HF.md
+!README_git.md
 !README.md
 test*
 executable.py
 client_example.py
+Docs

README_git.md ADDED

@@ -0,0 +1,280 @@

+# Invoice Information Extractor API
+Extract structured information from Indian tractor invoices using an AI-powered REST API.
+## What It Does
+Combines **YOLO** (signature/stamp detection) and **Qwen2.5-VL** (text extraction) to extract:
+- Dealer name
+- Model name
+- Horse power
+- Asset cost
+- Signature presence & location
+- Stamp presence & location
+## Architecture
+### Production (Hugging Face Deployment)
+- **FastAPI server** with REST endpoints
+- **Models loaded on startup** and cached in memory
+- **YOLO model** stored locally in `utils/models/best.pt`
+- **Qwen2.5-VL** downloaded from Hugging Face on first run (not stored locally)
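The load-once-and-cache behaviour listed above can be sketched roughly as follows. This is only an illustration, not the contents of `model_manager.py`: the `ModelCache` class and the `fake_loader` function are hypothetical stand-ins for whatever actually builds the YOLO and Qwen2.5-VL objects.

```python
from threading import Lock
from typing import Any, Callable, Dict


class ModelCache:
    """Load each model at most once and keep it in memory (hypothetical helper)."""

    def __init__(self) -> None:
        self._models: Dict[str, Any] = {}
        self._lock = Lock()

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        # The lock keeps two concurrent requests from loading the same model twice.
        with self._lock:
            if name not in self._models:
                self._models[name] = loader()
            return self._models[name]

    def loaded(self, name: str) -> bool:
        with self._lock:
            return name in self._models


# Usage with a placeholder loader; a real loader would build the model object.
cache = ModelCache()
calls = []


def fake_loader() -> str:
    calls.append("load")
    return "yolo-model-placeholder"


first = cache.get("yolo", fake_loader)
second = cache.get("yolo", fake_loader)  # served from memory, loader not called again
```

Keeping the loader as a parameter means the same cache could serve both models, and a health endpoint could report readiness through `loaded()`.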
+### Key Components
+- `app.py` - FastAPI server with endpoints
+- `model_manager.py` - Handles model loading and caching
+- `inference.py` - Processing pipeline and validation
+- `config.py` - Configuration settings
+- `executable.py` - Legacy CLI interface (deprecated)
+## Installation
+```bash
+pip install -r requirements.txt
+```
+**Requirements:** Python 3.10+, CUDA GPU (8GB+ VRAM)
+## Running the Server
+### Local Development
+```bash
+python app.py
+```
+Server runs on `http://localhost:7860`
+### Production (Hugging Face Spaces)
+```bash
+uvicorn app:app --host 0.0.0.0 --port 7860
+```
+## API Endpoints
+### 1. Health Check
+```bash
+# GET /health
+curl "http://localhost:7860/health"
+```
+**Response:**
+```json
+{
+  "status": "healthy",
+  "models_loaded": true
+}
+```
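Because the models are loaded at startup, a client may want to wait for `models_loaded` to become `true` before sending invoices. A minimal sketch of such a wait loop is shown here; the `wait_until_ready` helper and its `fetch_health` parameter are assumptions made for illustration and are not part of this API.

```python
import time
from typing import Callable, Dict


def wait_until_ready(fetch_health: Callable[[], Dict], attempts: int = 10,
                     delay_sec: float = 1.0) -> bool:
    """Poll a health payload until it reports healthy with models loaded."""
    for attempt in range(attempts):
        try:
            payload = fetch_health()
        except OSError:
            payload = {}  # server not reachable yet; retry
        if payload.get("status") == "healthy" and payload.get("models_loaded") is True:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_sec)
    return False
```

With `requests` installed, `fetch_health` could be `lambda: requests.get("http://localhost:7860/health", timeout=5).json()`; connection errors raised by `requests` derive from `OSError`, so the loop retries on them.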
+### 2. Extract Single Invoice
+```
+POST /extract
+```
+**Parameters:**
+- `file` (required): Image file (JPG, PNG, JPEG)
+- `doc_id` (optional): Document identifier
+**Example (cURL):**
+```bash
+curl -X POST "http://localhost:7860/extract" \
+  -F "file=@invoice_001.png" \
+  -F "doc_id=invoice_001"
+```
+**Example (Python):**
+```python
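+import requests
+
+url = "http://localhost:7860/extract"
+data = {"doc_id": "invoice_001"}
+# Open the image in a context manager so the file handle is closed after the upload
+with open("invoice_001.png", "rb") as f:
+    response = requests.post(url, files={"file": f}, data=data)
+response.raise_for_status()  # fail loudly on HTTP errors
+print(response.json())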
+```
+**Response:**
+```json
+{
+  "doc_id": "invoice_001",
+  "fields": {
+    "dealer_name": "ABC Tractors Pvt Ltd",
+    "model_name": "Mahindra 575 DI",
+    "horse_power": 50,
+    "asset_cost": 525000,
+    "signature": {"present": true, "bbox": [100, 200, 300, 250]},
+    "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
+  },
+  "confidence": 0.89,
+  "processing_time_sec": 3.8,
+  "cost_estimate_usd": 0.000528,
+  "warnings": null
+}
+```
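One way a client might read the detection fields of this response is sketched below. The `Detection` dataclass and `parse_detection` helper are hypothetical, and the bounding-box layout `[x_min, y_min, x_max, y_max]` is an assumption: the README does not state which convention `bbox` uses.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    present: bool
    bbox: Optional[List[int]]  # assumed [x_min, y_min, x_max, y_max]

    def size(self) -> Optional[tuple]:
        """Return (width, height) under the assumed corner-coordinate layout."""
        if not self.present or not self.bbox:
            return None
        x_min, y_min, x_max, y_max = self.bbox
        return (x_max - x_min, y_max - y_min)


def parse_detection(fields: dict, key: str) -> Detection:
    # Missing or null entries are treated as "not present".
    raw = fields.get(key) or {}
    return Detection(present=bool(raw.get("present")), bbox=raw.get("bbox"))


# Trimmed copy of the sample response shown above.
sample = json.loads("""
{
  "doc_id": "invoice_001",
  "fields": {
    "dealer_name": "ABC Tractors Pvt Ltd",
    "signature": {"present": true, "bbox": [100, 200, 300, 250]},
    "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
  },
  "confidence": 0.89,
  "warnings": null
}
""")

signature = parse_detection(sample["fields"], "signature")
stamp = parse_detection(sample["fields"], "stamp")
print(signature.size(), stamp.size())
```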
+### 3. Extract Multiple Invoices (Batch)
+```
+POST /extract_batch
+```
+**Parameters:**
+- `files` (required): Array of image files
+**Example (Python):**
+```python
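+import requests
+from contextlib import ExitStack
+
+url = "http://localhost:7860/extract_batch"
+paths = ["invoice_001.png", "invoice_002.png"]
+# ExitStack closes every opened file once the request completes
+with ExitStack() as stack:
+    files = [("files", stack.enter_context(open(p, "rb"))) for p in paths]
+    response = requests.post(url, files=files)
+response.raise_for_status()  # fail loudly on HTTP errors
+print(response.json())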
+```
+### 4. Interactive Documentation
+```
+GET /docs
+```
+Visit `http://localhost:7860/docs` for interactive API documentation (Swagger UI).
+## Output Format
+Results are saved to `sample_output/result.json`:
+```json
+{
+  "doc_id": "invoice_001",
+  "fields": {
+    "dealer_name": "ABC Tractors Pvt Ltd",
+    "model_name": "Mahindra 575 DI",
+    "horse_power": 50,
+    "asset_cost": 525000,
+    "signature": {"present": true, "bbox": [100, 200, 300, 250]},
+    "stamp": {"present": true, "bbox": [400, 500, 500, 550]}
+  },
+  "confidence": 0.89,
+  "processing_time_sec": 3.8,
+  "cost_estimate_usd": 0.000528
+}
+```
+## Confidence Calculation
+Overall confidence is the **average** of:
+1. **Field validation confidence** - From dealer_name, model_name, horse_power, asset_cost validation
+2. **Signature detection confidence** - YOLO confidence score (if signature present)
+3. **Stamp detection confidence** - YOLO confidence score (if stamp present)
+**Formula:**
+```
+confidence = (field_conf + signature_conf + stamp_conf) / 3
+```
+Range: 0.0 to 1.0 (higher is better)
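The averaging described above can be sketched as a small function. The README does not say what happens when a signature or stamp is absent, so this sketch assumes that a missing score is skipped rather than counted as zero; the `overall_confidence` name is hypothetical.

```python
from typing import Optional


def overall_confidence(field_conf: float,
                       signature_conf: Optional[float],
                       stamp_conf: Optional[float]) -> float:
    """Average the available component scores, clamped to [0.0, 1.0]."""
    # Assumption: a missing detection (None) is skipped rather than counted as 0.
    scores = [s for s in (field_conf, signature_conf, stamp_conf) if s is not None]
    mean = sum(scores) / len(scores)
    return max(0.0, min(1.0, mean))
```

With all three scores supplied, this reduces to the formula above; if the service instead treats a missing detection as zero, the filter on `None` would need to be replaced accordingly.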
+## Cost Calculation
+**Formula:**
+```
+cost_usd = (0.5 * processing_time_sec) / 3600
+```
+Assumes **$0.50 per GPU hour**
+**Typical costs** (assuming ~15 seconds of processing per invoice):
+- Per invoice: ~$0.002
+- 100 invoices: ~$0.20
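The cost formula can be checked with a few lines of Python. The function name `cost_estimate_usd` is chosen here to mirror the response field and is not taken from the project's code; the rate is the $0.50 per GPU hour stated above.

```python
GPU_RATE_USD_PER_HOUR = 0.50  # rate stated in this README


def cost_estimate_usd(processing_time_sec: float,
                      rate_per_hour: float = GPU_RATE_USD_PER_HOUR) -> float:
    """Convert GPU seconds to dollars: rate * seconds / 3600."""
    if processing_time_sec < 0:
        raise ValueError("processing_time_sec must be non-negative")
    return rate_per_hour * processing_time_sec / 3600


print(round(cost_estimate_usd(3.8), 6))       # 0.000528, matching the sample response
print(round(cost_estimate_usd(15), 4))        # 0.0021
print(round(100 * cost_estimate_usd(15), 2))  # 0.21
```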
+## Models
+- **YOLO:** Signature/stamp detection (`best.pt`)
+- **Qwen2.5-VL-7B:** Text extraction (4-bit quantized)
+## GPU Requirements
+- **Minimum:** 8GB VRAM
+## Troubleshooting
+**Debug mode:** Use the `--debug` flag to see the raw VLM output and the parsed JSON
+## Project Structure
+```
+INVOICE_INFO_EXTRACTOR/
+├── app.py                 # FastAPI server (main entry point)
+├── model_manager.py       # Model loading and caching
+├── inference.py           # Processing pipeline and validation
+├── config.py              # Configuration settings
+├── requirements.txt
+├── README.md
+├── executable.py          # Legacy CLI (deprecated)
+├── utils/
+│   └── models/
+│       └── best.pt        # YOLO model (stored locally)
+└── sample_output/
+    └── result.json        # Sample output
+```
+## Deployment on Hugging Face Spaces
+### 1. Create `Dockerfile` (optional)
+```dockerfile
+FROM python:3.10-slim
+WORKDIR /app
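+# Install system dependencies (libgl1 supplies libGL.so.1; the older
+# libgl1-mesa-glx package may be unavailable on newer Debian-based images)
+RUN apt-get update && apt-get install -y \
+    git \
+    libgl1 \
+    libglib2.0-0 \
+    && rm -rf /var/lib/apt/lists/*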
+# Copy requirements and install
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application files
+COPY . .
+# Expose port
+EXPOSE 7860
+# Run the application
+CMD ["python", "app.py"]
+```
+### 2. Create `.gitignore`
+```
+__pycache__/
+*.pyc
+.env
+sample_output/
+*.pt.backup
+venv/
+.vscode/
+```
+### 3. Upload to Hugging Face
+1. Create a new Space on Hugging Face
+2. Select the "Docker" or "Gradio" SDK
+3. Upload files: `app.py`, `model_manager.py`, `inference.py`, `config.py`, `requirements.txt`
+4. Upload the YOLO model: `utils/models/best.pt`
+5. Set the hardware to a GPU (T4 or better)
+### 4. Environment Variables (if needed)
+```
+HF_TOKEN=your_token_here
+```
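The README does not say how the application consumes `HF_TOKEN`. As one possibility, a startup routine could read it from the environment along these lines; the `read_hf_token` helper is hypothetical.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the HF_TOKEN value, or None when it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None
```

Returning `None` for a blank value lets the caller fall back to anonymous downloads, which may be sufficient if the model repository is public.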
+## Performance
+- **Processing time:** ~3-5 seconds per invoice
+- **Cost per invoice:** ~$0.0005 (GPU time at $0.50 per GPU hour; 3.8 s works out to about $0.000528)
+- **Batch processing:** Supported via `/extract_batch`
+- **GPU Memory:** 8GB minimum