sgonzalezu

Deploy LLaVA invoice extraction service

853e9dd 3 months ago

metadata

title: LLaVA Invoice Extractor
emoji: 🧾
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

LLaVA-OneVision Invoice Extraction Service

This Hugging Face Space runs LLaVA-OneVision-1.5-4B-Instruct to extract structured data from invoice images using vision-language understanding.

🚀 Features

Vision-based extraction: Analyzes invoice images directly without traditional OCR
Vendor-specific prompts: Optimized extraction for different invoice formats (A1/Burlington, Costco, etc.)
Structured JSON output: Returns invoice data in a standardized format
FastAPI service: REST API compatible with the existing invoice processing pipeline

📋 API Endpoints

POST `/extract_invoice`

Extract invoice data from an image.

Request body:

{
  "image": "base64_encoded_image_string",
  "vendor_id": "A1 Cash and Carry_Fisico",
  "use_validation": false
}

Response:

{
  "status": "success",
  "data": {
    "issuer": "Burlington Cash and Carry",
    "date": "2024-01-15",
    "transaction_id": "BL123456",
    "items": [
      {
        "sku": "ALU104",
        "description": "Product description",
        "quantity": 2.0,
        "unit_price": 10.50,
        "amount": 21.00,
        "tax_code": "H"
      }
    ],
    "subtotal": 100.00,
    "hst": 13.00,
    "total": 113.00
  },
  "raw_output": "..."
}
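Because the model generates the numbers, it can be worth sanity-checking them before they enter the pipeline. The following is a minimal sketch, not part of the service itself: `check_totals` is a hypothetical helper, and it assumes that `hst` is the only tax, so that `subtotal + hst` should equal `total`, and that each item's `amount` equals `quantity * unit_price`.

```python
from typing import Any


def check_totals(data: dict[str, Any], tolerance: float = 0.01) -> list[str]:
    """Return human-readable inconsistencies found in the "data" object.

    Assumption (not stated by the service): total = subtotal + hst and
    amount = quantity * unit_price for every item.
    """
    problems = []
    for item in data["items"]:
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["amount"]) > tolerance:
            problems.append(f"{item['sku']}: quantity * unit_price != amount")
    if abs(data["subtotal"] + data["hst"] - data["total"]) > tolerance:
        problems.append("subtotal + hst != total")
    return problems


# The "data" object from the example response above
example_data = {
    "issuer": "Burlington Cash and Carry",
    "date": "2024-01-15",
    "transaction_id": "BL123456",
    "items": [
        {
            "sku": "ALU104",
            "description": "Product description",
            "quantity": 2.0,
            "unit_price": 10.50,
            "amount": 21.00,
            "tax_code": "H",
        }
    ],
    "subtotal": 100.00,
    "hst": 13.00,
    "total": 113.00,
}

print(check_totals(example_data))
```

An empty list means no inconsistency was detected under these assumptions; it does not prove the extraction is correct.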

GET `/`

Health check endpoint.

GET `/health`

Service health status.
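A client can poll `/health` before sending its first invoice, which helps while the Space is still building or loading the model. This is a rough sketch under assumptions: the README does not document the body returned by `/health`, so the probe only treats HTTP 200 as healthy, and `wait_until_healthy` and `http_probe` are hypothetical helper names.

```python
import time
import urllib.request
from typing import Callable


def http_probe(base_url: str, timeout_s: float = 10.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.status == 200


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `attempts` times, sleeping between failures."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses;
            # treat them as "not ready yet".
            pass
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Usage would look like `wait_until_healthy(lambda: http_probe("https://YOUR_USERNAME-invoice-llava-extractor.hf.space"))`. Passing the probe as a callable keeps the retry logic testable without network access.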

🔧 Deployment to Hugging Face Spaces

Option 1: Using the Web Interface

Go to https://huggingface.co/spaces
Click "Create new Space"
Settings:
- Name: invoice-llava-extractor (or your preferred name)
- License: Apache 2.0
- SDK: Docker
- Hardware: GPU T4 (free) or better
Click "Create Space"

Clone the repository:

git clone https://huggingface.co/spaces/YOUR_USERNAME/invoice-llava-extractor
cd invoice-llava-extractor

Copy all files from this directory into the cloned repo

Commit and push:

git add .
git commit -m "Initial LLaVA service deployment"
git push

Wait for the Space to build (5-15 minutes)

Option 2: Using the Provided Script

Run the deployment script (after customizing it):

chmod +x upload_llava_to_huggingface.sh
./upload_llava_to_huggingface.sh

🧪 Testing

Test locally (requires GPU)

# Install dependencies
pip install -r requirements.txt

# Run the service
python3 app.py

# Test with curl (in another terminal)
curl -X POST http://localhost:7860/extract_invoice \
  -H "Content-Type: application/json" \
  -d '{
    "image": "BASE64_ENCODED_IMAGE",
    "vendor_id": "A1 Cash and Carry_Fisico"
  }'

Test deployed Space

import requests
import base64

# Load and encode image
with open("invoice.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

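# Make request (generous timeout: the first call may hit a cold start)
response = requests.post(
    "https://YOUR_USERNAME-invoice-llava-extractor.hf.space/extract_invoice",
    json={
        "image": image_b64,
        "vendor_id": "A1 Cash and Carry_Fisico"
    },
    timeout=300,
)
response.raise_for_status()  # fail loudly on HTTP errors

print(response.json())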

⚙️ Configuration

Environment Variables

MODEL_NAME: HuggingFace model ID (default: lmms-lab/LLaVA-OneVision-1.5-4B-Instruct)
MAX_NEW_TOKENS: Maximum tokens to generate (default: 2048)
PORT: Service port (default: 7860)
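For illustration, the variables above could be read with their documented defaults as follows. This is a sketch of one possible approach, not necessarily how `app.py` does it; `ServiceConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceConfig:
    model_name: str
    max_new_tokens: int
    port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ServiceConfig:
    """Read the three documented variables, falling back to the README defaults."""
    env = os.environ if env is None else env
    return ServiceConfig(
        model_name=env.get("MODEL_NAME", "lmms-lab/LLaVA-OneVision-1.5-4B-Instruct"),
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "2048")),
        port=int(env.get("PORT", "7860")),
    )


if __name__ == "__main__":
    print(load_config())
```

Converting `MAX_NEW_TOKENS` and `PORT` with `int()` makes a malformed value fail at startup rather than on the first request.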

Supported Vendors

A1 Cash and Carry_Fisico - Burlington Cash and Carry
Costco_Formato1 - Costco invoices format 1
Costco_Formato2 - Costco invoices format 2
Default - Generic invoice extraction
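A vendor-specific prompt lookup might be sketched like this. The prompt strings are placeholders, not the prompts the service actually uses, and falling back to `Default` for an unknown or missing `vendor_id` is an assumption about the intended behavior.

```python
from typing import Optional

# Placeholder prompts keyed by the vendor IDs listed above (hypothetical text)
VENDOR_PROMPTS = {
    "A1 Cash and Carry_Fisico": "Extract the Burlington Cash and Carry invoice as JSON.",
    "Costco_Formato1": "Extract the Costco (format 1) invoice as JSON.",
    "Costco_Formato2": "Extract the Costco (format 2) invoice as JSON.",
    "Default": "Extract the invoice as JSON.",
}


def prompt_for(vendor_id: Optional[str]) -> str:
    """Return the vendor's prompt, or the Default prompt if the ID is unknown."""
    if vendor_id is None:
        return VENDOR_PROMPTS["Default"]
    return VENDOR_PROMPTS.get(vendor_id, VENDOR_PROMPTS["Default"])
```

With this approach, adding a vendor only requires a new dictionary entry.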

📊 Model Information

Model: LLaVA-OneVision-1.5-4B-Instruct
Size: 4B parameters (optimized for speed on free GPU)
Capabilities: Image understanding, structured data extraction, multilingual
License: Apache 2.0

🔗 Integration

This service is designed to integrate with the main invoice processing application. After deployment, update your .env file:

LLAVA_SERVICE_URL=https://YOUR_USERNAME-invoice-llava-extractor.hf.space
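On the application side, the endpoint URL can then be derived from that variable. A small sketch, assuming the main application reads `LLAVA_SERVICE_URL` from its environment; `extract_endpoint` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urljoin


def extract_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the /extract_invoice URL from LLAVA_SERVICE_URL."""
    env = os.environ if env is None else env
    # Normalize to exactly one trailing slash so urljoin appends the path
    base = env["LLAVA_SERVICE_URL"].rstrip("/") + "/"
    return urljoin(base, "extract_invoice")
```

A missing variable raises `KeyError`, which surfaces a misconfigured `.env` file early.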

📝 Notes

The first request may be slow because the model has to load (cold start)
A T4 GPU processes an invoice in roughly 10-30 seconds
For better performance, consider upgrading to an A10G or A100 GPU
The 4B model was chosen to balance speed and accuracy on the free tier

🐛 Troubleshooting

Space crashes or runs out of memory:

Try reducing MAX_NEW_TOKENS to 1024
Ensure GPU hardware is selected in Space settings

JSON parsing errors:

Check the raw_output field in the response
The model may need better prompting for specific invoice types
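When the model wraps its JSON in extra prose, the object can often still be recovered from `raw_output`. The sketch below is one possible approach, not the service's own parser; `recover_json` is a hypothetical helper. It scans for each opening brace and returns the first position where a complete JSON object decodes.

```python
import json
from typing import Any, Optional


def recover_json(raw_output: str) -> Optional[dict[str, Any]]:
    """Return the first decodable JSON object embedded in raw_output, or None."""
    decoder = json.JSONDecoder()
    start = raw_output.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            value, _end = decoder.raw_decode(raw_output, start)
            return value
        except json.JSONDecodeError:
            start = raw_output.find("{", start + 1)
    return None


sample = 'Here is the invoice:\n{"issuer": "Burlington Cash and Carry", "total": 113.0}\nDone.'
print(recover_json(sample))
```

If this returns `None`, the output contains no complete JSON object, which may indicate truncation by `MAX_NEW_TOKENS` or a prompt that needs adjusting.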

Slow inference:

Expected on T4 GPU (free tier)
Consider upgrading the hardware or using a smaller model

📄 License

Apache 2.0