title: LLaVA Invoice Extractor
emoji: 🧾
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false

LLaVA-OneVision Invoice Extraction Service

This Hugging Face Space runs LLaVA-OneVision-1.5-4B-Instruct to extract structured data from invoice images using vision-language understanding.

🚀 Features

  • Vision-based extraction: Analyzes invoice images directly without traditional OCR
  • Vendor-specific prompts: Optimized extraction for different invoice formats (A1/Burlington, Costco, etc.)
  • Structured JSON output: Returns invoice data in standardized format
  • FastAPI service: REST API compatible with the existing invoice processing pipeline

📋 API Endpoints

POST /extract_invoice

Extract invoice data from an image.

Request body:

{
  "image": "base64_encoded_image_string",
  "vendor_id": "A1 Cash and Carry_Fisico",
  "use_validation": false
}

Response:

{
  "status": "success",
  "data": {
    "issuer": "Burlington Cash and Carry",
    "date": "2024-01-15",
    "transaction_id": "BL123456",
    "items": [
      {
        "sku": "ALU104",
        "description": "Product description",
        "quantity": 2.0,
        "unit_price": 10.50,
        "amount": 21.00,
        "tax_code": "H"
      }
    ],
    "subtotal": 100.00,
    "hst": 13.00,
    "total": 113.00
  },
  "raw_output": "..."
}
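
A client can sanity-check the extracted figures before trusting them. The sketch below is illustrative only: it assumes that each item's amount equals quantity times unit_price and that total equals subtotal plus hst, which holds for the sample response above but is not stated by the service. The function name check_invoice_totals and the tolerance are hypothetical.

```python
from math import isclose


def check_invoice_totals(data, tol=0.01):
    """Return a list of inconsistencies found in an extracted invoice.

    Assumption (not guaranteed by the service): amount == quantity * unit_price
    for every item, and total == subtotal + hst.
    """
    problems = []
    for i, item in enumerate(data.get("items", [])):
        expected = item["quantity"] * item["unit_price"]
        if not isclose(expected, item["amount"], abs_tol=tol):
            problems.append(
                f"item {i}: quantity * unit_price = {expected:.2f}, "
                f"amount = {item['amount']:.2f}"
            )
    expected_total = data["subtotal"] + data["hst"]
    if not isclose(expected_total, data["total"], abs_tol=tol):
        problems.append(
            f"total: subtotal + hst = {expected_total:.2f}, "
            f"total = {data['total']:.2f}"
        )
    return problems
```

An empty list means the figures agree within the tolerance; anything else is worth a manual look or a retry.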

GET /

Health check endpoint.

GET /health

Service health status.

🔧 Deployment to Hugging Face Spaces

Option 1: Using the Web Interface

  1. Go to https://huggingface.co/spaces
  2. Click "Create new Space"
  3. Settings:
    • Name: invoice-llava-extractor (or your preferred name)
    • License: Apache 2.0
    • SDK: Docker
    • Hardware: GPU T4 (free) or better
  4. Click "Create Space"
  5. Clone the repository:
    git clone https://huggingface.co/spaces/YOUR_USERNAME/invoice-llava-extractor
    cd invoice-llava-extractor
    
  6. Copy all files from this directory into the cloned repo
  7. Commit and push:
    git add .
    git commit -m "Initial LLaVA service deployment"
    git push
    
  8. Wait for the Space to build (5-15 minutes)

Option 2: Using the Provided Script

Run the deployment script (after customizing it):

chmod +x upload_llava_to_huggingface.sh
./upload_llava_to_huggingface.sh

🧪 Testing

Test locally (requires GPU)

# Install dependencies
pip install -r requirements.txt

# Run the service
python3 app.py

# Test with curl (in another terminal)
curl -X POST http://localhost:7860/extract_invoice \
  -H "Content-Type: application/json" \
  -d '{
    "image": "BASE64_ENCODED_IMAGE",
    "vendor_id": "A1 Cash and Carry_Fisico"
  }'

Test deployed Space

import base64

import requests

# Load the invoice image and encode it as a base64 string
with open("invoice.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

# Send the extraction request; the long timeout allows for a cold start
response = requests.post(
    "https://YOUR_USERNAME-invoice-llava-extractor.hf.space/extract_invoice",
    json={
        "image": image_b64,
        "vendor_id": "A1 Cash and Carry_Fisico",
    },
    timeout=300,
)
response.raise_for_status()  # Fail loudly on HTTP errors

print(response.json())

⚙️ Configuration

Environment Variables

  • MODEL_NAME: Hugging Face model ID (default: lmms-lab/LLaVA-OneVision-1.5-4B-Instruct)
  • MAX_NEW_TOKENS: Maximum tokens to generate (default: 2048)
  • PORT: Service port (default: 7860)
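
As a rough illustration of how these variables and their defaults might be read at startup, consider the sketch below. It is not taken from app.py; the function name load_settings and the returned dictionary keys are hypothetical, and only the variable names and default values come from the list above.

```python
import os


def load_settings(env=None):
    """Read service settings from the environment, applying documented defaults.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to test. (Hypothetical helper, not the service's actual code.)
    """
    env = os.environ if env is None else env
    return {
        "model_name": env.get(
            "MODEL_NAME", "lmms-lab/LLaVA-OneVision-1.5-4B-Instruct"
        ),
        # Environment values are strings, so numeric settings are converted.
        "max_new_tokens": int(env.get("MAX_NEW_TOKENS", "2048")),
        "port": int(env.get("PORT", "7860")),
    }
```

For example, load_settings({"MAX_NEW_TOKENS": "1024"}) keeps the default model and port while lowering the generation limit.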

Supported Vendors

  • A1 Cash and Carry_Fisico - Burlington Cash and Carry
  • Costco_Formato1 - Costco invoices format 1
  • Costco_Formato2 - Costco invoices format 2
  • Default - Generic invoice extraction
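
A caller may want to normalize vendor IDs before sending a request. The following client-side sketch assumes that any ID outside the list above should be sent as Default; whether the service itself falls back this way is not documented here. The names SUPPORTED_VENDORS and resolve_vendor are hypothetical.

```python
# Vendor IDs copied from the list above, mapped to their descriptions.
SUPPORTED_VENDORS = {
    "A1 Cash and Carry_Fisico": "Burlington Cash and Carry",
    "Costco_Formato1": "Costco invoices format 1",
    "Costco_Formato2": "Costco invoices format 2",
    "Default": "Generic invoice extraction",
}


def resolve_vendor(vendor_id):
    """Return vendor_id if it is supported, otherwise "Default".

    Assumption: unknown or missing IDs should use the generic prompt.
    """
    if vendor_id in SUPPORTED_VENDORS:
        return vendor_id
    return "Default"
```

Matching is exact and case-sensitive, so an ID such as "costco_formato1" would resolve to Default in this sketch.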

📊 Model Information

  • Model: LLaVA-OneVision-1.5-4B-Instruct
  • Size: 4B parameters (optimized for speed on free GPU)
  • Capabilities: Image understanding, structured data extraction, multilingual
  • License: Apache 2.0

🔗 Integration

This service is designed to integrate with the main invoice processing application. After deployment, update your .env file:

LLAVA_SERVICE_URL=https://YOUR_USERNAME-invoice-llava-extractor.hf.space
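
On the application side, the endpoint URL can be derived from this variable. The sketch below assumes the .env file has already been loaded into the process environment (for instance by the application framework); the helper name extract_endpoint is hypothetical and not part of the main application.

```python
import os


def extract_endpoint(env=None):
    """Build the /extract_invoice URL from LLAVA_SERVICE_URL.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    base = env.get("LLAVA_SERVICE_URL", "").strip()
    if not base:
        raise RuntimeError("LLAVA_SERVICE_URL is not set")
    # Tolerate a trailing slash in the configured base URL.
    return base.rstrip("/") + "/extract_invoice"
```

Raising early when the variable is missing tends to give a clearer error than a failed HTTP request later in the pipeline.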

📝 Notes

  • The first request may be slow because the model has to load (cold start)
  • A T4 GPU processes an invoice in roughly 10-30 seconds
  • For better performance, consider upgrading to GPU A10G or A100
  • The 4B model was chosen to balance speed and accuracy on the free tier
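
Because a cold start can make the first call fail or time out, a client may wish to retry with increasing delays. This is a minimal sketch of that idea, not something the service provides; call_with_retry, its parameters, and the default delays are all hypothetical.

```python
import time


def call_with_retry(send, attempts=4, first_delay=5.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call send() until it succeeds, doubling the wait after each failure.

    send      : zero-argument callable that performs the request
    attempts  : total number of tries before giving up
    retry_on  : exception types that trigger a retry
    sleep     : injectable for testing; defaults to time.sleep
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error
            sleep(delay)
            delay *= 2
```

With the requests library, send could be a small function that posts to /extract_invoice with a timeout and calls raise_for_status(), with retry_on narrowed to requests.exceptions.RequestException.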

πŸ› Troubleshooting

Space crashes or out of memory:

  • Try reducing MAX_NEW_TOKENS to 1024
  • Ensure GPU hardware is selected in Space settings

JSON parsing errors:

  • Check the raw_output field in the response
  • The model may need better prompting for specific invoice types
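
When inspecting raw_output, it can help to try recovering a JSON object that the model wrapped in extra text. The sketch below assumes raw_output is the model's unparsed text and that it contains at most one top-level JSON object; the helper name parse_raw_output is hypothetical.

```python
import json


def parse_raw_output(raw_output):
    """Try to recover a JSON object embedded in the model's raw text.

    Takes the span from the first "{" to the last "}" and parses it.
    Returns the parsed object, or None if nothing parseable is found.
    """
    start = raw_output.find("{")
    end = raw_output.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(raw_output[start:end + 1])
    except json.JSONDecodeError:
        return None
```

This only handles surrounding chatter; truncated or otherwise malformed JSON still returns None and likely points to a prompting or MAX_NEW_TOKENS issue.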

Slow inference:

  • This is expected on a T4 GPU (free tier)
  • Consider upgrading the hardware or using a smaller model

📄 License

Apache 2.0