---
title: LLaVA Invoice Extractor
emoji: 🧾
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
LLaVA-OneVision Invoice Extraction Service
This is a Hugging Face Space that runs LLaVA-OneVision-1.5-4B-Instruct for extracting structured data from invoice images using vision-language understanding.
Features
- Vision-based extraction: Analyzes invoice images directly without traditional OCR
- Vendor-specific prompts: Optimized extraction for different invoice formats (A1/Burlington, Costco, etc.)
- Structured JSON output: Returns invoice data in standardized format
- FastAPI service: REST API compatible with existing invoice processing pipeline
API Endpoints
POST /extract_invoice
Extract invoice data from an image.
Request body:
```json
{
  "image": "base64_encoded_image_string",
  "vendor_id": "A1 Cash and Carry_Fisico",
  "use_validation": false
}
```
Response:
```json
{
  "status": "success",
  "data": {
    "issuer": "Burlington Cash and Carry",
    "date": "2024-01-15",
    "transaction_id": "BL123456",
    "items": [
      {
        "sku": "ALU104",
        "description": "Product description",
        "quantity": 2.0,
        "unit_price": 10.50,
        "amount": 21.00,
        "tax_code": "H"
      }
    ],
    "subtotal": 100.00,
    "hst": 13.00,
    "total": 113.00
  },
  "raw_output": "..."
}
```
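Because `data` is generated by a language model, a client may want to sanity-check its arithmetic before trusting it. The sketch below is a hypothetical client-side helper (`check_invoice_totals` is not part of the service). It assumes the field names from the example response, treats `hst` as the only tax, and uses a one-cent tolerance. It is not a description of what `use_validation` does on the server, which this README does not specify.

```python
from typing import Any


def check_invoice_totals(data: dict[str, Any], tol: float = 0.01) -> list[str]:
    """Return a list of arithmetic problems in an extracted invoice (empty list = consistent).

    Hypothetical client-side check; assumes the field names from the example response
    and that HST is the only tax applied on top of the subtotal.
    """
    problems: list[str] = []
    for i, item in enumerate(data.get("items", [])):
        qty = item.get("quantity")
        price = item.get("unit_price")
        amount = item.get("amount")
        # Only check line items where all three numbers were extracted.
        if None not in (qty, price, amount) and abs(qty * price - amount) > tol:
            problems.append(f"item {i}: {qty} x {price} != {amount}")

    subtotal, hst, total = data.get("subtotal"), data.get("hst"), data.get("total")
    if None not in (subtotal, hst, total) and abs(subtotal + hst - total) > tol:
        problems.append(f"subtotal {subtotal} + hst {hst} != total {total}")
    return problems
```

Applied to the `data` object from the example response, this returns an empty list. It deliberately does not compare the sum of the line items to `subtotal`, because the example lists only one item.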
GET /
Health check endpoint.
GET /health
Service health status.
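The first request can hit a cold start, so a client may want to poll the health endpoint before it sends invoices. This sketch assumes only that `GET /health` returns HTTP 200 once the service is ready. The response body is not documented here, so it is ignored. The function names and the `fetch_status` hook, which lets the function be tested without a network, are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the Space is still starting
    except (urllib.error.URLError, TimeoutError, OSError):
        return 0


def wait_until_healthy(
    base_url: str,
    max_wait_s: float = 600.0,
    interval_s: float = 10.0,
    fetch_status: Callable[[str], int] = http_status,
) -> bool:
    """Poll GET {base_url}/health until it returns 200 (True) or max_wait_s runs out (False)."""
    deadline = time.monotonic() + max_wait_s
    url = base_url.rstrip("/") + "/health"
    while True:
        if fetch_status(url) == 200:
            return True
        # Give up instead of sleeping past the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

Typical use is `wait_until_healthy("https://YOUR_USERNAME-invoice-llava-extractor.hf.space")` before the first `/extract_invoice` call.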
Deployment to Hugging Face Spaces
Option 1: Using the Web Interface
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Settings:
  - Name: `invoice-llava-extractor` (or your preferred name)
  - License: Apache 2.0
  - SDK: Docker
  - Hardware: GPU T4 (free) or better
- Click "Create Space"
- Clone the repository:
  ```bash
  git clone https://huggingface.co/spaces/YOUR_USERNAME/invoice-llava-extractor
  cd invoice-llava-extractor
  ```
- Copy all files from this directory into the cloned repo
- Commit and push:
  ```bash
  git add .
  git commit -m "Initial LLaVA service deployment"
  git push
  ```
- Wait for the Space to build (5-15 minutes)
Option 2: Using the Provided Script
Run the deployment script (after customizing it):
```bash
chmod +x upload_llava_to_huggingface.sh
./upload_llava_to_huggingface.sh
```
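The contents of `upload_llava_to_huggingface.sh` are not shown here. If you prefer Python, the `huggingface_hub` library's `HfApi.create_repo` and `HfApi.upload_folder` can create the Space and push the files. This is a sketch, not the provided script. It assumes `huggingface_hub` is installed and an access token is available (for example via the `HF_TOKEN` environment variable). `deploy_space` is a hypothetical name, and GPU hardware still has to be selected in the Space settings.

```python
from pathlib import Path


def deploy_space(repo_id: str, folder: str = ".") -> str:
    """Create (or reuse) a Docker Space and upload the contents of `folder` to it.

    `repo_id` has the form "YOUR_USERNAME/invoice-llava-extractor".
    Returns the upload commit info as a string.
    """
    if not Path(folder).is_dir():
        raise NotADirectoryError(folder)

    # Imported lazily so this module loads even where huggingface_hub is not installed.
    from huggingface_hub import HfApi

    api = HfApi()  # uses the token from HF_TOKEN or a previous login
    # exist_ok=True makes re-running safe when the Space already exists.
    api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="docker", exist_ok=True)
    commit = api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="space",
        commit_message="Initial LLaVA service deployment",
    )
    return str(commit)
```

Run it from the directory that holds the service files, for example `deploy_space("YOUR_USERNAME/invoice-llava-extractor")`.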
Testing
Test locally (requires GPU)
```bash
# Install dependencies
pip install -r requirements.txt

# Run the service
python3 app.py

# Test with curl (in another terminal)
curl -X POST http://localhost:7860/extract_invoice \
  -H "Content-Type: application/json" \
  -d '{
    "image": "BASE64_ENCODED_IMAGE",
    "vendor_id": "A1 Cash and Carry_Fisico"
  }'
```
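The curl example needs an image that is already base64-encoded, and the flags of the `base64` command-line tool differ between platforms. One portable option is to let Python write the request body to a file and have curl send that file. The helper name `write_payload` and the file names are placeholders. `use_validation` is passed through with the `false` default shown in the request example.

```python
import base64
import json
from pathlib import Path


def write_payload(
    image_path: str,
    vendor_id: str,
    out_path: str = "payload.json",
    use_validation: bool = False,
) -> dict:
    """Base64-encode an image and write an /extract_invoice request body to out_path as JSON."""
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    payload = {"image": image_b64, "vendor_id": vendor_id, "use_validation": use_validation}
    Path(out_path).write_text(json.dumps(payload), encoding="utf-8")
    return payload
```

After `write_payload("invoice.jpg", "A1 Cash and Carry_Fisico")`, send the file with `curl -X POST http://localhost:7860/extract_invoice -H "Content-Type: application/json" --data-binary @payload.json`.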
Test deployed Space
```python
import base64

import requests

# Load and base64-encode the invoice image
with open("invoice.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

# Send the extraction request; allow a generous timeout for cold starts
response = requests.post(
    "https://YOUR_USERNAME-invoice-llava-extractor.hf.space/extract_invoice",
    json={
        "image": image_b64,
        "vendor_id": "A1 Cash and Carry_Fisico",
    },
    timeout=300,
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())
```
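Printing the whole response is fine for a smoke test. In a pipeline you usually want just the `data` object, and a clear error otherwise. This sketch assumes that any `status` other than `"success"` means the extraction failed, and it keeps `raw_output` for debugging, as the troubleshooting notes suggest. `ExtractionError` and `unpack_response` are hypothetical names.

```python
from typing import Any, Optional


class ExtractionError(RuntimeError):
    """Raised when the service does not report a successful extraction."""

    def __init__(self, message: str, raw_output: Optional[str] = None):
        super().__init__(message)
        self.raw_output = raw_output  # model text, useful for diagnosing parse failures


def unpack_response(body: dict[str, Any]) -> dict[str, Any]:
    """Return body["data"] for a successful response, otherwise raise ExtractionError."""
    if body.get("status") != "success" or not isinstance(body.get("data"), dict):
        raise ExtractionError(
            f"extraction failed (status={body.get('status')!r})",
            raw_output=body.get("raw_output"),
        )
    return body["data"]
```

With the `requests` example, this becomes `invoice = unpack_response(response.json())`.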
Configuration
Environment Variables
- `MODEL_NAME`: Hugging Face model ID (default: `lmms-lab/LLaVA-OneVision-1.5-4B-Instruct`)
- `MAX_NEW_TOKENS`: Maximum number of tokens to generate (default: `2048`)
- `PORT`: Service port (default: `7860`)
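A service typically reads these variables once at startup. The sketch below shows one way to do that with the documented defaults. It is an illustration, not the actual code in `app.py`, and `ServiceConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceConfig:
    model_name: str
    max_new_tokens: int
    port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ServiceConfig:
    """Read MODEL_NAME, MAX_NEW_TOKENS and PORT, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServiceConfig(
        model_name=env.get("MODEL_NAME", "lmms-lab/LLaVA-OneVision-1.5-4B-Instruct"),
        # int() raises ValueError on malformed values, so bad settings fail at startup.
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "2048")),
        port=int(env.get("PORT", "7860")),
    )
```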
Supported Vendors
- `A1 Cash and Carry_Fisico`: Burlington Cash and Carry
- `Costco_Formato1`: Costco invoices, format 1
- `Costco_Formato2`: Costco invoices, format 2
- `Default`: Generic invoice extraction
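The vendor list implies a lookup from `vendor_id` to a vendor-specific prompt, with `Default` as the generic case. The sketch below illustrates that pattern. The prompt strings are invented placeholders, and the assumption that unknown or missing IDs fall back to `Default`, rather than being rejected, is not confirmed by this README.

```python
from typing import Optional

# Hypothetical prompt table; the real prompts live in the service code and are not shown here.
VENDOR_PROMPTS: dict[str, str] = {
    "A1 Cash and Carry_Fisico": "Placeholder prompt for Burlington Cash and Carry invoices.",
    "Costco_Formato1": "Placeholder prompt for Costco invoices, format 1.",
    "Costco_Formato2": "Placeholder prompt for Costco invoices, format 2.",
    "Default": "Placeholder generic prompt: return issuer, date, items and totals as JSON.",
}


def prompt_for(vendor_id: Optional[str]) -> str:
    """Return the vendor-specific prompt, or the Default prompt for unknown or missing IDs."""
    return VENDOR_PROMPTS.get(vendor_id or "Default", VENDOR_PROMPTS["Default"])
```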
Model Information
- Model: LLaVA-OneVision-1.5-4B-Instruct
- Size: 4B parameters (optimized for speed on free GPU)
- Capabilities: Image understanding, structured data extraction, multilingual
- License: Apache 2.0
Integration
This service is designed to integrate with the main invoice processing application. After deployment, update your .env file:
```
LLAVA_SERVICE_URL=https://YOUR_USERNAME-invoice-llava-extractor.hf.space
```
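On the application side, the variable only needs to be turned into the endpoint URL. A minimal sketch, assuming the main application sees `LLAVA_SERVICE_URL` in its process environment (for example after loading `.env` with a tool such as python-dotenv). `extract_invoice_url` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def extract_invoice_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the /extract_invoice endpoint URL from LLAVA_SERVICE_URL."""
    env = os.environ if env is None else env
    base = env.get("LLAVA_SERVICE_URL")
    if not base:
        raise RuntimeError("LLAVA_SERVICE_URL is not set")
    # Strip a trailing slash so "https://host/" and "https://host" give the same URL.
    return base.rstrip("/") + "/extract_invoice"
```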
Notes
- First request may be slow due to model loading (cold start)
- On a T4 GPU, extracting one invoice takes 10-30 seconds
- For better performance, consider upgrading to GPU A10G or A100
- The 4B model was chosen to balance speed and accuracy on the free tier
Troubleshooting
Space crashes or out of memory:
- Try reducing `MAX_NEW_TOKENS` to 1024
- Ensure GPU hardware is selected in Space settings
JSON parsing errors:
- Check the `raw_output` field in the response
- The model may need better prompting for specific invoice types
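When parsing fails, `raw_output` often still contains a valid JSON object wrapped in a Markdown code fence or surrounded by prose. A client-side fallback can try to recover it. This is a sketch of a common heuristic (look inside fences first, then decode from each `{` in turn with `json.JSONDecoder.raw_decode`), not what the service does internally. `parse_raw_output` is a hypothetical name.

```python
import json
import re
from typing import Any, Optional

# Matches a Markdown code fence (three backticks), optionally tagged "json".
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_raw_output(raw: str) -> Optional[dict[str, Any]]:
    """Recover the first JSON object found in model output, or return None."""
    # Prefer fenced content, then fall back to scanning the whole string.
    candidates = [m.group(1) for m in _FENCE.finditer(raw)] + [raw]
    decoder = json.JSONDecoder()
    for text in candidates:
        start = text.find("{")
        while start != -1:
            try:
                # raw_decode parses one JSON value starting at `start` and ignores trailing text.
                obj, _end = decoder.raw_decode(text, start)
                return obj
            except json.JSONDecodeError:
                start = text.find("{", start + 1)
    return None
```

If this still returns `None`, the output is genuinely malformed, and better prompting for that invoice type is the more likely fix.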
Slow inference:
- Expected on T4 GPU (free tier)
- Consider upgrading the hardware or using a smaller model
License
Apache 2.0