diffu-0.2 — Spanish Invoice Data Extractor (Vision)

diffu-0.2 is a fine-tuned vision-language model for structured data extraction from Spanish invoice images. Built by V10 Labs, it extracts supplier details, tax IDs, amounts, and dates from invoice photographs and scans.

Performance

Model	Accuracy	Type
diffu-0.2 (this model)	93.39%	Fine-tuned, vision
diffu-0.1 (V10 Labs)	92.82%	Fine-tuned, text-only
Claude Sonnet 4.6	61.6%	Generalist, zero-shot
Qwen3-VL-4B (base)	54.4%	Generalist, zero-shot

Per-Field Accuracy

Field	Accuracy
supplier	92.06%
supplier_cif	94.12%
invoice_number	91.35%
date	95.33%
subtotal	92.06%
tax_total	89.25%
total	92.99%
doc_type	100.00%
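As a sanity check, the unweighted mean of the eight per-field accuracies is 93.395%. That matches the headline 93.39% to within the rounding of the per-field figures. This suggests that the overall score is a macro-average over fields, though the card does not say so. The snippet below only reproduces that arithmetic; the variable names are illustrative.

```python
# Per-field accuracies (%) copied from the per-field table.
per_field = {
    "supplier": 92.06,
    "supplier_cif": 94.12,
    "invoice_number": 91.35,
    "date": 95.33,
    "subtotal": 92.06,
    "tax_total": 89.25,
    "total": 92.99,
    "doc_type": 100.00,
}

# Unweighted (macro) mean: every field counts equally,
# regardless of how many test invoices contain it.
macro_mean = sum(per_field.values()) / len(per_field)
print(f"{macro_mean:.3f}")  # 93.395
```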

Model Details

Base model: Qwen/Qwen3-VL-4B-Instruct
Method: LoRA (r=64, alpha=128)
Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
Training: 2 epochs, LR=1e-4, effective batch size 16
Image resolution: 256 × 28 × 28 to 1280 × 28 × 28 pixels (200,704 to 1,003,520 pixels per image)
Adapter size: 504 MB
Peak VRAM: 22.57 GB (training), ~10 GB (inference)
Parse failures: 0%
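The stated adapter size is consistent with the LoRA settings, but only under three assumptions that the card does not state:

- Only the seven listed projections in each language-model decoder layer carry adapters. The vision encoder carries none.
- The decoder has the same shapes as Qwen3-4B.
- Adapter weights are saved in float32.

The shape constants below encode those assumptions. Check them against the base model's config.json before relying on the result.

```python
# ASSUMED decoder shapes for Qwen3-VL-4B, taken to match Qwen3-4B.
HIDDEN = 2560          # hidden_size
INTERMEDIATE = 9728    # intermediate_size
LAYERS = 36            # num_hidden_layers
Q_DIM = 32 * 128       # num_attention_heads * head_dim
KV_DIM = 8 * 128       # num_key_value_heads * head_dim
RANK = 64              # LoRA r, from the card

# (in_features, out_features) of each targeted projection in one decoder layer.
modules = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r): r * (in + out) parameters per module.
per_layer = sum(RANK * (i + o) for i, o in modules.values())
lora_params = per_layer * LAYERS
size_mib = lora_params * 4 / 2**20  # ASSUMES float32 storage, 4 bytes per parameter

print(lora_params, size_mib)  # 132120576 504.0
```

Under these assumptions the count comes to exactly 504 MiB (units of 2^20 bytes), which suggests the card's "MB" means MiB. With alpha=128 and r=64, the standard LoRA scaling factor alpha/r is 2.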
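The resolution bounds describe a pixel budget rather than a fixed input size. The sketch below shows how an image can be fitted into such a budget. It is modelled on the resizing rule in Qwen-VL preprocessing:

1. Snap each side to a multiple of 28.
2. If the area falls outside the budget, rescale both sides by the square root of the area ratio.

The function fit_to_budget is hypothetical and is not the processor's own code. The factor 28 is taken from the card.

```python
import math

FACTOR = 28                              # side multiple, from the card's "× 28 × 28"
MIN_PIXELS = 256 * FACTOR * FACTOR       # lower bound of the budget
MAX_PIXELS = 1280 * FACTOR * FACTOR      # upper bound of the budget

def fit_to_budget(height, width, factor=FACTOR,
                  min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS):
    """Hypothetical helper: return (h, w), both multiples of `factor`,
    with h * w inside [min_pixels, max_pixels] for typical page shapes."""
    # Snap each side to the nearest multiple of the factor.
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        h = max(factor, math.floor(height / beta / factor) * factor)
        w = max(factor, math.floor(width / beta / factor) * factor)
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w

print(fit_to_budget(3508, 2480))  # (1176, 840)
```

An A4 page scanned at 300 dpi (3508 × 2480 pixels) is downsized to 1176 × 840. This keeps roughly the original aspect ratio while staying under the upper bound.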

Output Format
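The card does not spell out the output schema. The sketch below shows one way to consume the model's output, assuming it is a single JSON object keyed by the eight field names from the per-field accuracy table. The field types, the date format, the doc_type value and all sample values are hypothetical. The amounts_consistent check is an optional post-processing step; the card does not say the model performs it.

```python
import json
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical record type: field names follow the per-field accuracy table;
# the types and JSON layout are assumptions, not a documented schema.
@dataclass
class InvoiceFields:
    supplier: str
    supplier_cif: str
    invoice_number: str
    date: str
    subtotal: Decimal
    tax_total: Decimal
    total: Decimal
    doc_type: str

    def amounts_consistent(self, tolerance=Decimal("0.01")):
        # Cross-field check: subtotal plus tax should equal the invoice total.
        return abs(self.subtotal + self.tax_total - self.total) <= tolerance

def parse_output(raw: str) -> InvoiceFields:
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    for key in ("subtotal", "tax_total", "total"):
        # Assumes all three amounts are present; str() avoids float artifacts.
        data[key] = Decimal(str(data[key]))
    return InvoiceFields(**data)  # TypeError if a field is missing or unexpected

# Hypothetical sample output, for illustration only.
sample = ('{"supplier": "Ejemplo S.L.", "supplier_cif": "B12345674", '
          '"invoice_number": "F-2024-001", "date": "2024-03-15", '
          '"subtotal": 100.00, "tax_total": 21.00, "total": 121.00, '
          '"doc_type": "invoice"}')

record = parse_output(sample)
print(record.amounts_consistent())  # True
```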

Usage

About V10 Labs

V10 Labs builds AI-powered financial intelligence for SMBs in Spain. We train purpose-built models that outperform general-purpose LLMs on domain-specific tasks like invoice processing, accounting classification, and financial analysis.

v10labs.com

Model tree: V10LabsAI/diffu-0.2-adapter is an adapter for the base model Qwen/Qwen3-VL-4B-Instruct.