diffu-0.2 โ€” Spanish Invoice Data Extractor (Vision)

diffu-0.2 is a fine-tuned vision-language model for structured data extraction from Spanish invoice images. Built by V10 Labs, it extracts supplier details, tax IDs, amounts, and dates from invoice photographs and scans.

Performance

Model Accuracy Type
diffu-0.2 (this model) 93.39% Fine-tuned, vision
diffu-0.1 (V10 Labs) 92.82% Fine-tuned, text-only
Claude Sonnet 4.6 61.6% Generalist, zero-shot
Qwen3-VL-4B (base) 54.4% Generalist, zero-shot

Per-Field Accuracy

Field Accuracy
supplier 92.06%
supplier_cif 94.12%
invoice_number 91.35%
date 95.33%
subtotal 92.06%
tax_total 89.25%
total 92.99%
doc_type 100.00%

Model Details

  • Base model: Qwen/Qwen3-VL-4B-Instruct
  • Method: LoRA (r=64, alpha=128)
  • Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
  • Training: 2 epochs, LR=1e-4, effective batch size 16
  • Image resolution: 256-1280 ร— 28 ร— 28 pixels
  • Adapter size: 504 MB
  • Peak VRAM: 22.57 GB (training), ~10 GB (inference)
  • Parse failures: 0%

Output Format

Usage

About V10 Labs

V10 Labs builds AI-powered financial intelligence for SMBs in Spain. We train purpose-built models that outperform general-purpose LLMs on domain-specific tasks like invoice processing, accounting classification, and financial analysis.

v10labs.com

Downloads last month
13
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for V10LabsAI/diffu-0.2-adapter

Adapter
(21)
this model