| --- |
| language: |
| - es |
| license: apache-2.0 |
| library_name: peft |
| base_model: Qwen/Qwen3-VL-4B-Instruct |
| tags: |
| - invoice-extraction |
| - ocr |
| - spanish |
| - lora |
| - vision |
| - finance |
| pipeline_tag: image-to-text |
| --- |
| |
| # diffu-0.2 — Spanish Invoice Data Extractor (Vision) |
|
|
| **diffu-0.2** is a fine-tuned vision-language model for structured data extraction from Spanish invoice images. Built by [V10 Labs](https://v10labs.com), it extracts supplier details, tax IDs, amounts, and dates from invoice photographs and scans. |
|
|
| ## Performance |
|
|
| | Model | Accuracy | Type | |
| |-------|----------|------| |
| | **diffu-0.2 (this model)** | **93.39%** | Fine-tuned, vision | |
| | diffu-0.1 (V10 Labs) | 92.82% | Fine-tuned, text-only | |
| | Claude Sonnet 4.6 | 61.6% | Generalist, zero-shot | |
| | Qwen3-VL-4B (base) | 54.4% | Generalist, zero-shot | |
|
|
| ### Per-Field Accuracy |
|
|
| | Field | Accuracy | |
| |-------|----------| |
| | supplier | 92.06% | |
| | supplier_cif | 94.12% | |
| | invoice_number | 91.35% | |
| | date | 95.33% | |
| | subtotal | 92.06% | |
| | tax_total | 89.25% | |
| | total | 92.99% | |
| | doc_type | 100.00% | |
|
|
| ## Model Details |
|
|
| - **Base model**: Qwen/Qwen3-VL-4B-Instruct |
| - **Method**: LoRA (r=64, alpha=128) |
| - **Target modules**: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj |
| - **Training**: 2 epochs, LR=1e-4, effective batch size 16 |
| - **Image resolution**: 256-1280 × 28 × 28 pixels |
| - **Adapter size**: 504 MB |
| - **Peak VRAM**: 22.57 GB (training), ~10 GB (inference) |
| - **Parse failures**: 0% |
| |
| ## Output Format |
| |
| |
| |
| ## Usage |
| |
| |
| |
| ## About V10 Labs |
| |
| V10 Labs builds AI-powered financial intelligence for SMBs in Spain. We train purpose-built models that outperform general-purpose LLMs on domain-specific tasks like invoice processing, accounting classification, and financial analysis. |
| |
| [v10labs.com](https://v10labs.com) |
| |