Upload README.md with huggingface_hub

d3d3916 verified 4 days ago

1.79 kB

	---
	language:
	- es
	license: apache-2.0
	library_name: peft
	base_model: Qwen/Qwen3-VL-4B-Instruct
	tags:
	- invoice-extraction
	- ocr
	- spanish
	- lora
	- vision
	- finance
	pipeline_tag: image-to-text
	---

	# diffu-0.2 — Spanish Invoice Data Extractor (Vision)

	diffu-0.2 is a fine-tuned vision-language model for structured data extraction from Spanish invoice images. Built by [V10 Labs](https://v10labs.com), it extracts supplier details, tax IDs, amounts, and dates from invoice photographs and scans.

	## Performance

	\| Model \| Accuracy \| Type \|
	\|-------\|----------\|------\|
	\| diffu-0.2 (this model) \| 93.39% \| Fine-tuned, vision \|
	\| diffu-0.1 (V10 Labs) \| 92.82% \| Fine-tuned, text-only \|
	\| Claude Sonnet 4.6 \| 61.6% \| Generalist, zero-shot \|
	\| Qwen3-VL-4B (base) \| 54.4% \| Generalist, zero-shot \|

	### Per-Field Accuracy

	\| Field \| Accuracy \|
	\|-------\|----------\|
	\| supplier \| 92.06% \|
	\| supplier_cif \| 94.12% \|
	\| invoice_number \| 91.35% \|
	\| date \| 95.33% \|
	\| subtotal \| 92.06% \|
	\| tax_total \| 89.25% \|
	\| total \| 92.99% \|
	\| doc_type \| 100.00% \|

	## Model Details

	- Base model: Qwen/Qwen3-VL-4B-Instruct
	- Method: LoRA (r=64, alpha=128)
	- Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
	- Training: 2 epochs, LR=1e-4, effective batch size 16
	- Image resolution: 256-1280 × 28 × 28 pixels
	- Adapter size: 504 MB
	- Peak VRAM: 22.57 GB (training), ~10 GB (inference)
	- Parse failures: 0%

	## Output Format



	## Usage



	## About V10 Labs

	V10 Labs builds AI-powered financial intelligence for SMBs in Spain. We train purpose-built models that outperform general-purpose LLMs on domain-specific tasks like invoice processing, accounting classification, and financial analysis.

	[v10labs.com](https://v10labs.com)

	---
	language:
	- es
	license: apache-2.0
	library_name: peft
	base_model: Qwen/Qwen3-VL-4B-Instruct
	tags:
	- invoice-extraction
	- ocr
	- spanish
	- lora
	- vision
	- finance
	pipeline_tag: image-to-text
	---

	# diffu-0.2 — Spanish Invoice Data Extractor (Vision)

	diffu-0.2 is a fine-tuned vision-language model for structured data extraction from Spanish invoice images. Built by [V10 Labs](https://v10labs.com), it extracts supplier details, tax IDs, amounts, and dates from invoice photographs and scans.

	## Performance

	\| Model \| Accuracy \| Type \|
	\|-------\|----------\|------\|
	\| diffu-0.2 (this model) \| 93.39% \| Fine-tuned, vision \|
	\| diffu-0.1 (V10 Labs) \| 92.82% \| Fine-tuned, text-only \|
	\| Claude Sonnet 4.6 \| 61.6% \| Generalist, zero-shot \|
	\| Qwen3-VL-4B (base) \| 54.4% \| Generalist, zero-shot \|

	### Per-Field Accuracy

	\| Field \| Accuracy \|
	\|-------\|----------\|
	\| supplier \| 92.06% \|
	\| supplier_cif \| 94.12% \|
	\| invoice_number \| 91.35% \|
	\| date \| 95.33% \|
	\| subtotal \| 92.06% \|
	\| tax_total \| 89.25% \|
	\| total \| 92.99% \|
	\| doc_type \| 100.00% \|

	## Model Details

	- Base model: Qwen/Qwen3-VL-4B-Instruct
	- Method: LoRA (r=64, alpha=128)
	- Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
	- Training: 2 epochs, LR=1e-4, effective batch size 16
	- Image resolution: 256-1280 × 28 × 28 pixels
	- Adapter size: 504 MB
	- Peak VRAM: 22.57 GB (training), ~10 GB (inference)
	- Parse failures: 0%

	## Output Format



	## Usage



	## About V10 Labs

	V10 Labs builds AI-powered financial intelligence for SMBs in Spain. We train purpose-built models that outperform general-purpose LLMs on domain-specific tasks like invoice processing, accounting classification, and financial analysis.

	[v10labs.com](https://v10labs.com)