|
|
---
library_name: peft
license: apache-2.0
base_model:
- Qwen/Qwen2-VL-2B-Instruct
tags:
- llama-factory
- lora
- generated_from_trainer
- Qwen
- Vl-model
- fine-tuning
- vision-model
- multi-modal
model-index:
- name: models
  results: []
datasets:
- naver-clova-ix/cord-v2
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: image-text-to-text
---
|
|
|
|
|
|
|
|
|
|
# Qwen Fine-Tuning Results

<img src="./results.png" alt="Sample Invoice" width="auto"/>
|
|
|
|
|
# models
|
|
|
|
|
This model is a fine-tuned version of [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) on the invoice_train dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0481
|
|
|
|
|
## Model description

The Qwen2-VL-2B model has been fine-tuned on OCR-rich invoice data from the CORD-v2 dataset, allowing it to recognize both the content and layout of invoices effectively. The model outputs structured information directly, enabling downstream processing or integration into accounting systems.
|
|
|
|
|
For each invoice image, the model identifies and extracts the following fields (an illustrative output example follows the list):
|
|
|
|
|
- Menu Items
- Item Prices
- Subtotal Price
- Total Price
- Tax Amount
- Cash Given
- Change Amount
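
Purely as an illustration of what this structured output can look like, here is a hypothetical parse in the CORD-v2 style (field names such as `menu`, `sub_total`, and `total` follow the CORD-v2 annotation schema; every value below is an invented placeholder, not model output):

```python
# Hypothetical example of the structured fields the model is meant to produce.
# All values are made-up placeholders; the exact schema follows the CORD-v2
# annotations used during fine-tuning.
example_output = {
    "menu": [
        {"nm": "ICED AMERICANO", "cnt": "2", "price": "9,000"},
        {"nm": "CHEESE CAKE", "cnt": "1", "price": "6,500"},
    ],
    "sub_total": {"subtotal_price": "15,500", "tax_price": "1,550"},
    "total": {"total_price": "17,050", "cashprice": "20,000", "changeprice": "2,950"},
}
```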
|
|
|
|
|
## More Info

- Base Model: Qwen2-VL-2B-Instruct, a 2B-parameter vision-language model from the Qwen2 family.
- Fine-Tuning: Supervised learning on OCR + structure pairs from the CORD-v2 dataset.
- Input: OCR-annotated invoice image data from the CORD-v2 dataset.
- Output: Structured extraction of key financial fields in JSON format.
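
As a rough sketch of how this LoRA adapter can be loaded on top of the base model for inference (assumptions: the adapter weights sit in a local `./models` directory, `qwen-vl-utils` is installed, and the prompt wording is only a placeholder, since the exact instruction used during training is not documented here):

```python
import torch
from peft import PeftModel
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load the frozen base model, then attach the LoRA adapter on top of it.
base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "./models")  # placeholder adapter path
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "invoice.png"},  # placeholder invoice image
        {"type": "text", "text": "Extract the invoice fields as JSON."},  # placeholder prompt
    ],
}]

# Standard Qwen2-VL preprocessing: chat template plus vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=512)

# Drop the prompt tokens and decode only the newly generated text.
generated = generated[:, inputs.input_ids.shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```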
|
|
|
|
|
|
|
|
|
|
|
## Training and evaluation data

- Training Set: 800 samples, used to fine-tune the Qwen2-VL-2B model to extract key invoice components from OCR text and layout information.
- Evaluation Set: 100 samples, used to assess the model's ability to generalize and accurately extract fields such as menu items, prices, subtotal, tax, cash, and change from unseen invoices.
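
For reference, a minimal sketch of loading the dataset from the Hub (this assumes the standard `naver-clova-ix/cord-v2` layout with an `image` column and a JSON-string `ground_truth` column containing a `gt_parse` entry; the 800/100 sample counts above line up with the dataset's train/validation splits):

```python
import json
from datasets import load_dataset

# Download CORD-v2 from the Hugging Face Hub.
ds = load_dataset("naver-clova-ix/cord-v2")
print(ds)  # DatasetDict with train / validation / test splits

sample = ds["train"][0]
image = sample["image"]                          # PIL image of the receipt
annotation = json.loads(sample["ground_truth"])  # annotation stored as a JSON string
print(annotation["gt_parse"])                    # structured target fields (menu, sub_total, total, ...)
```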
|
|
|
|
|
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
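
The run itself was driven by LLaMA-Factory, so the exact invocation is not reproduced here; purely as a sketch, the hyperparameters above map onto `transformers.TrainingArguments` roughly like this (the output directory name is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="models",               # placeholder output directory
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,     # effective total train batch size of 4
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",               # AdamW with betas=(0.9, 0.999), eps=1e-8
    seed=42,
)
```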
|
|
|
|
|
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.0779        | 0.5   | 100  | 0.0685          |
| 0.0647        | 1.0   | 200  | 0.0511          |
| 0.0292        | 1.5   | 300  | 0.0500          |
| 0.028         | 2.0   | 400  | 0.0449          |
| 0.013         | 2.5   | 500  | 0.0488          |
| 0.0116        | 3.0   | 600  | 0.0481          |
|
|
|
|
|
|
|
|
### Framework versions

- PEFT 0.14.0
- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1