Alawy21/Invoice_Extraction_Qwen2_2B_Finetuning

Image-Text-to-Text

Generated from Trainer


Alawy21 committed on May 11, 2025

Commit cd6854f · verified · 1 Parent(s): 165a034

Update README.md

Files changed (1)

README.md +55 -17


@@ -1,15 +1,30 @@
----
-library_name: peft
-license: apache-2.0
-base_model: Qwen/Qwen2-VL-2B-Instruct
-tags:
-- llama-factory
-- lora
-- generated_from_trainer
-model-index:
-- name: models
-  results: []
----
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -22,17 +37,40 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
-## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters

+---
+library_name: peft
+license: apache-2.0
+base_model: Qwen/Qwen2-VL-2B-Instruct
+tags:
+- llama-factory
+- lora
+- generated_from_trainer
+- Qwen
+- Vl-model
+- fine-tuning
+- vision-model
+- multi-modal
+model-index:
+- name: models
+  results: []
+datasets:
+- naver-clova-ix/cord-v2
+language:
+- en
+metrics:
+- accuracy
+- precision
+- recall
+- f1
+pipeline_tag: visual-question-answering
+---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 ## Model description
+- The Qwen2 2B model has been fine-tuned on OCR-rich invoice data from the CORD-v2 dataset, allowing it to recognize both the content and layout of invoices effectively. The model outputs structured information directly, enabling downstream processing or integration into accounting systems.
+For each invoice image, the model identifies and extracts the following fields:
+- Menu Items
+- Item Prices
+- Subtotal Price
+- Total Price
+- Tax Amount
+- Cash Given
+- Change Amount
+## More Info
+- Base Model: Qwen2 2B (Qwen/Qwen2-VL-2B-Instruct), a vision-language model.
+- Fine-Tuning: Supervised learning on OCR + structure pairs from the CORD-v2 dataset.
+- Input: OCR-annotated invoice image data from the CORD-v2 dataset.
+- Output: Structured extraction of key financial fields in JSON format.
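
The card says the output is JSON but does not give the schema. The following is a minimal sketch of how a downstream consumer might parse that output, assuming the model returns a single JSON object somewhere in its generated text. The key names (`menu_items`, `subtotal_price`, and so on) and the helper `parse_invoice_output` are hypothetical and chosen only to mirror the field list above.

```python
import json

# Hypothetical key names: the model card lists the extracted fields
# but does not document the exact JSON keys the model emits.
EXPECTED_FIELDS = (
    "menu_items",
    "subtotal_price",
    "total_price",
    "tax_amount",
    "cash_given",
    "change_amount",
)


def parse_invoice_output(raw_text):
    """Extract the JSON object from generated text and normalize its fields.

    Fields missing from the model output are returned as None.
    """
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw_text[start:end + 1])
    return {field: data.get(field) for field in EXPECTED_FIELDS}
```

Returning `None` for absent fields keeps the downstream record shape stable even when a receipt has no tax or change line.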
 ## Training and evaluation data
+- Training Set: 800 samples, used to fine-tune the Qwen2 2B model to extract key invoice components from OCR text and layout information.
+- Evaluation Set: 100 samples, used to assess the model’s ability to generalize and accurately extract fields such as menu items, prices, subtotal, tax, cash, and change from unseen invoices.
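
The metadata lists precision, recall, and F1 as metrics, but the card does not say how they are computed. One plausible reading is field-level scoring, sketched below under that assumption: each (field, value) pair counts as correct only on an exact string match, and counts are micro-averaged over the evaluation samples. The function name `field_scores` and the flat dict layout are illustrative, not taken from the repository.

```python
def field_scores(predicted, expected):
    """Micro-averaged precision, recall, and F1 over (field, value) pairs.

    `predicted` and `expected` are parallel lists of dicts, one per invoice.
    Assumption: a field is correct only if its value matches exactly
    after conversion to a string; None values are treated as absent.
    """
    tp = fp = fn = 0
    for pred, gold in zip(predicted, expected):
        pred_pairs = {(k, str(v)) for k, v in pred.items() if v is not None}
        gold_pairs = {(k, str(v)) for k, v in gold.items() if v is not None}
        tp += len(pred_pairs & gold_pairs)   # extracted and correct
        fp += len(pred_pairs - gold_pairs)   # extracted but wrong or spurious
        fn += len(gold_pairs - pred_pairs)   # present in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Exact matching is strict: a price formatted as `10.0` instead of `10.00` counts as an error, so values may need normalizing before scoring.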
 ### Training hyperparameters