Alawy21 committed
Commit cd6854f (verified) · 1 Parent(s): 165a034

Update README.md

Files changed (1)
  1. README.md +55 -17
README.md CHANGED
@@ -1,15 +1,30 @@
1
- ---
2
- library_name: peft
3
- license: apache-2.0
4
- base_model: Qwen/Qwen2-VL-2B-Instruct
5
- tags:
6
- - llama-factory
7
- - lora
8
- - generated_from_trainer
9
- model-index:
10
- - name: models
11
- results: []
12
- ---
13
 
14
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
  should probably proofread and complete it, then remove this comment. -->
@@ -22,17 +37,40 @@ It achieves the following results on the evaluation set:
22
 
23
  ## Model description
24
 
25
- More information needed
26
 
27
- ## Intended uses & limitations
28
 
29
- More information needed
30
 
31
  ## Training and evaluation data
32
 
33
- More information needed
34
 
35
- ## Training procedure
36
 
37
  ### Training hyperparameters
38
 
 
1
+ ---
2
+ library_name: peft
3
+ license: apache-2.0
4
+ base_model: Qwen/Qwen2-VL-2B-Instruct
5
+ tags:
6
+ - llama-factory
7
+ - lora
8
+ - generated_from_trainer
9
+ - Qwen
10
+ - Vl-model
11
+ - fine-tuning
12
+ - vision-model
13
+ - multi-modal
14
+ model-index:
15
+ - name: models
16
+ results: []
17
+ datasets:
18
+ - naver-clova-ix/cord-v2
19
+ language:
20
+ - en
21
+ metrics:
22
+ - accuracy
23
+ - precision
24
+ - recall
25
+ - f1
26
+ pipeline_tag: visual-question-answering
27
+ ---
28
 
29
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
30
  should probably proofread and complete it, then remove this comment. -->
 
37
 
38
  ## Model description
39
 
40
+ The Qwen2-VL-2B-Instruct model has been fine-tuned on OCR-rich invoice data from the CORD-v2 dataset so that it recognizes both the content and the layout of invoices. The model outputs structured information directly, which enables downstream processing or integration into accounting systems.
41
+
42
+ For each invoice image, the model identifies and extracts the following fields:
43
+
44
+ - Menu Items
45
+
46
+ - Item Prices
47
+
48
+ - Subtotal Price
49
+
50
+ - Total Price
51
+
52
+ - Tax Amount
53
+
54
+ - Cash Given
55
+
56
+ - Change Amount
57
+
58
+ ## More Info
59
+ - Base Model: Qwen/Qwen2-VL-2B-Instruct, a 2B-parameter vision-language model.
60
+
61
+ - Fine-Tuning: Supervised learning on OCR + structure pairs from the CORD-v2 dataset.
62
+
63
+ - Input: OCR-annotated invoice image data from the CORD-v2 dataset.
64
+
65
+ - Output: Structured extraction of key financial fields in JSON format.
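
As a rough illustration of how the JSON output described above might be consumed downstream, the sketch below pulls a JSON object out of the model's raw text and maps it onto the seven listed fields. The key names (`menu_items`, `item_prices`, and so on) and the helper `parse_invoice_output` are hypothetical; the model card does not specify the exact schema, so adjust them to whatever the model actually emits.

```python
import json
import re

# Hypothetical key names: the model card lists the fields,
# but not the exact JSON keys the model produces.
EXPECTED_FIELDS = (
    "menu_items",
    "item_prices",
    "subtotal_price",
    "total_price",
    "tax_amount",
    "cash_given",
    "change_amount",
)


def parse_invoice_output(raw: str) -> dict:
    """Parse the JSON object embedded in the model's text output.

    Takes everything from the first '{' to the last '}' and decodes it.
    Fields missing from the output are returned as None.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    return {field: data.get(field) for field in EXPECTED_FIELDS}
```

Returning `None` for absent fields keeps the result shape fixed, so downstream code can tell "not extracted" apart from an empty value.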
66
 
 
67
 
 
68
 
69
  ## Training and evaluation data
70
 
71
+ - Training Set: 800 samples, used to fine-tune the Qwen2-VL-2B-Instruct model to extract key invoice components from OCR text and layout information.
72
 
73
+ - Evaluation Set: 100 samples, used to assess the model’s ability to generalize and accurately extract fields such as menu items, prices, subtotal, tax, cash, and change from unseen invoices.
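
The card's metadata lists precision, recall, and F1 as metrics but does not say how they are computed. One plausible approach, shown here purely as an assumption and not as the author's evaluation procedure, is to micro-average over exact-match `(field, value)` pairs across the evaluation samples. The function name `field_scores` and the dictionary layout are hypothetical.

```python
def field_scores(predictions, references):
    """Micro-averaged precision, recall, and F1 over (field, value) pairs.

    `predictions` and `references` are equal-length lists of dicts, one per
    invoice. A predicted pair counts as correct only if the same field holds
    the same value (compared as strings) in the reference.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predictions, references):
        pred_pairs = {(k, str(v)) for k, v in pred.items() if v is not None}
        ref_pairs = {(k, str(v)) for k, v in ref.items() if v is not None}
        tp += len(pred_pairs & ref_pairs)   # extracted and correct
        fp += len(pred_pairs - ref_pairs)   # extracted but wrong or spurious
        fn += len(ref_pairs - pred_pairs)   # present in reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}
```

Exact string matching is strict: a price formatted as `4.5` instead of `4.50` counts as an error, so values may need normalizing first, depending on how lenient the evaluation should be.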
74
 
75
  ### Training hyperparameters
76