hastyle committed on
Commit 6304594 · verified · 1 Parent(s): 819ea21

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +158 -0
  2. adapter_config.json +46 -0
  3. adapter_model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,158 @@
+ ---
+ base_model: allenai/olmOCR-2-7B-1025
+ library_name: peft
+ pipeline_tag: image-text-to-text
+ license: apache-2.0
+ language:
+ - ar
+ tags:
+ - lora
+ - ocr
+ - arabic
+ - handwriting
+ - transformers
+ - qwen2-vl
+ ---
+
+ # olmOCR Arabic LoRA Adapter
+
+ A LoRA (Low-Rank Adaptation) fine-tuned adapter for Arabic OCR, built on top of [allenai/olmOCR-2-7B-1025](https://huggingface.co/allenai/olmOCR-2-7B-1025).
+
+ ## Model Description
+
+ This adapter enhances olmOCR's ability to recognize Arabic text in documents, including:
+ - Handwritten Arabic text
+ - Printed Arabic documents
+ - Mixed Arabic/English documents
+
+ ### Training Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Base Model | allenai/olmOCR-2-7B-1025 |
+ | LoRA Rank (r) | 16 |
+ | LoRA Alpha | 32 |
+ | LoRA Dropout | 0.05 |
+ | Training Samples | 450,044 |
+ | Epochs | 3 |
+ | Learning Rate | 2e-5 |
+ | Batch Size | 64 (effective) |
+ | Hardware | 8x NVIDIA A100 80GB |
+ | Training Time | ~36 hours |
+ | Trainable Parameters | 47.6M (0.57% of total) |
+
+ ### Target Modules
+ - `q_proj`, `k_proj`, `v_proj`, `o_proj` (attention)
+ - `gate_proj`, `up_proj`, `down_proj` (FFN)
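+
+ The table and module list above correspond to a standard PEFT LoRA setup (see `adapter_config.json` in this repository). Below is a minimal sketch of how such a configuration could be declared with `peft`; the commented training calls and the `base_model` name are illustrative placeholders, not the exact training script used for this adapter.
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+
+ # Adapter settings matching the values listed above and in adapter_config.json.
+ lora_config = LoraConfig(
+     r=16,
+     lora_alpha=32,
+     lora_dropout=0.05,
+     bias="none",
+     task_type="CAUSAL_LM",
+     target_modules=[
+         "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
+         "gate_proj", "up_proj", "down_proj",      # FFN projections
+     ],
+ )
+
+ # `base_model` would be the loaded olmOCR-2-7B-1025 model (see Usage below).
+ # peft_model = get_peft_model(base_model, lora_config)
+ # peft_model.print_trainable_parameters()  # ~47.6M trainable parameters (0.57%)
+ ```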
+
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ pip install transformers peft torch
+ ```
+
+ ### Load the Model
+
+ ```python
+ from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
+ from peft import PeftModel
+ import torch
+
+ # Load base model
+ base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+     "allenai/olmOCR-2-7B-1025",
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ # Load LoRA adapter
+ model = PeftModel.from_pretrained(base_model, "allenai/olmOCR-arabic-lora")
+
+ # Optional: Merge for faster inference
+ model = model.merge_and_unload()
+
+ # Load processor
+ processor = AutoProcessor.from_pretrained("allenai/olmOCR-2-7B-1025", trust_remote_code=True)
+ ```
+
+ ### Run Inference
+
+ ```python
+ from PIL import Image
+
+ # Load your Arabic document image
+ image = Image.open("arabic_document.png")
+
+ # Create prompt (olmOCR format)
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": "Extract the text from this document."},
+         ],
+     }
+ ]
+
+ # Process and generate
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True)
+ inputs = {k: v.to(model.device) for k, v in inputs.items()}
+
+ with torch.no_grad():
+     outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
+
+ # Decode output
+ result = processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
+ print(result)
+ ```
+
+ ## Training Data
+
+ The model was fine-tuned on a combined dataset of Arabic OCR samples including:
+ - Arabic handwritten documents
+ - Printed Arabic text
+ - Mixed-script documents
+
+ Total training samples: 450,044
+
+ ## Evaluation
+
+ Evaluation results will be added after benchmark completion.
+
+ Target metrics:
+ - Word Error Rate (WER): < 10%
+ - Character Error Rate (CER): < 5%
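+
+ As a reference for how these metrics could be computed once benchmark transcriptions are available, here is a minimal sketch using the `jiwer` package (an assumed evaluation dependency; any WER/CER tooling works), comparing a ground-truth transcription against the `result` string from the inference example above:
+
+ ```python
+ # pip install jiwer  (assumed evaluation dependency, not needed for inference)
+ import jiwer
+
+ reference = "..."    # ground-truth Arabic transcription of the page
+ hypothesis = result  # model output from the inference example above
+
+ wer = jiwer.wer(reference, hypothesis)
+ cer = jiwer.cer(reference, hypothesis)
+ print(f"WER: {wer:.2%}  CER: {cer:.2%}")  # targets: WER < 10%, CER < 5%
+ ```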
+
+ ## Limitations
+
+ - Optimized primarily for Arabic script
+ - Performance may vary on extremely degraded or low-quality scans
+ - Works best with documents at 150+ DPI
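+
+ For scanned PDFs, one way to meet the 150 DPI guideline is to rasterize pages at that resolution before inference. A small sketch using `pdf2image` (an assumed preprocessing helper that requires poppler; any renderer with DPI control works):
+
+ ```python
+ # pip install pdf2image  (assumed preprocessing dependency)
+ from pdf2image import convert_from_path
+
+ # Render each page of a scanned PDF at 150 DPI before passing it to the processor.
+ pages = convert_from_path("arabic_document.pdf", dpi=150)
+ image = pages[0]  # a PIL.Image, usable directly in the inference example above
+ ```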
135
+
136
+ ## Citation
137
+
138
+ If you use this model, please cite:
139
+
140
+ ```bibtex
141
+ @misc{olmocr-arabic-lora,
142
+ title={olmOCR Arabic LoRA Adapter},
143
+ author={Allen Institute for AI},
144
+ year={2025},
145
+ publisher={Hugging Face},
146
+ url={https://huggingface.co/allenai/olmOCR-arabic-lora}
147
+ }
148
+ ```
149
+
150
+ ## License
151
+
152
+ Apache 2.0
153
+
154
+ ### Framework Versions
155
+
156
+ - PEFT: 0.18.0
157
+ - Transformers: 4.47+
158
+ - PyTorch: 2.0+
adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "allenai/olmOCR-2-7B-1025",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.0",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "o_proj",
+     "gate_proj",
+     "v_proj",
+     "k_proj",
+     "up_proj",
+     "q_proj",
+     "down_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b4c33569c7072adebb9484b5a23636a9538d91d00d8729c5bc11f1ebe9b6f9a0
+ size 190442760