devanshty committed · Commit 95e6215 · verified · 1 Parent(s): b7ac4f7

Add model card

Files changed (1)
  1. README.md +84 -0
README.md ADDED
@@ -0,0 +1,84 @@
+ ---
+ license: mit
+ tags:
+ - peft
+ - lora
+ - qwen2
+ - multilingual
+ - ocr
+ - translation
+ - safetensors
+ base_model: Qwen/Qwen2-VL-7B-Instruct
+ ---
+
+ # Babel
+
+ ## Model Description
+ Babel is a LoRA adapter for Qwen2-VL, fine-tuned for multilingual OCR (optical character recognition) and translation. It extracts text from images in multiple languages and translates it between them. Intended uses include document digitization, cross-language content processing, and international business automation.
+
+ ## Model Architecture
+ - **Base Model**: `Qwen/Qwen2-VL-7B-Instruct`
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation) via PEFT
+ - **Checkpoint**: Final checkpoint
+ - **Task**: Multilingual OCR + Translation (Vision-Language)
+
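To make the low-rank idea concrete, here is a minimal sketch of the update LoRA applies to a single frozen weight matrix, `W' = W + (alpha / r) * B @ A`. The helper names (`matmul`, `lora_merge`, `lora_param_count`) and the toy matrices are illustrative assumptions, not values taken from this adapter; PEFT performs the equivalent computation on tensors.

```python
# Illustrative sketch of a LoRA update on one weight matrix (pure Python).
# All names and numbers are hypothetical; none come from the Babel adapter.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_merge(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank."""
    r = len(lora_a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # shape (d_out, d_in)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]

def lora_param_count(d_out, d_in, r):
    """Trainable parameters added by one rank-r adapter on a d_out x d_in matrix."""
    return r * (d_in + d_out)

w = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight, 2 x 2
lora_a = [[1.0, 2.0]]          # rank-1 A, 1 x 2
lora_b = [[0.5], [0.0]]        # rank-1 B, 2 x 1
merged = lora_merge(w, lora_a, lora_b, alpha=2.0)
```

Only `A` and `B` are trained, which is why an adapter file is small compared with the base model it modifies.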
+ ## Training Details
+ - **Framework**: Hugging Face PEFT + Transformers
+ - **Dataset**: Multilingual document images with text annotations and translations
+ - **Languages**: Multiple, including English and Hindi
+ - **Approach**: Vision-language fine-tuning with OCR and translation objectives
+
+ ## Files
+ | File | Description |
+ |------|-------------|
+ | `adapter_model.safetensors` | LoRA adapter weights |
+ | `adapter_config.json` | PEFT adapter configuration |
+ | `tokenizer.json` | Tokenizer vocabulary |
+ | `tokenizer_config.json` | Tokenizer configuration |
+
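Before loading the adapter, it can help to check what `adapter_config.json` declares. The sketch below reads a few fields that PEFT normally writes for LoRA adapters (`peft_type`, `base_model_name_or_path`, `r`, `lora_alpha`, `target_modules`). The helper name `summarize_adapter_config` is hypothetical, and the actual values stored in this repository are not stated in the card.

```python
import json
from pathlib import Path

def summarize_adapter_config(config_path):
    """Return a few commonly inspected fields from a PEFT adapter_config.json."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    keys = ("peft_type", "base_model_name_or_path", "r", "lora_alpha", "target_modules")
    # Use .get so a missing key shows up as None instead of raising KeyError.
    return {key: config.get(key) for key in keys}
```

Assuming `adapter_dir` points at a local copy of the repository, a call such as `summarize_adapter_config(Path(adapter_dir) / "adapter_config.json")` would return the summary as a plain dictionary.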
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
+ from peft import PeftModel
+ from PIL import Image
+ from huggingface_hub import snapshot_download
+
+ BASE_MODEL = "Qwen/Qwen2-VL-7B-Instruct"
+
+ # Download the adapter files to the local cache
+ adapter_dir = snapshot_download(repo_id="devanshty/Babel")
+
+ # Load the base model
+ base_model = Qwen2VLForConditionalGeneration.from_pretrained(
+     BASE_MODEL,
+     torch_dtype="auto",
+     device_map="auto",
+ )
+ # Load the processor from the base model: the adapter repo lists only tokenizer
+ # files, and the processor also needs the image preprocessor configuration.
+ # (Assumes the adapter does not change the tokenizer.)
+ processor = AutoProcessor.from_pretrained(BASE_MODEL)
+
+ # Attach the LoRA adapter to the base model
+ model = PeftModel.from_pretrained(base_model, adapter_dir)
+ model.eval()
+
+ # OCR + translate
+ image = Image.open("document.jpg").convert("RGB")
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": "Extract all text from this image and translate it to English."},
+         ],
+     }
+ ]
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
+
+ with torch.inference_mode():
+     output = model.generate(**inputs, max_new_tokens=1024)
+
+ # Drop the prompt tokens so only the newly generated text is printed
+ generated = output[:, inputs["input_ids"].shape[1]:]
+ print(processor.batch_decode(generated, skip_special_tokens=True)[0])
+ ```
+
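The prompt in the usage example hard-codes English as the target language. As a small sketch, the message construction can be wrapped in a helper that takes the target language as a parameter. The name `build_messages` is hypothetical, and whether the adapter performs well for a given target language is not stated in the card.

```python
def build_messages(image, target_language="English"):
    """Build a single-turn chat message asking for OCR plus translation."""
    prompt = f"Extract all text from this image and translate it to {target_language}."
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]

# Placeholder image value for illustration; in real use pass a loaded PIL image.
messages = build_messages(image=None, target_language="Hindi")
```

The returned list has the same shape as the `messages` value passed to `processor.apply_chat_template` in the usage example.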
+ ## Download & Use
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download only the adapter weights; returns the local path of the cached file
+ adapter_path = hf_hub_download(repo_id="devanshty/Babel", filename="adapter_model.safetensors")
+ ```