
	---
	license: mit
	tags:
	- peft
	- lora
	- qwen2
	- multilingual
	- ocr
	- translation
	- safetensors
	base_model: Qwen/Qwen2-VL-7B-Instruct
	---

	# Babel

	## Model Description
	Babel is a LoRA adapter for Qwen2-VL, fine-tuned for multilingual OCR (optical character recognition) and translation. It extracts text from images in multiple languages and translates it, which suits document digitization, cross-language content processing, and international business automation.

	## Model Architecture
	- Base Model: `Qwen/Qwen2-VL-7B-Instruct`
	- Fine-tuning Method: LoRA (Low-Rank Adaptation) via PEFT
	- Checkpoint: Final checkpoint
	- Task: Multilingual OCR + Translation (Vision-Language)
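
To make the LoRA entry above concrete, here is a minimal sketch of the low-rank update the method relies on: a frozen weight `W` is adjusted by `(alpha / r) * B @ A`, where `A` and `B` are the small trained matrices. The helper names and all sizes are illustrative assumptions, not values read from this adapter's configuration, and real implementations operate on tensors rather than Python lists.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A, the weight a LoRA layer effectively applies."""
    scale = alpha / r
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters in one LoRA pair: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Illustrative sizes only (hypothetical, not this adapter's settings).
w = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, 2 x 2
a = [[1.0, 2.0]]              # A: r x d_in with r = 1
b = [[0.5], [0.0]]            # B: d_out x r
merged = lora_effective_weight(w, a, b, alpha=2, r=1)
print(merged)
print(lora_param_count(4096, 4096, 16), "trainable vs", 4096 * 4096, "frozen")
```

Because only `A` and `B` are trained, the adapter file stays small relative to the base model, which is why this repository ships adapter weights rather than a full checkpoint.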

	## Training Details
	- Framework: Hugging Face PEFT + Transformers
	- Dataset: Multilingual document images with text annotations and translations
	- Languages: Multiple, including English and Hindi
	- Approach: Vision-language fine-tuning with OCR and translation objectives

	## Files
	| File | Description |
	|------|-------------|
	| `adapter_model.safetensors` | LoRA adapter weights |
	| `adapter_config.json` | PEFT adapter configuration |
	| `tokenizer.json` | Tokenizer vocabulary |
	| `tokenizer_config.json` | Tokenizer configuration |
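
The LoRA settings themselves are not listed in this card, but they can be read from `adapter_config.json`. The following is a small sketch that assumes the file follows the usual PEFT layout (keys such as `r`, `lora_alpha`, and `target_modules`); the helper name `summarize_adapter_config` is hypothetical.

```python
import json
from pathlib import Path


def summarize_adapter_config(config_path):
    """Return the main LoRA settings stored in a PEFT adapter_config.json."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    keys = ("base_model_name_or_path", "peft_type", "r", "lora_alpha", "target_modules")
    # .get() keeps the helper tolerant of configs that omit a field.
    return {key: config.get(key) for key in keys}
```

Given a local copy of the repository, a call such as `summarize_adapter_config("path/to/adapter_config.json")` returns a dictionary with those fields, using `None` for any that are absent.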

	## Usage

	```python
	from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
	from peft import PeftModel
	from PIL import Image
	from huggingface_hub import snapshot_download

	# Download the adapter repository (weights, config, tokenizer files)
	adapter_dir = snapshot_download(repo_id="devanshty/Babel")

	# Load the frozen base model
	base_model = Qwen2VLForConditionalGeneration.from_pretrained(
	    "Qwen/Qwen2-VL-7B-Instruct",
	    torch_dtype="auto",
	    device_map="auto",
	)
	# If the adapter repo lacks image-processor files, load the processor
	# from "Qwen/Qwen2-VL-7B-Instruct" instead.
	processor = AutoProcessor.from_pretrained(adapter_dir)

	# Attach the LoRA adapter and switch to inference mode
	model = PeftModel.from_pretrained(base_model, adapter_dir)
	model.eval()

	# OCR + translate
	image = Image.open("document.jpg").convert("RGB")
	messages = [
	    {
	        "role": "user",
	        "content": [
	            {"type": "image", "image": image},
	            {"type": "text", "text": "Extract all text from this image and translate it to English."},
	        ],
	    }
	]
	text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
	output = model.generate(**inputs, max_new_tokens=1024)

	# Drop the prompt tokens so that only the newly generated text is printed
	generated = output[:, inputs["input_ids"].shape[1]:]
	print(processor.batch_decode(generated, skip_special_tokens=True)[0])
	```
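
The request in the usage example hard-codes English as the target language. A small helper along the following lines could parameterize it; `build_messages` is a hypothetical name, and how well the adapter handles target languages other than those it was trained on is not stated in this card.

```python
def build_messages(image, target_language="English"):
    """Build a single-turn chat request asking for OCR followed by translation."""
    language = target_language.strip()
    if not language:
        raise ValueError("target_language must be a non-empty string")
    prompt = f"Extract all text from this image and translate it to {language}."
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# A placeholder stands in for the PIL image so the sketch runs without any files.
messages = build_messages(image="<PIL image goes here>", target_language="Hindi")
print(messages[0]["content"][1]["text"])
```

The returned list has the same shape as the `messages` variable in the usage example, so it can be passed to `processor.apply_chat_template` unchanged.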

	## Download & Use

	```python
	from huggingface_hub import hf_hub_download

	# Download only the adapter weights; returns the local path of the cached file
	adapter = hf_hub_download(repo_id="devanshty/Babel", filename="adapter_model.safetensors")
	print(adapter)
	```
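
Once the weights file is downloaded, its contents can be listed without loading any tensors, since a `.safetensors` file begins with an 8-byte little-endian header length followed by a JSON header. The sketch below uses only the standard library; the function names are hypothetical, and the tensor names you will see depend on how the adapter was saved.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def list_tensors(path):
    """Map tensor name -> (dtype, shape), skipping the optional metadata entry."""
    header = read_safetensors_header(path)
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }
```

Calling `list_tensors(adapter)` with the path returned by `hf_hub_download` gives a quick view of which layers the adapter modifies and the shapes of its low-rank matrices.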