---
language: en
license: other
datasets:
- DeepMount00/ner_training
tags:
- vision
- multimodal
- OCR
- SmolVLM
pipeline_tag: image-text-to-text
---

# SmolVLM Base - OCR Fine-tuned

This is SmolVLM-Base fine-tuned for OCR tasks: the model was trained with QLoRA on the DeepMount00/ner_training dataset, and the LoRA adapters were then merged back into the base weights.

## Model Details

- **Base Model**: HuggingFaceTB/SmolVLM-Base
- **Task**: Optical Character Recognition (OCR)
- **Training Method**: QLoRA with 4-bit quantization
- **Target Modules**: down_proj, o_proj, k_proj, q_proj, gate_proj, up_proj, v_proj (see the configuration sketch below)
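
The exact fine-tuning hyperparameters are not published in this card, so the following is only a minimal sketch of a QLoRA setup consistent with the details above. The target modules come from the list; the rank, alpha, dropout, and quantization settings are assumptions.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import BitsAndBytesConfig, Idefics3ForConditionalGeneration

# QLoRA keeps the base weights frozen and quantized to 4-bit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = Idefics3ForConditionalGeneration.from_pretrained(
    "HuggingFaceTB/SmolVLM-Base",
    quantization_config=bnb_config,
)

# LoRA adapters on the attention and MLP projections listed above.
# r, lora_alpha, and lora_dropout are assumed values, not the published ones.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```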

## Usage

```python
from transformers import AutoProcessor, Idefics3ForConditionalGeneration
import torch
from PIL import Image

model_id = "DeepMount00/SmolVLM-Base-ocr_base"
processor = AutoProcessor.from_pretrained(model_id)
model = Idefics3ForConditionalGeneration.from_pretrained(model_id)

# Load your image
image = Image.open("path_to_your_image.jpg").convert("RGB")

# Prepare the prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "You are a model specialized in OCR"},
            {"type": "image"},
            {"type": "text", "text": "Extract the text from this image"},
        ],
    }
]

# Build the chat prompt and process the inputs
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generate
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)

# Decode and print the response
print(processor.decode(outputs[0], skip_special_tokens=True))
```
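
If a GPU is available, the merged model can also be loaded in half precision. The snippet below is a sketch that continues from the example above (it reuses `model_id`, `processor`, `prompt`, and `image`); the bfloat16 dtype is an assumption about your hardware, not a requirement of the model.

```python
# Optional GPU variant of the example above; reuses model_id, processor, prompt, and image.
# bfloat16 is an assumed dtype choice; use float16 or float32 if your GPU lacks bf16 support.
model = Idefics3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")

inputs = processor(text=prompt, images=[image], return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(outputs[0], skip_special_tokens=True))
```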