---
language: en
license: other
datasets:
- DeepMount00/ner_training
tags:
- vision
- multimodal
- OCR
- SmolVLM
pipeline_tag: image-text-to-text
---
# SmolVLM Base - OCR Fine-tuned
This model is SmolVLM-Base fine-tuned for OCR, with the fine-tuned adapter weights merged into the base model so that it loads as a single checkpoint. It was trained with QLoRA on the DeepMount00/ner_training dataset.
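
To make "merged" concrete: in LoRA-style fine-tuning, an adapted weight matrix `W` receives a low-rank update, and merging folds that update in as `W + (alpha / r) * B @ A`, so inference needs no adapter code. The sketch below shows that arithmetic on a toy matrix in plain Python. The matrices, `alpha`, and `r` are illustrative values, not this model's hyperparameters, and the helper names are hypothetical; the source does not describe the actual merge procedure.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_linear(weight, x):
    """Compute weight @ x for a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy values (assumptions for illustration only).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # base weight, shape 2 x 3
A = [[0.5, -0.5, 1.0]]                   # LoRA A, shape r x 3 with r = 1
B = [[2.0], [1.0]]                       # LoRA B, shape 2 x r
alpha, r = 2, 1
x = [1.0, 2.0, 3.0]

merged = merge_lora(W, A, B, alpha, r)

# Unmerged path: base output plus the scaled adapter output B @ (A @ x).
adapter_out = apply_linear(B, apply_linear(A, x))
with_adapter = [base + (alpha / r) * d
                for base, d in zip(apply_linear(W, x), adapter_out)]

print(apply_linear(merged, x))  # [17.0, 4.0]
print(with_adapter)             # [17.0, 4.0]
```

Both paths give the same output, which is why a merged checkpoint can be loaded like an ordinary model.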
## Model Details
- **Base Model**: HuggingFaceTB/SmolVLM-Base
- **Task**: Optical Character Recognition (OCR)
- **Training Method**: QLoRA with 4-bit quantization
- **Target Modules**: down_proj, o_proj, k_proj, q_proj, gate_proj, up_proj, v_proj
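
The target modules listed above cover the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the MLP projections (`gate_proj`, `up_proj`, `down_proj`). As a rough illustration of how such a list selects layers, the sketch below uses name-based matching: a module is adapted if its dotted name equals a target or ends in one. This is a simplified stand-in for what an adapter library does, not its actual code. The entries in `module_names` are hypothetical placeholders, not read from this checkpoint, and `is_target` is an illustrative helper.

```python
TARGET_MODULES = [
    "down_proj", "o_proj", "k_proj", "q_proj", "gate_proj", "up_proj", "v_proj",
]


def is_target(module_name, targets=TARGET_MODULES):
    """True if the dotted module name is, or ends in, one of the target names."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


# Hypothetical module names, for illustration only.
module_names = [
    "layers.0.self_attn.q_proj",
    "layers.0.self_attn.o_proj",
    "layers.0.mlp.gate_proj",
    "layers.0.input_layernorm",
    "lm_head",
]

selected = [name for name in module_names if is_target(name)]
print(selected)
```

In this toy list, the three projection layers are selected, while the normalization layer and the output head are left untouched.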
## Usage
```python
from transformers import AutoProcessor, Idefics3ForConditionalGeneration
import torch
from PIL import Image

model_id = "DeepMount00/SmolVLM-Base-ocr_base"
processor = AutoProcessor.from_pretrained(model_id)
model = Idefics3ForConditionalGeneration.from_pretrained(model_id)

# Load your image
image = Image.open("path_to_your_image.jpg").convert("RGB")

# Prepare the prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "You are a model specialized in OCR"},
            {"type": "image"},
            {"type": "text", "text": "Extract the text from this image"},
        ],
    }
]

# Render the messages into a prompt string; the processor expects text, not message dicts
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Process inputs
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generate without tracking gradients
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)

# Decode and print the response
print(processor.decode(outputs[0], skip_special_tokens=True))
```
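
Because `generate` returns the prompt tokens followed by the newly generated ones, the decoded string usually contains the prompt as well as the extracted text. Below is a minimal sketch for keeping only the model's reply. It assumes the chat template labels the model turn with `Assistant:`, which is an assumption about the template, so inspect one decoded output first. `extract_reply` is a hypothetical helper and the sample string is made up.

```python
def extract_reply(decoded, marker="Assistant:"):
    """Return the text after the last occurrence of marker, or the whole string if absent."""
    _, found, reply = decoded.rpartition(marker)
    return reply.strip() if found else decoded.strip()


# Made-up decoded string, for illustration only.
sample = "User: Extract the text from this image\nAssistant: Invoice No. 1234"
print(extract_reply(sample))  # Invoice No. 1234
```

An alternative that does not depend on the template is to slice off the prompt tokens before decoding, for example `outputs[:, inputs["input_ids"].shape[1]:]`.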