Model Card: sapkotapraful/FullyOCR (finetuned)
- Developed by: sapkotapraful
- License: apache-2.0
- Model: sapkotapraful/FullyOCR
- Framework: Unsloth (FastVisionModel) + PyTorch
Short description
- FullyOCR is a vision-language OCR model finetuned for extracting text and structured content from images and PDFs. It is intended for research, prototyping, and non-critical document extraction tasks.
Intended use
- OCR/text extraction from images and scanned documents.
- Not for automated medical, legal, or safety-critical decisions without human review.
How to load (using Unsloth; no external API calls)
- Minimal local loading and inference example. Adjust device/quantization flags as needed.
from unsloth import FastVisionModel
import torch
from PIL import Image
# Load model + tokenizer (example uses 4-bit quantization if applicable)
model, tokenizer = FastVisionModel.from_pretrained(
    "sapkotapraful/FullyOCR",
    load_in_4bit=True,
)
model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    model = model.to(device)
# Instruction token used during finetuning
instruction = "<|MD|>"
# Prepare messages in training-time template
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction},
    ]}
]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
# `image` must be a PIL.Image in RGB mode; the path below is a placeholder
image = Image.open("path/to/document.png").convert("RGB")
# The tokenizer (processor) returns tensors suitable for model.generate
inputs = tokenizer(
    image,  # PIL.Image object
    input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(device)
# Autocast is only enabled on CUDA; on CPU the context is a no-op
with torch.no_grad(), torch.amp.autocast(device_type=device, enabled=(device == "cuda")):
    output_ids = model.generate(
        **inputs,
        max_new_tokens=1024,
        use_cache=True,
        num_beams=1,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )
decoded = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
extracted = decoded.split(instruction)[-1].strip()
print(extracted)
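Note on isolating the generated text
- The example above recovers the output by splitting the decoded string on the instruction marker. If that marker is registered as a special token, skip_special_tokens=True would remove it before the split, and the split would then return the whole decoded string, prompt included. A possible alternative is to drop the prompt token ids before decoding. The sketch below assumes that generate returns the prompt tokens followed by the new tokens, as is typical for decoder-only models. The helper name strip_prompt_tokens is hypothetical and not part of Unsloth or transformers, and the token ids are toy values.

```python
from typing import List, Sequence


def strip_prompt_tokens(
    output_ids: Sequence[Sequence[int]], prompt_len: int
) -> List[List[int]]:
    """Drop the first `prompt_len` token ids from every generated sequence.

    Hypothetical helper for illustration; assumes each row is laid out as
    [prompt tokens..., newly generated tokens...].
    """
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    return [list(row[prompt_len:]) for row in output_ids]


# Toy token ids standing in for real model output:
# three prompt tokens followed by two newly generated tokens.
prompt_ids = [101, 7, 8]
generated = [prompt_ids + [42, 43]]

new_tokens = strip_prompt_tokens(generated, prompt_len=len(prompt_ids))
print(new_tokens)  # [[42, 43]]
```

- With the tensors from the example above, the same idea would likely be prompt_len = inputs["input_ids"].shape[1], followed by decoding output_ids[:, prompt_len:] with tokenizer.batch_decode. Treat this as a sketch to verify against the actual model output, not as the author's method.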