sapkotapraful/FullyOCR


sapkotapraful committed on Nov 3, 2025

Commit cbbafd2 · verified · 1 Parent(s): 81c72b6

Update README.md

Files changed (1)

README.md +70 -11


@@ -1,22 +1,81 @@
 ---
-base_model: unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit
 tags:
-- text-generation-inference
-- transformers
 - unsloth
-- qwen3_vl
-- trl
 license: apache-2.0
 language:
 - en
 ---
-# Uploaded  model
-- **Developed by:** sapkotapraful
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit
-This qwen3_vl model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
+base_model: sapkotapraful/FullyOCR
 tags:
+- vision-language
 - unsloth
+- fullyocr
+- text-extraction
+- transformers
 license: apache-2.0
 language:
 - en
 ---
+# Model Card — sapkotapraful/FullyOCR (finetuned)
+- Developed by: sapkotapraful
+- License: apache-2.0
+- Model: sapkotapraful/FullyOCR
+- Framework: Unsloth (FastVisionModel) + PyTorch
+Short description
+- FullyOCR is a vision-language OCR model finetuned for extracting text and structured content from images and PDFs. It is intended for research, prototyping, and non-critical document extraction tasks.
+Intended use
+- OCR/text extraction from images and scanned documents.
+- Not for automated medical, legal, or safety-critical decisions without human review.
+How to load (using Unsloth; no external API calls)
+- Minimal local loading and inference example. Adjust device/quantization flags as needed.
+````python
+from unsloth import FastVisionModel
+import torch
+from PIL import Image
+# Load model + tokenizer (example uses 4-bit quantization if applicable)
+model, tokenizer = FastVisionModel.from_pretrained(
+    "sapkotapraful/FullyOCR",
+    load_in_4bit=True,
+)
+model.eval()
+device = "cuda" if torch.cuda.is_available() else "cpu"
+if device == "cuda":
+    model = model.to(device)
+# Instruction token used during finetuning
+instruction = "<|MD|>"
+# Prepare messages in training-time template
+messages = [
+    {"role": "user", "content": [
+        {"type": "image"},
+        {"type": "text", "text": instruction}
+    ]}
+]
+input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
+# image must be a PIL.Image in RGB mode, e.g. image = Image.open(path).convert("RGB")
+# tokenizer returns tensors suitable for model.generate
+inputs = tokenizer(
+    image,               # PIL.Image object
+    input_text,
+    add_special_tokens=False,
+    return_tensors="pt",
+).to(device)
+with torch.no_grad(), torch.amp.autocast(device_type="cuda", enabled=(device=="cuda")):
+    output_ids = model.generate(
+        **inputs,
+        max_new_tokens=1024,
+        use_cache=True,
+        num_beams=1,
+        do_sample=False,
+        pad_token_id=tokenizer.pad_token_id,
+    )
+decoded = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
+extracted = decoded.split(instruction)[-1].strip()
+print(extracted)