Commit cbbafd2 (verified) by sapkotapraful · Parent: 81c72b6

Update README.md

Files changed (1):
  1. README.md (+70 −11)

README.md CHANGED
@@ -1,22 +1,81 @@
  ---
- base_model: unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit
  tags:
- - text-generation-inference
- - transformers
  - unsloth
- - qwen3_vl
- - trl
  license: apache-2.0
  language:
  - en
  ---

- # Uploaded model

- - **Developed by:** sapkotapraful
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit

- This qwen3_vl model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
+ base_model: sapkotapraful/FullyOCR
  tags:
+ - vision-language
  - unsloth
+ - fullyocr
+ - text-extraction
+ - transformers
  license: apache-2.0
  language:
  - en
  ---

+ # Model Card — sapkotapraful/FullyOCR (finetuned)
+
+ - Developed by: sapkotapraful
+ - License: apache-2.0
+ - Model: sapkotapraful/FullyOCR
+ - Framework: Unsloth (FastVisionModel) + PyTorch
+
+ ## Short description
+
+ FullyOCR is a vision-language OCR model finetuned for extracting text and structured content from images and PDFs. It is intended for research, prototyping, and non-critical document extraction tasks.
+
+ ## Intended use
+
+ - OCR/text extraction from images and scanned documents.
+ - Not for automated medical, legal, or safety-critical decisions without human review.
+
+ ## How to load (using Unsloth; no external API calls)
+
+ Minimal local loading and inference example. Adjust the device and quantization flags as needed.
+
+ ````python
+ from unsloth import FastVisionModel
+ import torch
+ from PIL import Image
+
+ # Load model + tokenizer (example uses 4-bit quantization to reduce memory)
+ model, tokenizer = FastVisionModel.from_pretrained(
+     "sapkotapraful/FullyOCR",
+     load_in_4bit=True,
+ )
+
+ model.eval()
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ if device == "cuda":
+     model = model.to(device)
+
+ # Instruction token used during finetuning
+ instruction = "<|MD|>"
+
+ # Prepare messages in the training-time chat template
+ messages = [
+     {"role": "user", "content": [
+         {"type": "image"},
+         {"type": "text", "text": instruction},
+     ]}
+ ]
+
+ input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
+
+ # image must be a PIL.Image in RGB mode; the path below is a placeholder
+ image = Image.open("document.png").convert("RGB")
+
+ # The tokenizer returns tensors suitable for model.generate
+ inputs = tokenizer(
+     image,  # PIL.Image object
+     input_text,
+     add_special_tokens=False,
+     return_tensors="pt",
+ ).to(device)
+
+ with torch.no_grad(), torch.amp.autocast(device_type="cuda", enabled=(device == "cuda")):
+     output_ids = model.generate(
+         **inputs,
+         max_new_tokens=1024,
+         use_cache=True,
+         num_beams=1,
+         do_sample=False,
+         pad_token_id=tokenizer.pad_token_id,
+     )
+
+ decoded = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
+ extracted = decoded.split(instruction)[-1].strip()
+ print(extracted)
+ ````