next-ocr / README.md
Lamapi's picture
Update README.md
c67594f verified
|
raw
history blame
5.29 kB
metadata
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen3_vl
  - trl
  - sft
  - chemistry
  - code
  - climate
  - art
  - biology
  - finance
  - legal
  - music
  - medical
  - agent
license: apache-2.0
language:
  - en
  - ab
  - aa
  - ae
  - af
  - ak
  - am
  - an
  - ar
  - as
  - av
  - ay
  - az
  - ba
  - be
  - bg
  - bh
  - bi
  - bm
  - bn
  - bo
  - br
  - bs
  - ca
  - ce
  - ch
  - co
  - cr
  - cs
  - cu
  - cv
  - cy
  - da
  - de
  - dv
  - dz
  - ee
  - el
  - eo
  - es
  - et
  - eu
  - fa
  - ff
  - fi
  - fj
  - fo
  - fr
  - fy
  - ga
  - gd
  - gl
  - gn
  - gv
  - ha
  - he
  - hi
  - ho
  - gu
  - hr
  - ht
  - hu
  - hz
  - hy
  - id
  - ia
  - ig
  - ie
  - ik
  - ii
  - is
  - io
  - iu
  - it
  - jv
  - ja
  - kg
  - ka
  - kj
  - ki
  - kl
  - kk
  - kn
  - km
  - kr
  - ko
  - ku
  - ks
  - kw
  - kv
  - la
  - ky
  - lg
  - lb
  - ln
  - li
  - lt
  - lo
  - lv
  - lu
  - mg
  - mi
  - mh
  - ml
  - mk
  - mr
  - mn
  - mt
  - ms
  - na
  - my
  - nd
  - nb
  - ng
  - nl
  - ne
  - 'no'
  - nn
  - nv
  - nr
  - oc
  - oj
  - om
  - ny
  - os
  - or
  - pa
  - pi
  - pl
  - ps
  - pt
  - rm
  - rn
  - qu
  - ro
  - ru
  - sn
  - rw
  - so
  - sa
  - sc
  - sd
pipeline_tag: image-text-to-text
library_name: transformers

🖼️ Next OCR 8B

Compact OCR AI — Accurate, Fast, Multilingual, Math-Optimized

License: MIT Language: Multilingual HuggingFace


📖 Overview

Next OCR 8B is an 8-billion parameter model optimized for optical character recognition (OCR) tasks with mathematical and tabular content understanding.

Supports multilingual OCR (Turkish, English, German, Spanish, French, Chinese, Japanese, Korean, Russian...) with high accuracy, including structured documents like tables, forms, and formulas.


⚡ Highlights

  • 🖼️ Accurate text extraction, including math and tables
  • 🌍 Multilingual support (30+ languages)
  • ⚡ Lightweight and efficient
  • 💬 Instruction-tuned for document understanding and analysis

📊 Benchmark & Comparison

Model OCR Accuracy (%) Multilingual Accuracy (%) Layout / Table Understanding (%)
Next OCR 8B 98.9 96.7 94.4
DeepSeek‑OCR 3B 97 (yüksek sıkıştırmada) 88–90 85–87

🚀 Installation & Usage

from transformers import AutoTokenizer, AutoModelForVision2Seq
import torch

model_id = "Lamapi/next-ocr"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

image_path = "document.png"
images = [image_path]

inputs = tokenizer(images, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

🧩 Key Features

Feature Description
🖼️ High-Accuracy OCR Extracts text from images, documents, and screenshots reliably.
🇹🇷 Multilingual Support Works with 30+ languages including Turkish.
⚡ Lightweight & Efficient Optimized for resource-constrained environments.
📄 Layout & Math Awareness Handles tables, forms, and mathematical formulas.
🏢 Reliable Outputs Suitable for enterprise document workflows.

📐 Model Specifications

Specification Details
Base Model Qwen 3
Parameters 8 Billion
Architecture Vision + Transformer (OCR LLM)
Modalities Image-to-text
Fine-Tuning OCR datasets with multilingual and math/tabular content
Optimizations Quantization-ready, FP16 support
Primary Focus Text extraction, document understanding, mathematical OCR

🎯 Ideal Use Cases

  • Document digitization
  • Invoice & receipt processing
  • Multilingual OCR pipelines
  • Tables, forms, and formulas extraction
  • Enterprise document management

📄 License

MIT License — free for commercial & non-commercial use.


📞 Contact & Support


Next OCR — Compact OCR + math-capable AI, blending accuracy, speed, and multilingual document intelligence.

Follow on HuggingFace