---
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_vl
- trl
- sft
- chemistry
- code
- climate
- art
- biology
- finance
- legal
- music
- medical
- agent
license: apache-2.0
language:
- en
- ab
- aa
- ae
- af
- ak
- am
- an
- ar
- as
- av
- ay
- az
- ba
- be
- bg
- bh
- bi
- bm
- bn
- bo
- br
- bs
- ca
- ce
- ch
- co
- cr
- cs
- cu
- cv
- cy
- da
- de
- dv
- dz
- ee
- el
- eo
- es
- et
- eu
- fa
- ff
- fi
- fj
- fo
- fr
- fy
- ga
- gd
- gl
- gn
- gv
- ha
- he
- hi
- ho
- gu
- hr
- ht
- hu
- hz
- hy
- id
- ia
- ig
- ie
- ik
- ii
- is
- io
- iu
- it
- jv
- ja
- kg
- ka
- kj
- ki
- kl
- kk
- kn
- km
- kr
- ko
- ku
- ks
- kw
- kv
- la
- ky
- lg
- lb
- ln
- li
- lt
- lo
- lv
- lu
- mg
- mi
- mh
- ml
- mk
- mr
- mn
- mt
- ms
- na
- my
- nd
- nb
- ng
- nl
- ne
- 'no'
- nn
- nv
- nr
- oc
- oj
- om
- ny
- os
- or
- pa
- pi
- pl
- ps
- pt
- rm
- rn
- qu
- ro
- ru
- sn
- rw
- so
- sa
- sc
- sd
pipeline_tag: image-text-to-text
library_name: transformers
base_model:
- thelamapi/next-ocr
---
<img src='bannerocr.png'>

# 🖼️ Next OCR 8B

### *Compact OCR AI: Accurate, Fast, Multilingual, Math-Optimized*

[License: MIT](https://opensource.org/licenses/MIT) · [Hugging Face](https://huggingface.co/Lamapi/next-ocr) · [Discord](https://discord.gg/XgH4EpyPD2)

---

## 📖 Overview

**Next OCR 8B** is an **8-billion-parameter model** optimized for **optical character recognition (OCR)**, with particular attention to **mathematical and tabular content**.

It supports **multilingual OCR** (Turkish, English, German, Spanish, French, Chinese, Japanese, Korean, Russian, and more) with high accuracy, including structured documents such as tables, forms, and formulas.
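
When formulas are transcribed, a downstream step often needs to separate the math from the surrounding text. The sketch below assumes the model writes inline math as `$...$` and display math as `$$...$$`. This card does not document an output format, so check real outputs first. `extract_math` is a hypothetical helper, not part of any library.

```python
import re

# Hypothetical delimiter convention: $$...$$ for display math, $...$ for inline math.
# The display alternative comes first so "$$" is not read as an empty inline span.
_MATH_PATTERN = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)


def extract_math(text: str) -> list[tuple[str, str]]:
    """Return (kind, latex) pairs from OCR output, in reading order."""
    spans = []
    for match in _MATH_PATTERN.finditer(text):
        display, inline = match.group(1), match.group(2)
        if display is not None:
            spans.append(("display", display.strip()))
        else:
            spans.append(("inline", inline.strip()))
    return spans


if __name__ == "__main__":
    sample = "Energy is $E = mc^2$.\n$$\\int_0^1 x\\,dx = \\frac{1}{2}$$"
    for kind, latex in extract_math(sample):
        print(kind, latex)
```

A plain regex keeps the sketch dependency-free. However, dollar signs used as currency (for example `$5 and $10`) will be misread as inline math, so receipts and invoices need a stricter rule.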

---

## ⚡ Highlights

* 🖼️ Accurate text extraction, including math and tables
* 🌍 Multilingual support (30+ languages)
* ⚡ Lightweight and efficient
* 💬 Instruction-tuned for document understanding and analysis
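
For table extraction, one common approach is to ask the model to return tables as GitHub-style Markdown and then parse them. The sketch below assumes that output format, which this card does not confirm. `parse_markdown_table` is a hypothetical helper, not a `transformers` API.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse the first pipe-delimited Markdown table in `text` into row dicts.

    Assumes a header row, a separator row (e.g. | --- | --- |), then data rows.
    Escaped pipes inside cells are not handled in this sketch.
    """
    rows: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            rows.append(stripped)
        elif rows:
            break  # the first table has ended
    if len(rows) < 2:
        return []

    def split_cells(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_cells(rows[0])
    records = []
    for line in rows[2:]:  # rows[1] is the separator row
        cells = split_cells(line)
        cells += [""] * (len(header) - len(cells))  # pad short rows
        records.append(dict(zip(header, cells)))
    return records


if __name__ == "__main__":
    output = "| Item | Qty |\n| --- | --- |\n| Pen | 2 |"
    print(parse_markdown_table(output))
```

Values are kept as strings on purpose. Converting numbers, dates, or currencies is document-specific and is better done after the table structure has been validated.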

---

## 📊 Benchmark & Comparison

| Model                       | OCR-Bench Accuracy (%) | Multilingual Accuracy (%) | Layout / Table Understanding (%) |
| --------------------------- | ---------------------- | ------------------------- | -------------------------------- |
| **Next OCR**                | **99.0**               | **96.8**                  | **95.3**                         |
| PaddleOCR                   | 95.2                   | 93.9                      | 95.3                             |
| Deepseek OCR                | 90.6                   | 87.4                      | 86.1                             |
| Tesseract                   | 92.0                   | 88.4                      | 72.0                             |
| EasyOCR                     | 90.4                   | 84.7                      | 78.9                             |
| Google Cloud Vision / DocAI | 98.7                   | 95.5                      | 93.6                             |
| Amazon Textract             | 94.7                   | 86.2                      | 86.1                             |
| Azure Document Intelligence | 95.1                   | 93.6                      | 91.4                             |

---

| Model                       | Handwriting (%) | Scene Text (%) | Complex Tables (%) |
| --------------------------- | --------------- | -------------- | ------------------ |
| **Next OCR**                | 92              | 96             | 91                 |
| PaddleOCR                   | 88              | 92             | 90                 |
| Deepseek OCR                | 80              | 85             | 83                 |
| Tesseract                   | 75              | 88             | 70                 |
| EasyOCR                     | 78              | 86             | 75                 |
| Google Cloud Vision / DocAI | 90              | 95             | 92                 |
| Amazon Textract             | 85              | 90             | 88                 |
| Azure Document Intelligence | 87              | 91             | 89                 |

---

## 🚀 Installation & Usage

Install a recent `transformers` release with Qwen3-VL support, along with `torch`, `accelerate`, and `Pillow` (for example, `pip install -U transformers accelerate pillow torch`), then run:

```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Lamapi/next-ocr"

# The processor bundles the tokenizer and the image preprocessor.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires `accelerate`; uses a GPU when one is available
)

img = Image.open("image.jpg").convert("RGB")

# ATTENTION: the user content list must include both an image and text.
messages = [
    {"role": "system", "content": "You are Next-OCR, a helpful AI assistant trained by Lamapi."},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": "Read the text in this image and summarize it."},
        ],
    },
]

# Render the chat template to a prompt string, then encode text and image together.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[img], return_tensors="pt").to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=256)

# Skip the prompt tokens so only the newly generated answer is decoded.
new_tokens = generated[0][inputs["input_ids"].shape[1]:]
print(processor.tokenizer.decode(new_tokens, skip_special_tokens=True))
```
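
The example above hard-codes a single instruction. To reuse the same message structure for different document types, a small builder can swap in the instruction. This is a minimal sketch: the task names and the prompt wording in `OCR_INSTRUCTIONS` are illustrative assumptions, not prompts documented for this model. `build_ocr_messages` is a hypothetical helper. Its output can be passed to `processor.apply_chat_template` in the same way as `messages` in the example.

```python
from typing import Any, Optional

# Hypothetical task presets; tune the wording against your own documents.
OCR_INSTRUCTIONS = {
    "plain": "Transcribe all text in this image exactly as written.",
    "table": "Transcribe every table in this image as a GitHub-flavored Markdown table.",
    "math": "Transcribe this image and write all formulas in LaTeX, "
    "using $...$ for inline math and $$...$$ for display math.",
}

SYSTEM_PROMPT = "You are Next-OCR, a helpful AI assistant trained by Lamapi."


def build_ocr_messages(image: Any, task: str = "plain", language_hint: Optional[str] = None) -> list:
    """Build chat messages with one image and one text instruction."""
    if task not in OCR_INSTRUCTIONS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(OCR_INSTRUCTIONS)}")
    instruction = OCR_INSTRUCTIONS[task]
    if language_hint:
        instruction += f" The document is written in {language_hint}."
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": instruction},
            ],
        },
    ]


if __name__ == "__main__":
    for message in build_ocr_messages("page.png", task="table", language_hint="Turkish"):
        print(message["role"], message["content"])
```

With the language hint as an optional suffix, the same three presets work for every language. Whether the hint actually improves accuracy for this model has not been measured here.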

---

## 🧩 Key Features

| Feature                    | Description                                                     |
| -------------------------- | --------------------------------------------------------------- |
| 🖼️ High-Accuracy OCR       | Extracts text from images, documents, and screenshots reliably. |
| 🇹🇷 Multilingual Support    | Works with 30+ languages, including Turkish.                    |
| ⚡ Lightweight & Efficient | Optimized for resource-constrained environments.                |
| 📄 Layout & Math Awareness | Handles tables, forms, and mathematical formulas.               |
| 🏢 Reliable Outputs        | Suitable for enterprise document workflows.                     |

---

## 📐 Model Specifications

| Specification     | Details                                                                                    |
| ----------------- | ------------------------------------------------------------------------------------------ |
| **Base Model**    | Qwen3-VL                                                                                   |
| **Parameters**    | 8 billion                                                                                  |
| **Architecture**  | Vision encoder + transformer language model (vision-language OCR model)                   |
| **Modalities**    | Image + text input, text output                                                            |
| **Fine-Tuning**   | Supervised fine-tuning (TRL, Unsloth) on OCR datasets with multilingual and math/tabular content |
| **Optimizations** | Quantization-ready, FP16 support                                                           |
| **Primary Focus** | Text extraction, document understanding, mathematical OCR                                  |
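
As a rough guide to what "FP16 support" and "quantization-ready" mean for hardware, weight memory is approximately parameter count × bits per parameter ÷ 8. The sketch below uses the nominal 8 billion parameters from the table and counts weights only, so activations and the KV cache come on top. The 8-bit and 4-bit rows assume some weight-quantization method such as bitsandbytes; this card does not say which methods were tested.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 8e9  # "8 billion" from the specification table

# Prints roughly 16 GB for FP16, 8 GB for 8-bit, and 4 GB for 4-bit weights.
for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_memory_gb(NOMINAL_PARAMS, bits):.0f} GB")
```

Real checkpoints come out somewhat larger. Quantization schemes usually keep some layers (for example embeddings or the vision encoder) in higher precision, and they also store scaling factors.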

---

## 🎯 Ideal Use Cases

* Document digitization
* Invoice & receipt processing
* Multilingual OCR pipelines
* Extraction of tables, forms, and formulas
* Enterprise document management

---

## 📄 License

MIT License: free for commercial and non-commercial use.

---

## 📞 Contact & Support

* 📧 Email: [lamapicontact@gmail.com](mailto:lamapicontact@gmail.com)
* 🤗 Hugging Face: [Lamapi](https://huggingface.co/Lamapi)

---

> **Next OCR**: compact, *math-capable OCR* AI blending **accuracy**, **speed**, and **multilingual document intelligence**.