VLM + OCR - a Bk9x Collection

updated May 19

5CD-AI/Vintern-1B-v2

Image-Text-to-Text • 0.9B • Updated Jan 17, 2025 • 690 • 81
erax-ai/EraX-VL-7B-V1.0

Image-Text-to-Text • 8B • Updated Jan 15, 2025 • 47 • 44
granite-docling-258M demo

Running on Zero • Agents • Featured • 277

Convert and query documents from images with AI
datalab-to/chandra

Image-Text-to-Text • 9B • Updated Mar 26 • 152k • 527
deepseek-ai/DeepSeek-OCR

Image-Text-to-Text • 3B • Updated Nov 4, 2025 • 2.28M • 3.29k
Multimodal OCR3

Running on Zero • MCP • 69

Chandra-OCR / Nanonets-OCR2 / olmOCR-2 / Dots.OCR
lightonai/LightOnOCR-2-1B

Image-Text-to-Text • 1B • Updated 2 days ago • 195k • 706
HuggingFaceFW/finepdfs

Viewer • Updated Apr 3 • 476M • 79.2k • 882
baidu/Qianfan-OCR

Image-Text-to-Text • 5B • Updated Apr 29 • 262k • 1.19k

Note: 4B model with direct image-to-Markdown conversion. It supports a broad range of prompt-driven tasks, from structured document parsing and table extraction to chart understanding, document question answering, and key information extraction.
tinixai/ocr_annual_financials

Viewer • Updated May 26 • 18.2k • 2.49k • 23

Note: 10 years of financial reports in the tinixai/ocr_annual_financials dataset on Hugging Face. The dataset currently includes:
• 18,231 financial reports
• 1,491 stock tickers
• Data from 2015–2025
• ~26 million Parquet rows
• ~194 GB of PDF + OCR text
• OCR accuracy of ~95% on figures and tables
This can be considered one of the largest open-source datasets in Vietnam for:
✔ Financial AI
✔ Vietnamese OCR
✔ Document AI
✔ Financial RAG
✔ Vietnamese LLM
The dataset contains:
• Consolidated financial statements
• Parent-company financial statements
•