chart
  ahmed-masry/ChartQA • Updated Jun 22, 2024 • 32.7k • 1.64k • 31
  oroikon/chart_captioning • Updated Oct 8, 2023 • 8.82k • 89 • 12
  heegyu/chart2text_statista • Updated Oct 12, 2023 • 34.8k • 1.6k • 9
  nourheshamshaheen/typed_final_chart_to_table • Updated Nov 12, 2023 • 2.81k • 51 • 6
Document Understanding Models
  Mizukiluke/ureader-instruction-1.0 • Updated Oct 13, 2023 • 24.5k • 61 • 15
Captioning
  docling-project/USPTO-30K • Updated Aug 24, 2023 • 30k • 322 • 10
  MMInstruction/ArxivCap • Updated Oct 3, 2024 • 573k • 44.7k • 58
  mPLUG/M-Paper • Updated Jan 13, 2024 • 473 • 13
DocQA
  jp1924/DocStruct4M • Updated Feb 5, 2025 • 3.05M • 48 • 4
  howard-hou/OCR-VQA • Updated Apr 24, 2023 • 208k • 2.35k • 60
  vikhyatk/docmatix-single • Updated Jul 19, 2024 • 565k • 42 • 6
  MMInstruction/ArxivQA • Updated Mar 5, 2024 • 100k • 208 • 38
Page to MD
  A dataset of image-text pairs sourced from research papers on arXiv, where each image is derived from a PDF page and paired with its corresponding OCR
  v1v1d/Arxiv_MD_v2_2k • Updated Jun 24, 2024 • 3.04k • 14
  v1v1d/Arxiv_MD_v2 • Updated Jun 24, 2024 • 14.2k • 64
  v1v1d/Arxiv_MD_v1_1k • Updated Jun 23, 2024 • 1.14k • 16
  v1v1d/Arxiv_MD_v1 • Updated Jun 18, 2024 • 9.96k • 54
OCR
  wendlerc/RenderedText • Updated Oct 23, 2025 • 12M • 6.13k • 58
  Salesforce/blip3-ocr-200m • Updated Feb 3, 2025 • 96M • 1.18k • 44
  openpecha/OCR-Google_Books • Updated Oct 20, 2025 • 751k • 294
  openpecha/OCR-Norbuketaka • Updated Oct 14, 2025 • 2.24M • 27
Table Extraction
  docling-project/PubTables-1M_OTSL • Updated Aug 31, 2023 • 1.88M • 2.3k • 7
  docling-project/PubTabNet_OTSL • Updated Aug 31, 2023 • 395k • 2.44k • 5
  docling-project/FinTabNet_OTSL • Updated Aug 31, 2023 • 109k • 600 • 7
Layout Detection
  docling-project/DocLayNet-v1.1 • Updated Sep 1, 2023 • 63.5k • 1.93k • 27
  docling-project/DocLayNet • Updated Jan 25, 2023 • 660 • 140
  vikp/doclaynet_processed • Updated Nov 30, 2023 • 80.9k • 623 • 6
  psyche/publaynet • Updated Jul 30, 2024 • 347k • 111
VQA
  wyu1/Leopard-Instruct • Updated Nov 8, 2024 • 1.03M • 70.9k • 65
  neulab/PangeaInstruct • Updated Feb 2, 2025 • 381 • 86
  MMInstruction/ArxivQA • Updated Mar 5, 2024 • 100k • 208 • 38
  vidore/arxivqa_train • Updated Jun 20, 2025 • 95k • 392
Latex Extract
  A dataset collection of image-text pairs, where each image contains mathematical formulas, and each corresponding text provides the relevant LaTeX
  v1v1d/Latexify_v1_clean • Updated Jul 29, 2024 • 11k • 38 • 1
  v1v1d/Latexify_v1 • Updated Jul 29, 2024 • 234k • 19 • 1
  OleehyO/latex-formulas • Updated Aug 13, 2025 • 1.56M • 546 • 101
  unsloth/LaTeX_OCR • Updated Nov 21, 2024 • 76.3k • 4.48k • 82