NuExtract-2.0 Collection Models specialized in extracting structured information (JSON) from text, PDFs, scans, spreadsheets, etc. • 15 items • Updated 8 days ago • 27
Article Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM Apr 26, 2024 • 17
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 132