CS626: Speech, NLP and the Web (Course Project)
Multimodal OCR model for complex document understanding.
Nanonets / olmOCR / RolmOCR / Aya-Vision / Qwen2-VL-OCR