Extract thesis metadata from title page images
Search documents with hybrid BM25 and embedding ranking
Align extracted person names with IdRef identifiers
Streamlit template space
Nanonets / olmOCR / Qwen3-VL / LightOnOCR-2-1B / NuExtract
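
As an illustration of the hybrid BM25 and embedding ranking named in the list above, here is a minimal sketch. It assumes the two rankings are fused with reciprocal rank fusion, which the list does not specify. The `toy_embed` function is a hypothetical stand-in (hashed character trigrams) for a real embedding model, and all function names and parameters (`k1`, `b`, `k`, `dim`) are illustrative choices, not taken from the project.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems use richer analyzers)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with BM25 (Lucene-style non-negative IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def toy_embed(text, dim=256):
    """Hypothetical stand-in for an embedding model: L2-normalized hashed trigram counts."""
    vec = [0.0] * dim
    t = text.lower()
    for i in range(len(t) - 2):
        vec[zlib.crc32(t[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine_scores(query, docs):
    """Cosine similarity between the query vector and each document vector."""
    q = toy_embed(query)
    return [sum(a * b for a, b in zip(q, toy_embed(d))) for d in docs]


def ranks(scores):
    """Map document index to its 1-based rank (highest score gets rank 1)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return {doc_id: r for r, doc_id in enumerate(order, start=1)}


def hybrid_rank(query, docs, k=60):
    """Fuse the BM25 and embedding rankings with reciprocal rank fusion."""
    bm25_rank = ranks(bm25_scores(query, docs))
    emb_rank = ranks(cosine_scores(query, docs))
    fused = {i: 1 / (k + bm25_rank[i]) + 1 / (k + emb_rank[i]) for i in range(len(docs))}
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    docs = [
        "thesis on neural machine translation",
        "cooking recipes for pasta",
        "history of medieval France",
    ]
    for idx in hybrid_rank("neural translation thesis", docs):
        print(idx, docs[idx])
```

Rank fusion is used here because BM25 scores and cosine similarities live on different scales; combining ranks avoids having to normalize them. A weighted sum of normalized scores would be an equally plausible design.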