Congratulations on the release, great work!
I have a question and would love some clarification.
Why isn’t top-K reranking sufficient for token cost reduction in production RAG systems, and in which scenarios does semantic highlighting provide the biggest advantage over rerankers?
Additionally, I wanted to ask:
Can the semantic highlight model further break down or split sentences into smaller, more fine-grained relevant spans (instead of selecting full sentences), or is sentence-level pruning the intended granularity?