Scientific RAG Benchmark Collection-NIPS2026
A collection of evaluation datasets for benchmarking RAG pipelines and large language models on domain-specific scientific question answering tasks. • 6 items • Updated 24 days ago