A collection of evaluation datasets for benchmarking RAG pipelines and large language models on domain-specific scientific question answering tasks.
author2026nips
anonymousauthor2026nips
AI & ML interests
None yet
Recent Activity
Updated dataset anonymousauthor2026nips/multi_turn-NIPS2026 (20 days ago)
Updated dataset anonymousauthor2026nips/multi_hop-NIPS2026 (20 days ago)
Updated dataset anonymousauthor2026nips/aggregation-NIPS2026 (20 days ago)

Organizations
None yet