QA RAG
0x22almostEvil/multilingual-wikihow-qa-16k
Viewer
• Updated • 16.8k • 594
• 11
0xSero/nemotron-super-reap-artifacts-draft
2796gauravc/agentic-search-chromadb
Viewer
• Updated • 1 • 42
2796gauravc/agentic-search-data
Viewer
• Updated • 225 • 164
AlekseyKorshuk/quora-question-pairs
Viewer
• Updated • 404k • 283
• 10
Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered
Viewer
• Updated • 28.5k • 21
• 1
Arun63/knowledge-graph-triplets-sharegpt
Viewer
• Updated • 38k • 8
Arun63/rag_query_expansion
Viewer
• Updated • 12k • 9
Arun63/rag_query_rephrasing
Viewer
• Updated • 12k • 5
Arun63/sharegpt-fda-cfr-11-qa
Viewer
• Updated • 36 • 13
BEE-spoke-data/google_wellformed_query-hf
Viewer
• Updated • 25.1k • 12
BUT-FIT/ReCzechSum-QueryBased
Viewer
• Updated • 100k • 17
ChuGyouk/KorMedConceptsQA
Viewer
• Updated • 73.2k • 40
• 6
Courage-1984/Aristotle_Roufanis
Courage-1984/DragonBall_flickr_rip
Preview
• Updated • 5
Courage-1984/MakotoShinkai_frames
Viewer
• Updated • 1.7k • 3
Courage-1984/anime_dump_reddit
Viewer
• Updated • 34k • 1
Delta-Vector/Hydrus-Minecraft-QA
Viewer
• Updated • 94.9k • 4
Delta-Vector/Hydrus-Science-QA-sharegpt
Preview
• Updated • 3
• 2
Eurolingua/HPLT3_DE_0.9_Quantile_DiverseQA
Updated • 526
• 2
FreedomIntelligence/RAG-Instruct
Viewer
• Updated • 40.5k • 258
• 40
FreedomIntelligence/huatuo_consultation_qa
Viewer
• Updated • 32.7M • 187
• 16
FreedomIntelligence/huatuo_encyclopedia_qa
Viewer
• Updated • 364k • 1.42k
• 88
FreedomIntelligence/huatuo_knowledge_graph_qa
Viewer
• Updated • 798k • 663
• 52
GeorgeDaDude/Complete_question_subset_V2
Viewer
• Updated • 3.39k • 19
GeorgeDaDude/p1_qa_complete
Viewer
• Updated • 2.92k • 4
HuggingFaceH4/Llama-3.2-1B-Instruct-beam-search-completions
Viewer
• Updated • 13k • 124
• 1
HuggingFaceH4/Llama-3.2-3B-Instruct-beam-search-completions
Viewer
• Updated • 11k • 402
• 1
Malikeh1375/nemotron_finesearch_10K
Viewer
• Updated • 10k • 10
Necent/efficientrag-filter-training-data
Viewer
• Updated • 14.8k • 33
Necent/efficientrag-labeler-training-data
Viewer
• Updated • 83.2k • 54
NousResearch/AcademicMCQA
Viewer
• Updated • 99.8k • 50
• 2
NousResearch/Hermes-3-Dataset
Viewer
• Updated • 959k • 1.31k
• 313
NousResearch/RLVR_Coding_Problems
Preview
• Updated • 142
• 85
NousResearch/RL_Agentica_STDIN
Viewer
• Updated • 20k • 18
• 2
NousResearch/RefusalDataset
Viewer
• Updated • 166 • 188
• 13
NousResearch/XLAM-Atropos
Viewer
• Updated • 60k • 65
• 8
NousResearch/company-fundamentals-prediction-lite
Viewer
• Updated • 24.4k • 31
• 3
NousResearch/eval-DeepSeek-R1-0528
Viewer
• Updated • 103k • 40
• 1
NousResearch/eval-DeepSeek-V3-0324
Viewer
• Updated • 103k • 843
• 1
NousResearch/eval-Hermes-4.3-36B
Viewer
• Updated • 103k • 75
• 3
NousResearch/eval-Hermes-4.3-36B-centralized
Viewer
• Updated • 103k • 191
• 2
NousResearch/func-calling-eval
Viewer
• Updated • 100 • 40
• 16
NousResearch/func-calling-eval-glaive
Viewer
• Updated • 100 • 15
• 9
NousResearch/func-calling-eval-singleturn
Viewer
• Updated • 112 • 11
• 8
NousResearch/hermes-function-calling-v1
Viewer
• Updated • 11.6k • 26.4k
• 423
NousResearch/huskybench-hands
Viewer
• Updated • 6.33M • 24
• 1
NousResearch/json-mode-eval
Viewer
• Updated • 100 • 1.44k
• 44
NousResearch/openthoughts-tblite
Viewer
• Updated • 100 • 213
• 10
NousResearch/terminal-bench-2
Viewer
• Updated • 89 • 899
• 3
OALL/details_Applied-Innovation-Center__Karnak_v2_alrage
Viewer
• Updated • 4.21k • 5
OALL/details_Lina-Z__qwen_arabic_ft_v2_alrage
Viewer
• Updated • 4.21k • 9
OALL/details_Mushari440__Qwen3-8B-SFT-V2_v2_alrage
Viewer
• Updated • 4.21k • 37
OALL/details_Ocelotr__Qwen3-8B-GAE_v2_alrage
Viewer
• Updated • 4.21k • 28
OALL/details_Qwen__Qwen2.5-14B_v2_alrage
Viewer
• Updated • 4.21k • 6
OALL/details_Qwen__Qwen2.5-7B_v2_alrage
Viewer
• Updated • 4.21k • 3
OALL/details_Qwen__Qwen3-14B_v2_alrage
Viewer
• Updated • 8.43k • 10
OALL/details_Qwen__Qwen3-30B-A3B-Instruct-2507_v2_alrage
Viewer
• Updated • 4.21k • 9
OALL/details_deep-analysis-research__D2IL-Arabic-Qwen2.5-72B-Instruct-v0.1_v2
Viewer
• Updated • 183k • 7.66k
OALL/details_deep-analysis-research__D2IL-Arabic-Qwen2.5-72B-Instruct-v0.1_v2_alrage
Viewer
• Updated • 4.21k • 333
OALL/details_deep-analysis-research__D2IL-Arabic-Qwen2.5-72B-Instruct-v0.2_v2
Viewer
• Updated • 274k • 12.1k
OALL/details_hammh0a__Hala-9B_v2_alrage
Viewer
• Updated • 4.21k • 9
OdiaGenAI/RAG_Evaluation_Dataset
Viewer
• Updated • 1.39k • 74
OusiaResearch/Aureth-Agent-SFT-Robust
Viewer
• Updated • 243k • 27
OusiaResearch/Aureth-DPO-Curriculum
Viewer
• Updated • 3.77k • 20
OusiaResearch/Aureth-SFT-Curriculum
Viewer
• Updated • 236k • 38
OusiaResearch/Aureth-V3-Training-Data
PJMixers/naklecha_minecraft-question-answer-700k-ShareGPT
Viewer
• Updated • 694k • 11
• 3
Salesforce/FaithEval-unanswerable-v1.0
Viewer
• Updated • 2.49k • 446
• 4
Salesforce/LiveResearchBench
Viewer
• Updated • 623 • 1.74k
• 6
Salesforce/LiveResearchBenchFull
Viewer
• Updated • 772 • 107
• 4
SeppeV/OnlyRAG_for_survey
Viewer
• Updated • 420 • 1
SeppeV/RAG_test_rec_on_topic_w_userbased_filtering
Viewer
• Updated • 100 • 5
SeppeV/example_sem_search
Viewer
• Updated • 50 • 11
SeppeV/joke_gen_mistral_bm_for_prompt_8_only_topic_for_RAG
Viewer
• Updated • 20 • 3
SeppeV/joke_gen_mistral_bm_for_prompt_8_only_topic_for_RAG_jo
Viewer
• Updated • 20 • 13
SeppeV/results_RAG_test_random_rec
Viewer
• Updated • 125 • 2
SeppeV/results_RAG_test_rec_on_topic
Viewer
• Updated • 125 • 3
SeppeV/results_RAG_test_rec_on_topic_w_userbased_filtering
Viewer
• Updated • 100 • 3
SeppeV/results_joke_gen_mistral_bm_for_prompt_8_only_topic_for_RAG_jo
Viewer
• Updated • 20 • 2
SocialGrep/one-million-reddit-questions
Viewer
• Updated • 1M • 208
• 12
SocialGrep/ten-million-reddit-answers
Updated • 163
• 10
11-47/high_priest_supernatural_magic_FACT_BASED_1M
Viewer
• Updated • 1M • 121
• 1
YUXCulturalAILab/senegal-sante-maternelle-qa
Viewer
• Updated • 1k • 26
• 2
adamo1139/basic_economics_questions_ts_test_1
Viewer
• Updated • 2.11k • 10
• 1
adamo1139/basic_economics_questions_ts_test_2
Viewer
• Updated • 3.02k • 13
adamo1139/basic_economics_questions_ts_test_3
Viewer
• Updated • 3.02k • 6
adamo1139/basic_economics_questions_ts_test_4
Viewer
• Updated • 8k • 17
agentlans/NousResearch-Hermes-3-Dataset-multiturn
Viewer
• Updated • 13.3k • 176
• 2
ai2lumos/lumos_complex_qa_ground_iterative
Viewer
• Updated • 19.1k • 69
• 3
ai2lumos/lumos_complex_qa_ground_onetime
Viewer
• Updated • 19.2k • 60
• 4
ai2lumos/lumos_complex_qa_plan_iterative
Viewer
• Updated • 19k • 105
• 7
ai2lumos/lumos_complex_qa_plan_onetime
Viewer
• Updated • 19.4k • 74
• 3
ajibawa-2023/Education-Researchers
Viewer
• Updated • 255k • 6
• 8
alexandreteles/AlpacaToxicQA_ShareGPT
Viewer
• Updated • 6.87k • 31
• 8
argilla/cloud_assistant_questions
Viewer
• Updated • 262 • 16
argilla/research_titles_multi-label
Viewer
• Updated • 21k • 25
autoevaluate/autoeval-staging-eval-project-adversarial_qa-1cd241d3-12195624
Viewer
• Updated • 3k • 6
autoevaluate/autoeval-staging-eval-project-adversarial_qa-58460439-11825575
Viewer
• Updated • 3k • 12
autoevaluate/autoeval-staging-eval-project-adversarial_qa-e34332b7-12205625
Viewer
• Updated • 3k • 6
autoevaluate/autoeval-staging-eval-project-adversarial_qa-e34332b7-12205627
Viewer
• Updated • 3k • 11
autoevaluate/autoeval-staging-eval-project-adversarial_qa-e34332b7-12205628
Viewer
• Updated • 3k • 6
beyoru/CheapResearch_Cleaned
Viewer
• Updated • 32.8k • 2
beyoru/ToolCalling_Search
Viewer
• Updated • 33.1k • 3
beyoru/tin_hoc_ai_judgement_no_rag
Viewer
• Updated • 100 • 3
breadlicker45/autotrain-data-yahoo-answer-small
Preview
• Updated • 4
• 1
breadlicker45/bread-qa-updated
Viewer
• Updated • 175k • 7
breadlicker45/yahoo-answers-3k-lines
Viewer
• Updated • 3k • 7
• 1
breadlicker45/yahoo_answers
Viewer
• Updated • 65.5k • 5
breadlicker45/yahoo_answers_v2
Viewer
• Updated • 1.43M • 7
• 1
chimbiwide/sciqa-thinking
Viewer
• Updated • 3k • 27
communityai/system_identity
Viewer
• Updated • 868 • 2
communityai/us-inc-identity-arc-1.0-ultrasafeai-en-SFT
Viewer
• Updated • 1.21k • 5
davanstrien/query-to-dataset-viewer-descriptions
Viewer
• Updated • 11k • 20
• 5
davidquicast/constitucion-politica-del-peru-1993-qa
Viewer
• Updated • 2.08k • 61
• 1
davidquicast/constitucion-politica-del-peru-1993-qa-gemma-2b-it-format
Viewer
• Updated • 2.08k • 11
davidquicast/constitucion-politica-del-peru-1993-qa-gemma-2b-it-format-80train-20test
Viewer
• Updated • 2.08k • 14
davidquicast/constitucion_politica_del_peru_1993_qa_argilla
Viewer
• Updated • 2.07k • 10
davidquicast/constitucion_politica_del_peru_1993_qa_raw
Viewer
• Updated • 2.08k • 5
davidquicast/info-security-policies-rag-distiset
Viewer
• Updated • 100 • 26
davidquicast/info-security-policies-rag-distiset-argilla
Viewer
• Updated • 98 • 4
davidquicast/information-security-policies-qa-distiset
Viewer
• Updated • 198 • 14
davidquicast/information-security-policies-qa-distiset-argilla
Viewer
• Updated • 97 • 10
derek-thomas/squad-v1.1-t5-question-generation
Viewer
• Updated • 21k • 76
• 6
dinushiTJ/nz-hansard-triplets
Viewer
• Updated • 3.27k • 6
dmayhem93/agieval-logiqa-en
Viewer
• Updated • 651 • 322
dmayhem93/agieval-logiqa-zh
Viewer
• Updated • 651 • 45
dmayhem93/self-critiquing-critique-answer-ranking
Viewer
• Updated • 11k • 10
dmayhem93/self-critiquing-critique-answer-ranking-test
Viewer
• Updated • 2.35k • 3
dmayhem93/self-critiquing-critique-answer-ranking-train
Viewer
• Updated • 11.4k • 7
emozilla/qasper-pruned-llama-gptneox-4k
Viewer
• Updated • 529 • 7
emozilla/qasper-pruned-llama-gptneox-8k
Viewer
• Updated • 1.39k • 56
freQuensy23/parus_questions
Viewer
• Updated • 400 • 7
• 1
freQuensy23/toxic-answers
Viewer
• Updated • 37 • 7
• 1
french-open-data/lieux-de-covoiturage-organisation-microstop
french-open-data/piaf-le-dataset-francophone-de-questions-reponses
google/FACTS-grounding-public
Viewer
• Updated • 868 • 1.44k
• 46
google/IndicGenBench_xorqa_in
Updated • 501
• 5
google/granola-entity-questions
Viewer
• Updated • 12.5k • 160
• 12
hamishivi/GPQA-train-RLVR
Viewer
• Updated • 348 • 468
hamishivi/SimpleQA-RLVR-noprompt
Viewer
• Updated • 4.33k • 18
hamishivi/asqa_rlvr_no_prompt
Viewer
• Updated • 5.3k • 3
hamishivi/rds-sels-squad-top326k
Viewer
• Updated • 326k • 8
hamishivi/sft_ablations_redsearcher_sft_sanitized
Viewer
• Updated • 9.81k • 19
hamishivi/simple_qa_rlvr_no_prompt
Viewer
• Updated • 4.33k • 17
hamishivi/simpleqa_10_actions_llama3.3_70b_it
Viewer
• Updated • 1.03k • 14
hamishivi/simpleqa_5_actions_llama3.3_70b_it
Viewer
• Updated • 4.33k • 10
hamishivi/tqa_rlvr_no_prompt
Viewer
• Updated • 156k • 5
harpreetsahota/LI_Learning_RAG_Eval_Set
Preview
• Updated • 2
• 2
harpreetsahota/ragas-example-dataset
Viewer
• Updated • 25 • 11
• 1
iamketan25/gsm-general-qa-instructions
Viewer
• Updated • 25.7k • 75
• 4
ibm-research/900K-Judgements
Viewer
• Updated • 939k • 88
• 3
ibm-research/AITQARetrieval
Viewer
• Updated • 3.99k • 107
ibm-research/AssetOpsBench
Viewer
• Updated • 467 • 904
• 42
ibm-research/Auto-BenchmarkCard
Viewer
• Updated • 105 • 170
• 3
ibm-research/BFCL-FC-robustness
Updated • 13
ibm-research/BoolQ_robustness
Viewer
• Updated • 29.4k • 24
ibm-research/Climate-Change-NER
Viewer
• Updated • 46.2k • 59
• 11
ibm-research/FailureSensorIQ
Viewer
• Updated • 8.3k • 344
• 7
ibm-research/FeTaQARetrieval
Viewer
• Updated • 24.7k • 31
ibm-research/ITBench-Lite
Updated • 9.19k
• 12
ibm-research/ITBench-Trajectories
Viewer
• Updated • 53.5k • 99
• 6
ibm-research/LLM_Fine-Tuning_Performance
Preview
• Updated • 106
• 2
ibm-research/MedMentions-ZS
Viewer
• Updated • 29.1k • 618
• 2
ibm-research/MermaidSeqBench
Viewer
• Updated • 132 • 168
• 5
ibm-research/MultiHierttRetrieval
Viewer
• Updated • 11.9k • 36
• 1
ibm-research/NQTablesRetrieval
Viewer
• Updated • 533k • 120
• 2
ibm-research/OTTQASmallRetrieval
Viewer
• Updated • 31.1k • 56
• 1
ibm-research/OpenWikiTablesRetrieval
Viewer
• Updated • 178k • 62
• 1
ibm-research/PopQA_robustness
Viewer
• Updated • 204k • 27
ibm-research/REAL-MM-RAG_FinReport
Viewer
• Updated • 2.93k • 914
• 8
ibm-research/REAL-MM-RAG_FinReport_BEIR
Viewer
• Updated • 5.27k • 49
• 2
ibm-research/REAL-MM-RAG_FinSlides
Viewer
• Updated • 2.59k • 860
• 2
ibm-research/REAL-MM-RAG_FinSlides_BEIR
Viewer
• Updated • 5.5k • 40
• 1
ibm-research/REAL-MM-RAG_FinTabTrainSet
Viewer
• Updated • 48.2k • 11
• 2
ibm-research/REAL-MM-RAG_FinTabTrainSet_rephrased
Viewer
• Updated • 48.2k • 48
• 2
ibm-research/REAL-MM-RAG_TechReport
Viewer
• Updated • 2.2k • 828
• 3
ibm-research/REAL-MM-RAG_TechReport_BEIR
Viewer
• Updated • 5.57k • 43
• 1
ibm-research/REAL-MM-RAG_TechSlides
Viewer
• Updated • 2.62k • 849
• 2
ibm-research/REAL-MM-RAG_TechSlides_BEIR
Viewer
• Updated • 6.09k • 135
• 1
ibm-research/SQL-API-Bench
Viewer
• Updated • 3.41k • 1.71k
• 5
ibm-research/SocialStigmaQA
Viewer
• Updated • 20.7k • 90
• 7
ibm-research/SocialStigmaQA-JA
Viewer
• Updated • 10.4k • 30
• 4
ibm-research/Split-IFEval
Viewer
• Updated • 541 • 23
• 1
ibm-research/ToolRM-train-data
Viewer
• Updated • 459k • 69
• 7
ibm-research/WatsonxDocsQARetrieval
Viewer
• Updated • 1.2k • 36
ibm-research/WikiVQABench
Viewer
• Updated • 344 • 123
• 6
ibm-research/Wish-IE-Falcon
Viewer
• Updated • 1k • 5
ibm-research/Wish-QA-ASQA-Falcon
Viewer
• Updated • 4.35k • 21
ibm-research/Wish-QA-ASQA-Llama
Viewer
• Updated • 3.46k • 23
• 2
ibm-research/Wish-QA-ELI5-Falcon
Viewer
• Updated • 10k • 47
• 1
ibm-research/Wish-QA-ELI5-Llama
Viewer
• Updated • 8.41k • 19
• 3
ibm-research/Wish-QA-Falcon
Viewer
• Updated • 10.8k • 25
• 1
ibm-research/Wish-QA-NQ-Falcon
Viewer
• Updated • 39.3k • 38
ibm-research/Wish-QA-NQ-Llama
Viewer
• Updated • 10k • 17
• 1
ibm-research/Wish-Summarization-Falcon
Viewer
• Updated • 10k • 5
ibm-research/Wish-Summarization-Llama
Viewer
• Updated • 10k • 2
ibm-research/argument_quality_ranking_30k
Viewer
• Updated • 40k • 359
• 13
ibm-research/claim_stance
Viewer
• Updated • 4.79k • 73
• 7
ibm-research/clinic150-sur
Viewer
• Updated • 600k • 102
• 2
ibm-research/data-product-benchmark
Viewer
• Updated • 33.2k • 2.5k
• 3
ibm-research/hemolab-bench
Viewer
• Updated • 49.7k • 514
• 2
ibm-research/identity_group_abuse_robustness
Viewer
• Updated • 21.8k • 37
• 2
ibm-research/justrank_judge_scores
Viewer
• Updated • 1.51M • 7
• 3
ibm-research/knowledge_consistency_of_LLMs
Preview
• Updated • 47
• 3
ibm-research/otter_primekg
Updated • 95
• 4
ibm-research/otter_stitch
Updated • 78
• 2
ibm-research/otter_uniprot_bindingdb
Updated • 16
• 3
ibm-research/otter_uniprot_bindingdb_chembl
Updated • 30
• 4
ibm-research/patchtsmixer-etth1-test-data
Updated • 178
ibm-research/patchtst-etth1-test-data
Updated • 152
ibm-research/rag-hpo-bench
Preview
• Updated • 7.63k
• 2
ibm-research/trajcast.datasets-arxiv2025
ibm-research/turl_table_col_type
Updated • 31
ibm-research/vira-dialog-acts-live
Viewer
• Updated • 714 • 40
• 1
ibm-research/vira-intents
Viewer
• Updated • 7.97k • 15
• 2
ibm-research/vira-intents-live
Viewer
• Updated • 13.7k • 214
• 1
ibm-research/watsonxDocsQA
Viewer
• Updated • 1.22k • 79
• 5
inclusionAI/ASearcher-Local-Knowledge
Viewer
• Updated • 45.2M • 14.5k
• 8
inclusionAI/ASearcher-test-data
Updated • 301
• 4
inclusionAI/ASearcher-train-data
Preview
• Updated • 256
• 27
innodatalabs/rt4-science-QA
Viewer
• Updated • 75.3k • 831
• 3
irds/lotte_lifestyle_dev_search
irds/lotte_lifestyle_test_search
irds/lotte_pooled_test_search
irds/lotte_recreation_dev_search
irds/lotte_recreation_test_search
irds/lotte_science_dev_search
irds/lotte_science_test_search
irds/lotte_technology_test_search
Updated • 12
• 1
jack4444b/ALP_Behavioral_ECON_QA
Viewer
• Updated • 85 • 4
jayavibhav/synthbio-qa-ambig
Viewer
• Updated • 6.6k • 12
jayavibhav/synthbio-qa-ambig-8
Viewer
• Updated • 8k • 18
jjmachan/NSFW-questions-inter-cleaned_df
Viewer
• Updated • 12.9k • 561
• 6
jtatman/databricks-dolly-8k-qa-open-close
Viewer
• Updated • 7.71k • 60
jtatman/hypnosis_dataset_questions
Viewer
• Updated • 1.35k • 10
• 5
jtatman/orca_mini_uncensored_squad_format_train
Viewer
• Updated • 74.8k • 27
• 1
jtatman/orca_minis_uncensored_squad_format
Viewer
• Updated • 104k • 32
• 1
julep-ai-archive/samantha-self_aware_answerable
Viewer
• Updated • 3.37k • 36
• 1
justinphan3110/sharegpt_instructions_small_en_vi_answers
Viewer
• Updated • 424 • 9
lamini/product-catalog-questions
Viewer
• Updated • 27.4k • 49
• 7
lianghsun/QA_TaiwanEdoctor
Viewer
• Updated • 178k • 565
lianghsun/tw-legal-synthetic-qa
Viewer
• Updated • 9.63k • 273
• 9
lianghsun/vulnerability-mitigation-qa-zh_tw
Viewer
• Updated • 22 • 74
• 3
lightonai/dbpedia-entity-decontaminated
Viewer
• Updated • 1.69M • 95
lightonai/fiqa-decontaminated
Viewer
• Updated • 49.8k • 64
lightonai/hotpotqa-decontaminated
Viewer
• Updated • 2.33M • 39
lightonai/hotpotqa_contrastive
Viewer
• Updated • 85k • 22
lightonai/scifact-decontaminated
Viewer
• Updated • 1.31k • 72
lightonai/trivia_contrastive
Viewer
• Updated • 60.4k • 85
lionelchg/dolly_closed_qa
Viewer
• Updated • 1.77k • 26
• 3
lvogel123/factscore-claude-4.5-sonnet
Viewer
• Updated • 152 • 5
lvogel123/factscore-deepseek-v3.2-exp
Viewer
• Updated • 152 • 8
lvogel123/factscore-gemini-2.5-pro
Viewer
• Updated • 152 • 6
lvogel123/factscore-glm-4.6
Viewer
• Updated • 152 • 5
lvogel123/factscore-gpt-5-high
Viewer
• Updated • 152 • 5
lvogel123/factscore-gpt-oss-120b-high
Viewer
• Updated • 152 • 5
lvogel123/factscore-grok-4
Viewer
• Updated • 152 • 4
lvogel123/factscore-kimi-k2
Viewer
• Updated • 152 • 6
lvogel123/factscore-llama-3.3-nemotron-super-49b-v1.5
Viewer
• Updated • 152 • 6
lvogel123/factscore-llama-4-maverick
Viewer
• Updated • 152 • 8
lvogel123/factscore-qwen3-235b-a22b-thinking-2507
Viewer
• Updated • 152 • 5
lvogel123/gpqa-diamond-all
Viewer
• Updated • 10 • 13
lvogel123/gpqa-diamond-claude-4.5-sonnet
Viewer
• Updated • 241 • 38
• 1
lvogel123/gpqa-diamond-deepseek-v3.2-exp-high
Viewer
• Updated • 200 • 23
lvogel123/gpqa-diamond-gemini-2.5-pro
Viewer
• Updated • 744 • 290
lvogel123/gpqa-diamond-glm-4.6
Viewer
• Updated • 200 • 26
lvogel123/gpqa-diamond-glm-4.6-2
Viewer
• Updated • 199 • 18
lvogel123/gpqa-diamond-gpt-5-high
Viewer
• Updated • 201 • 19
lvogel123/gpqa-diamond-gpt-oss-120b-high
Viewer
• Updated • 200 • 25
lvogel123/gpqa-diamond-grok-4
Viewer
• Updated • 1.13k • 26
lvogel123/gpqa-diamond-kimi-k2
Viewer
• Updated • 200 • 20
lvogel123/gpqa-diamond-llama-3.3-nemotron-49b-v1.5
Viewer
• Updated • 200 • 49
lvogel123/gpqa-diamond-llama-4-maverick
Viewer
• Updated • 200 • 20
lvogel123/gpqa-diamond-qwen3-235b-a22b-2507
Viewer
• Updated • 561 • 25
lvogel123/grok-4-factscore
Viewer
• Updated • 3 • 4
markush1/adversarial-banking-questions
Viewer
• Updated • 2.25k • 4
• 3
markush1/adversarial-insurance-questions
Viewer
• Updated • 4.43k • 4
marmarg2/toxic-teenage-relationships
Updated • 5
• 2
mateowilliam/nemotron-super-reap-artifacts-draft
Preview
• Updated • 16
meandyou200175/data-query-sql
Viewer
• Updated • 11.7k • 8
mehuldamani/gpt5-simpleqa-20
Viewer
• Updated • 20 • 8
• 1
mehuldamani/half_hotpot_qa
Viewer
• Updated • 10.3k • 11
mehuldamani/hotpot_qa_for_multi
Viewer
• Updated • 20.5k • 9
mehuldamani/hotpot_qa_multi_models_pass_k_evals_onHotpot_nov11
Viewer
• Updated • 500 • 12
mehuldamani/hotpot_qa_single_models_pass_k_evals_onHotpot_nov11
Viewer
• Updated • 500 • 10
mehuldamani/hotpot_qa_test_gold_removed_1
Viewer
• Updated • 20.5k • 17
mehuldamani/hotpot_qa_test_gold_removed_2
Viewer
• Updated • 20.5k • 7
mehuldamani/hotpot_qa_trainTest_gold_removed_2
Viewer
• Updated • 20.5k • 16
mehuldamani/multi-answer-sft-target-dataset
Viewer
• Updated • 1.59k • 4
mehuldamani/qwen3_8b_ambigQA_rlcr_multi_analysis
Viewer
• Updated • 2k • 7
mehuldamani/qwen3_8b_ambigQA_rlcr_single_passk_tryAgain
Viewer
• Updated • 2k • 7
meoconxinhxan/Inspect-Search-Models-Benchmarking-Result-CIR-FOR-CHECK
Viewer
• Updated • 4.33k • 17
meoconxinhxan/Inspect-Search-Models-Benchmarking-Result-CIR-FOR-CHECK-Frame
Viewer
• Updated • 824 • 19
meoconxinhxan/_nano_ckeck_simpleqa
Viewer
• Updated • 4.33k • 6
meoconxinhxan/ii_med_ckeck_simpleqa
Viewer
• Updated • 4.33k • 5
meoconxinhxan/jan_nano_ckeck_simpleqa
Viewer
• Updated • 4.33k • 3
meoconxinhxan/qwen3-4b_simple_qa
Viewer
• Updated • 4.33k • 5
meoconxinhxan/r1-tool-open-ended-qa
Viewer
• Updated • 332 • 4
meoconxinhxan/search_qa_rl
Viewer
• Updated • 170k • 120
meoconxinhxan/search_r1_ds
Viewer
• Updated • 80k • 7
meoconxinhxan/search_r1_musque
Viewer
• Updated • 19.9k • 4
meoconxinhxan/simple_qa_stratified_kfold
Viewer
• Updated • 866 • 8
microsoft/MeetingBank-QA-Summary
Viewer
• Updated • 862 • 140
• 15
microsoft/bing_coronavirus_query_set
Viewer
• Updated • 318k • 696
• 1
microsoft/hnm-search-data
Viewer
• Updated • 33.3M • 23.8k
• 2
mlfoundations-dev/GPQADiamond_evalchemy
Viewer
• Updated • 1.19k • 400
mlfoundations-dev/GPQADiamond_evalchemy_gpt-4o-mini
Viewer
• Updated • 594 • 285
mlfoundations-dev/PDF_and_SCP_unfiltered_organic_chemistry_questions
Viewer
• Updated • 43.8k • 21
mlfoundations-dev/pdf_science_questions_verifiable_r1_traces__2_24_25
Viewer
• Updated • 1.62k • 25
mlfoundations-dev/r1_annotated_finqa
Viewer
• Updated • 5k • 128
• 1
mlfoundations-dev/sci_question_exp__scp_116k__training_2k_for_GPQA
Viewer
• Updated • 118k • 966
• 1
mlfoundations-dev/sci_question_exp__scp_116k__training_2k_for_GPQA_eval_03-11-25_17-16-22_f912
Viewer
• Updated • 594 • 640
• 1
multi-train/WikiAnswers_1107
Viewer
• Updated • 200k • 6
multi-train/amazon-qa_1107
Viewer
• Updated • 200k • 6
multi-train/eli5_question_answer_1107
Viewer
• Updated • 200k • 35
• 3
multi-train/emb-hotpotqa-train
Viewer
• Updated • 68.7k • 8
multi-train/emb-medmcqa-train
Viewer
• Updated • 161k • 4
multi-train/emb-triviaqa-train
Viewer
• Updated • 52.9k • 44
multi-train/hotpotqa-train-multikilt_1107
Viewer
• Updated • 68.7k • 5
multi-train/searchQA_top5_snippets_1107
Viewer
• Updated • 117k • 16
multi-train/squad_pairs_1107
Viewer
• Updated • 87.6k • 3
multi-train/triviaqa-train-multikilt_1107
Viewer
• Updated • 52.9k • 4
multi-train/yahoo_answers_title_answer_1107
Viewer
• Updated • 200k • 4
open-llm-leaderboard-old/details_GeorgiaTechResearchInstitute__galpaca-30b
Updated • 44
open-llm-leaderboard-old/details_garage-bAInd__Camel-Platypus2-70B
Updated • 25
open-llm-leaderboard/NousResearch__Hermes-3-Llama-3.1-70B-details
Viewer
• Updated • 43.2k • 108
open-llm-leaderboard/NousResearch__Nous-Hermes-2-Mixtral-8x7B-DPO-details
Viewer
• Updated • 45.6k • 126
open-llm-leaderboard/NousResearch__Nous-Hermes-2-Mixtral-8x7B-SFT-details
Viewer
• Updated • 80.9k • 139
opensporks/hotpotQA_filtered
Preview
• Updated • 2
projectlosangeles/Orpheus-MIDI-Search
Updated • 43
• 1
sam2ai/hindi_truthfulqa_gen_mini
Viewer
• Updated • 50 • 5
sambanovasystems/attackqa
Updated • 108
• 6
semran1/wiki_synth_qa_redstone_dclmknw_fwedu_pdf_mix
Viewer
• Updated • 25.4M • 416
sert121/adult_data_instruction_leaving_relationship_marital-status_capital-gain_occupation_education-num
Viewer
• Updated • 15.7k • 3
sert121/adult_dataset_age_workclass_education_marital-status_occupation_relationship
Viewer
• Updated • 48.8k • 3
sert121/adult_dataset_age_workclass_education_marital-status_occupation_relationship_race
Viewer
• Updated • 48.8k • 3
sert121/adult_dataset_age_workclass_education_marital-status_occupation_relationship_race_sex
Viewer
• Updated • 48.8k • 3
• 1
sert121/adult_dataset_age_workclass_education_marital__status_occupation_relationship
Viewer
• Updated • 48.8k • 4
sert121/adult_dataset_age_workclass_education_marital__status_occupation_relationship_race
Viewer
• Updated • 48.8k • 3
sert121/adult_dataset_age_workclass_education_marital__status_occupation_relationship_race_sex
Viewer
• Updated • 48.8k • 3
• 1
tencent/ArtifactsBenchmark
Viewer
• Updated • 1.83k • 96
• 13
thangvip/OpenOrca-translate-openQA
Viewer
• Updated • 26.7k • 3
thangvip/combined-vietnamese-legal-qa-pretrain
Viewer
• Updated • 904k • 32
• 2
thangvip/combined-vietnamese-legal-qa-pretrain-tokenized-8k
Viewer
• Updated • 60.8k • 97
thangvip/law-reading-comprehension-qa
Viewer
• Updated • 895k • 232
thangvip/law-reading-comprehension-qa-filtered
Viewer
• Updated • 205k • 25
thangvip/question-queries-finetune
Viewer
• Updated • 17.9k • 3
thangvip/thuvienphapluat-qa-normalize
Viewer
• Updated • 16k • 6
thangvip/thuvienphapluat-question-query
Viewer
• Updated • 19.9k • 4
thangvip/thuvienphapluat-question-query-1
Viewer
• Updated • 641 • 3
thangvip/vietnamese-legal-qa
Viewer
• Updated • 9.72k • 355
• 3
theblackcat102/alexa-qa-with-rank
Viewer
• Updated • 70.5k • 71
• 2
theblackcat102/amazon_item_synthetic_retrieval
Viewer
• Updated • 23.6k • 4
theblackcat102/amazon_item_synthetic_retrieval_final
Viewer
• Updated • 28k • 4
theblackcat102/barexam_qa
Viewer
• Updated • 80 • 9
theblackcat102/gqa-testdev-balanced
Viewer
• Updated • 12.6k • 31
theblackcat102/law_freeform_qa
Viewer
• Updated • 7.3k • 6
theblackcat102/prime_factorization
Viewer
• Updated • 1.33k • 8
waifu-research-department/regularization
Viewer
• Updated • 6.72k • 35
• 13
walledai/ForbiddenQuestions
Viewer
• Updated • 390 • 166
• 5
wandb/finqa-data-processed
Viewer
• Updated • 8.28k • 532
• 2
wandb/finqa-data-processed-hallucination
Viewer
• Updated • 16.6k • 262
wandb/ragbench-test-sample
Viewer
• Updated • 957 • 5