Added Contamination Evidence on MMLU of ChatGPT/GPT4 from "Investigating data contamination in modern benchmarks for large language models" (#10)
Browse files- Update contamination_report.csv (383926d5dc6ddec32216652947950efd9f1b0df8)
- Update contamination_report.csv (5ecf89dce1728b3e6c0329a44752442431a30ba3)
- Update contamination_report.csv (4666215baa7d58f706e11b0940259d73bba8ae2c)
- Update contamination_report.csv (77f5a8da1d7951a49843f9dae6dbfe0905db469c)
- Update contamination_report.csv (eb87bf5b3f5b7f5ba69862fbf90fccec27c00e2f)
- Merge branch 'main' of https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Report into pr/10 (18d798a286845b7f8a62d786d3240c65af0ffe07)
Co-authored-by: Ameya Prabhu <AmeyaPrabhu@users.noreply.huggingface.co>
- contamination_report.csv +24 -0
contamination_report.csv
CHANGED
|
@@ -477,6 +477,30 @@ EdinburghNLP/xsum;;GPT-3.5;model;0.0;;100.0;model-based;https://arxiv.org/abs/23
|
|
| 477 |
bigbio/mednli;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
|
| 478 |
bigbio/mednli;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
|
| 479 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 480 |
RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
|
| 481 |
RadNLI;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
|
| 482 |
RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
|
|
|
|
| 477 |
bigbio/mednli;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
|
| 478 |
bigbio/mednli;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
|
| 479 |
|
| 480 |
+
RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8
|
| 481 |
+
RadNLI;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8
|
| 482 |
+
|
| 483 |
+
|
| 484 |
+
cais/mmlu;;GPT-3.5;model;;;52.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 485 |
+
winogrande;;GPT-3.5;model;;;9.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 486 |
+
truthful_qa;;GPT-3.5;model;;;12.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 487 |
+
allenai/openbookqa;;GPT-3.5;model;;;1.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 488 |
+
|
| 489 |
+
cais/mmlu;;GPT-4;model;;;57.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 490 |
+
truthful_qa;;GPT-4;model;;;10.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 491 |
+
winogrande;;GPT-4;model;;;12.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 492 |
+
allenai/openbookqa;;GPT-4;model;;;1.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 493 |
+
Rowan/hellaswag;;GPT-4;model;;;2.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 494 |
+
|
| 495 |
+
|
| 496 |
+
allenai/openbookqa;;LLaMa 2-13B;model;;;4.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 497 |
+
truthful_qa;;LLaMa 2-13B;model;;;2.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 498 |
+
winogrande;;LLaMa 2-13B;model;;;1.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 499 |
+
|
| 500 |
+
truthful_qa;;Mistral-7B;model;;;15.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 501 |
+
allenai/openbookqa;;Mistral-7B;model;;;10.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 502 |
+
winogrande;;Mistral-7B;model;;;3.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 503 |
+
cais/mmlu;;Mistral-7B;model;;;1.0;model-based;https://arxiv.org/abs/2311.09783;10
|
| 504 |
RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
|
| 505 |
RadNLI;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
|
| 506 |
RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
|