# ARC

### Paper

Title: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Abstract: https://arxiv.org/abs/1803.05457

The ARC dataset consists of 7,787 science exam questions drawn from a variety
of sources, including science questions provided under license by a research
partner affiliated with AI2. These are text-only, English language exam questions
that span several grade levels as indicated in the files. Each question has a
multiple choice structure (typically 4 answer options). The questions are sorted
into a Challenge Set of 2,590 “hard” questions (those that both a retrieval and
a co-occurrence method fail to answer correctly) and an Easy Set of 5,197 questions.

Homepage: https://allenai.org/data/arc
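
To make the multiple-choice structure described above concrete, here is a minimal Python sketch of how one question could be represented, rendered as a prompt, and scored for accuracy. The class name `ArcQuestion`, its field names, the prompt layout, and the example item are all assumptions made for illustration. They are not taken from the dataset files or from this library's task configuration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ArcQuestion:
    """One multiple-choice question (field names are illustrative assumptions)."""
    question: str
    labels: List[str]   # option labels, e.g. ["A", "B", "C", "D"]
    choices: List[str]  # answer texts, aligned with `labels`
    answer_key: str     # label of the correct option


def format_prompt(q: ArcQuestion) -> str:
    """Render the question and its options as a plain-text prompt."""
    lines = [f"Question: {q.question}"]
    lines += [f"{label}. {text}" for label, text in zip(q.labels, q.choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(questions: List[ArcQuestion],
             predict: Callable[[ArcQuestion], str]) -> float:
    """Fraction of questions where the predicted label equals the answer key."""
    correct = sum(predict(q) == q.answer_key for q in questions)
    return correct / len(questions)


# Hypothetical item written for this sketch; it is NOT from the dataset.
example = ArcQuestion(
    question="Which of these is a source of light?",
    labels=["A", "B", "C", "D"],
    choices=["a rock", "the Sun", "a mirror", "a cloud"],
    answer_key="B",
)

# Set sizes as stated in the description: Challenge + Easy = total.
CHALLENGE_SIZE, EASY_SIZE, TOTAL_SIZE = 2590, 5197, 7787
assert CHALLENGE_SIZE + EASY_SIZE == TOTAL_SIZE
```

Calling `format_prompt(example)` yields a block that starts with the question line, lists one line per option, and ends with `Answer:`. Any callable that maps a question to a label can be passed to `accuracy` as `predict`.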
### Citation

```
@article{Clark2018ThinkYH,
  title={Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
  author={Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
  journal={ArXiv},
  year={2018},
  volume={abs/1803.05457}
}
```
### Groups, Tags, and Tasks

#### Groups

None.

#### Tags

* `ai2_arc`: Evaluates `arc_easy` and `arc_challenge`

#### Tasks

* `arc_easy`
* `arc_challenge`
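
The relationship between the tag and the two tasks can be sketched in a few lines of Python. This is only an illustration of the mapping listed above: the `TAGS` dictionary and the `expand` helper are hypothetical stand-ins, not the library's actual task registry or API.

```python
from typing import Dict, List

# Assumption: a plain dict stands in for the library's real task registry.
TAGS: Dict[str, List[str]] = {"ai2_arc": ["arc_easy", "arc_challenge"]}


def expand(names: List[str]) -> List[str]:
    """Replace each tag with its tasks, keeping order and dropping repeats."""
    expanded: List[str] = []
    for name in names:
        # A name that is not a known tag is treated as a task name itself.
        for task in TAGS.get(name, [name]):
            if task not in expanded:
                expanded.append(task)
    return expanded
```

Under these assumptions, selecting `ai2_arc` is equivalent to selecting `arc_easy` and `arc_challenge` together, and selecting a task alongside the tag does not evaluate it twice.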
### Checklist

For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
  * [ ] Have you referenced the original paper that introduced the task?
  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?

If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?