Upload README.md with huggingface_hub
README.md (changed)
```diff
@@ -1,3 +1,14 @@
+---
+license: mit
+tags:
+- mathematics
+- education
+- reasoning
+- trap-questions
+- math-problems
+library_name: datasets
+---
+
 # MathTrap300
 
 A benchmark dataset of 300 insolvable, ill-posed mathematical problems designed to evaluate large language models' ability to recognize mathematical insolvability and fundamental contradictions.
@@ -23,7 +34,7 @@ from datasets import load_dataset
 dataset = load_dataset("GYASBGFUHAADSGADF/mathtrap300")
 
 # Access the data
-for example in dataset['
+for example in dataset['train']:
     print(f"Original: {example['original']}")
     print(f"Trap: {example['trap']}")
     print(f"Annotation: {example['annotation']}")
@@ -50,6 +61,15 @@ Our evaluation of recent advanced LLMs on MathTrap300 reveals:
 - Condition Neglect: Models ignore critical mathematical constraints
 - **Forced Solutions**: Even when models recognize insolvability, they still attempt to force a solution
 
+## Dataset Statistics
+
+- **Total Problems**: 300 (currently 151 uploaded)
+- **Difficulty Levels**: 1.0 - 5.0
+- **Trap Types**: Contradiction, Missing Conditions, and others
+- **Sources**: MATH dataset, Original creation
+- **Validation**: Rigorously verified by PhD-level mathematical experts
+- **Split**: Mix of train/test examples
+
 ## Citation
 
 If you use this dataset in your research, please cite our paper:
@@ -66,4 +86,4 @@ If you use this dataset in your research, please cite our paper:
 
 ## License
 
-This dataset is released under the MIT License.
+This dataset is released under the MIT License.
```
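The README reports two failure modes: models ignoring critical constraints, and models that recognize insolvability but still force a solution. The sketch below shows one way responses could be bucketed along those lines. It is a hypothetical keyword heuristic, not the MathTrap300 authors' evaluation protocol: the cue phrases, the `\boxed{` / "final answer:" check, the category names, and the functions `classify_response` and `summarize` are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical cue phrases suggesting the model flagged the problem as
# insolvable or ill-posed. These are illustrative assumptions, not the
# MathTrap300 grading criteria.
INSOLVABLE_CUES = (
    "no solution",
    "not solvable",
    "insolvable",
    "unsolvable",
    "ill-posed",
    "contradiction",
    "contradictory",
    "cannot be determined",
    "insufficient information",
)

# Assumption: a \boxed{...} expression or an explicit "final answer:" / "final answer ="
# means the model committed to a concrete answer.
ANSWER_PATTERN = re.compile(r"\\boxed\{|final answer\s*[:=]", re.IGNORECASE)


def classify_response(response: str) -> str:
    """Put one model response into a rough behaviour bucket."""
    text = response.lower()
    flagged = any(cue in text for cue in INSOLVABLE_CUES)
    answered = ANSWER_PATTERN.search(response) is not None
    if flagged and answered:
        # Noticed the trap but still committed to an answer ("forced solution").
        return "forced_solution"
    if flagged:
        # Flagged the problem and gave no committed answer.
        return "recognized"
    # Never flagged the trap; condition neglect would land in this bucket.
    return "not_flagged"


def summarize(responses):
    """Count how many responses fall into each bucket."""
    return Counter(classify_response(r) for r in responses)


if __name__ == "__main__":
    # Made-up responses used only to exercise the heuristic.
    responses = [
        "The conditions are contradictory, so the problem has no solution.",
        "This looks ill-posed, but the final answer: 12",
        r"Solving directly gives \boxed{7}.",
    ]
    for r in responses:
        print(classify_response(r), "<-", r)
    print(summarize(responses))
```

Keyword matching is crude. For example, a response saying "this is not a contradiction" would still count as flagged. A careful evaluation would likely need human or model-based judging on top of anything like this.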