{"expressions": [
"x^2 + 2*x + 1",
"sin(x)^2 + cos(x)^2",
"x^3 - 3*x^2 + 3*x - 1",
"e^(i*pi) + 1",
"log(x*y)",
"sqrt(x^2 + y^2)",
"1/(1 + e^(-x))",
"x^2 - y^2",
"a^2 + 2*a*b + b^2",
"(x+1)*(x-1)",
"diff(sin(x), x)",
"integrate(x^2, x)",
"limit(sin(x)/x, x, 0)",
"sum(k^2, k, 1, n)",
"factorial(n) / (factorial(k)*factorial(n-k))",
"exp(-x^2/2) / sqrt(2*pi)",
"a*x^2 + b*x + c",
"(-b + sqrt(b^2 - 4*a*c)) / (2*a)",
"log(1 + x)",
"x - x^3/6 + x^5/120",
"1 + 1/2 + 1/4 + 1/8",
"n*(n+1)/2",
"2^10",
"abs(x - y)",
"floor(x) + ceil(-x)",
"gamma(n+1)",
"sinh(x) + cosh(x)",
"atan(y/x)",
"x^2 + y^2 + z^2",
"det([[a,b],[c,d]])"
],
"equivalent_pairs": [
["x^2 + 2*x + 1", "(x+1)^2"],
["a^2 - b^2", "(a+b)*(a-b)"],
["a^2 + 2*a*b + b^2", "(a+b)^2"],
["x^3 - y^3", "(x-y)*(x^2 + x*y + y^2)"],
["sin(x)^2 + cos(x)^2", "1"],
["log(x) + log(y)", "log(x*y)"],
["e^x * e^y", "e^(x+y)"],
["1/x + 1/y", "(x+y)/(x*y)"],
["b + a", "a + b"],
["2*x + 2*y", "2*(x+y)"],
["x/2", "x * (1/2)"],
["x^2 * x^3", "x^5"],
["(x^2)^3", "x^6"],
["log(e^x)", "x"],
["e^(log(x))", "x"],
["n*(n+1)/2", "n/2 + n^2/2"],
["1 + x + x^2", "(x^3 - 1)/(x-1)"],
["cos(2*x)", "1 - 2*sin(x)^2"],
["tan(x)", "sin(x)/cos(x)"],
["cosh(x)^2 - sinh(x)^2", "1"]
],
"rewriting_groups": [
["x^2 + 2*x + 1", "(x+1)^2", "x*(x+2) + 1"],
["a*b + a*c", "a*(b+c)", "a*c + a*b"],
["sin(x)/cos(x)", "tan(x)", "sin(x)*sec(x)"],
["e^(x+y)", "e^x * e^y"],
["log(x^2)", "2*log(x)", "log(x) + log(x)"],
["n*(n+1)/2", "n/2*(n+1)", "sum(k, k, 1, n)"]
],
"mixed_text_math": [
"The derivative of $\\sin(x^2)$ with respect to $x$ is $2x\\cos(x^2)$.",
"Let $f(x) = x^2 + 2x + 1$. Then $f(x) = (x+1)^2$.",
"The quadratic formula gives $x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}$.",
"Euler's identity states that $e^{i\\pi} + 1 = 0$.",
"The integral $\\int_0^1 x^2 dx = \\frac{1}{3}$.",
"For any $n \\geq 1$, the sum $\\sum_{k=1}^{n} k = \\frac{n(n+1)}{2}$.",
"The Pythagorean theorem: $a^2 + b^2 = c^2$ for right triangles.",
"The standard normal density is $f(x) = \\frac{1}{\\sqrt{2\\pi}}e^{-x^2/2}$.",
"If $\\sin^2(x) + \\cos^2(x) = 1$ then $\\tan^2(x) + 1 = \\sec^2(x)$.",
"The limit $\\lim_{x \\to 0} \\frac{\\sin(x)}{x} = 1$ is fundamental.",
"Find the derivative of f(x) = sin(x^2) + 3x.",
"Solve for x: x^2 - 5*x + 6 = 0.",
"The area of a circle of radius r is pi*r^2.",
"Simplify: (a+b)^2 - (a-b)^2.",
"Compute the Taylor series of exp(x) around x=0."
],
"latex_only": [
"\\frac{x^2 - 1}{x + 1}",
"\\sqrt{\\frac{a^2 + b^2}{2}}",
"\\int_0^\\infty e^{-x^2} dx",
"\\sum_{n=0}^{\\infty} \\frac{x^n}{n!}",
"\\lim_{n \\to \\infty} \\left(1 + \\frac{1}{n}\\right)^n",
"\\binom{n}{k} = \\frac{n!}{k!(n-k)!}",
"\\frac{d}{dx}\\left[\\ln(x)\\right] = \\frac{1}{x}",
"\\nabla^2 f = \\frac{\\partial^2 f}{\\partial x^2} + \\frac{\\partial^2 f}{\\partial y^2}"
],
"ascii_only": [
"x**2 + 2*x + 1",
"sin(x)**2 + cos(x)**2",
"exp(-x**2 / 2) / sqrt(2*pi)",
"factorial(n) / (factorial(k) * factorial(n - k))",
"log(x**2) - 2*log(x)",
"abs(a - b) + abs(b - c)",
"floor(x/2) * 2",
"gamma(n + 1) / gamma(n)"
],
"metadata": {
"version": "1.0",
"description": "MathTok benchmark dataset — curated expressions for evaluating structural tokenization quality",
"sources": ["handcrafted", "DeepMind-Mathematics-inspired"],
"num_expressions": 30,
"num_equivalent_pairs": 20,
"num_rewriting_groups": 6,
"num_mixed": 15,
"num_latex_only": 8,
"num_ascii_only": 8
}
}