{
  "expressions": [
    "x^2 + 2*x + 1",
    "sin(x)^2 + cos(x)^2",
    "x^3 - 3*x^2 + 3*x - 1",
    "e^(i*pi) + 1",
    "log(x*y)",
    "sqrt(x^2 + y^2)",
    "1/(1 + e^(-x))",
    "x^2 - y^2",
    "a^2 + 2*a*b + b^2",
    "(x+1)*(x-1)",
    "diff(sin(x), x)",
    "integrate(x^2, x)",
    "limit(sin(x)/x, x, 0)",
    "sum(k^2, k, 1, n)",
    "factorial(n) / (factorial(k)*factorial(n-k))",
    "exp(-x^2/2) / sqrt(2*pi)",
    "a*x^2 + b*x + c",
    "(-b + sqrt(b^2 - 4*a*c)) / (2*a)",
    "log(1 + x)",
    "x - x^3/6 + x^5/120",
    "1 + 1/2 + 1/4 + 1/8",
    "n*(n+1)/2",
    "2^10",
    "abs(x - y)",
    "floor(x) + ceil(-x)",
    "gamma(n+1)",
    "sinh(x) + cosh(x)",
    "atan(y/x)",
    "x^2 + y^2 + z^2",
    "det([[a,b],[c,d]])"
  ],
  "equivalent_pairs": [
    ["x^2 + 2*x + 1", "(x+1)^2"],
    ["a^2 - b^2", "(a+b)*(a-b)"],
    ["a^2 + 2*a*b + b^2", "(a+b)^2"],
    ["x^3 - y^3", "(x-y)*(x^2 + x*y + y^2)"],
    ["sin(x)^2 + cos(x)^2", "1"],
    ["log(x) + log(y)", "log(x*y)"],
    ["e^x * e^y", "e^(x+y)"],
    ["1/x + 1/y", "(x+y)/(x*y)"],
    ["b + a", "a + b"],
    ["2*x + 2*y", "2*(x+y)"],
    ["x/2", "x * (1/2)"],
    ["x^2 * x^3", "x^5"],
    ["(x^2)^3", "x^6"],
    ["log(e^x)", "x"],
    ["e^(log(x))", "x"],
    ["n*(n+1)/2", "n/2 + n^2/2"],
    ["1 + x + x^2", "(x^3 - 1)/(x-1)"],
    ["cos(2*x)", "1 - 2*sin(x)^2"],
    ["tan(x)", "sin(x)/cos(x)"],
    ["cosh(x)^2 - sinh(x)^2", "1"]
  ],
  "rewriting_groups": [
    ["x^2 + 2*x + 1", "(x+1)^2", "x*(x+2) + 1"],
    ["a*b + a*c", "a*(b+c)", "a*c + a*b"],
    ["sin(x)/cos(x)", "tan(x)", "sin(x)*sec(x)"],
    ["e^(x+y)", "e^x * e^y"],
    ["log(x^2)", "2*log(x)", "log(x) + log(x)"],
    ["n*(n+1)/2", "n/2*(n+1)", "sum(k, k, 1, n)"]
  ],
  "mixed_text_math": [
    "The derivative of $\\sin(x^2)$ with respect to $x$ is $2x\\cos(x^2)$.",
    "Let $f(x) = x^2 + 2x + 1$. Then $f(x) = (x+1)^2$.",
    "The quadratic formula gives $x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}$.",
    "Euler's identity states that $e^{i\\pi} + 1 = 0$.",
    "The integral $\\int_0^1 x^2 dx = \\frac{1}{3}$.",
    "For any $n \\geq 1$, the sum $\\sum_{k=1}^{n} k = \\frac{n(n+1)}{2}$.",
    "The Pythagorean theorem: $a^2 + b^2 = c^2$ for right triangles.",
    "The normal distribution is $f(x) = \\frac{1}{\\sqrt{2\\pi}}e^{-x^2/2}$.",
    "If $\\sin^2(x) + \\cos^2(x) = 1$ then $\\tan^2(x) + 1 = \\sec^2(x)$.",
    "The limit $\\lim_{x \\to 0} \\frac{\\sin(x)}{x} = 1$ is fundamental.",
    "Find the derivative of f(x) = sin(x^2) + 3x.",
    "Solve for x: x^2 - 5*x + 6 = 0.",
    "The area of a circle of radius r is pi*r^2.",
    "Simplify: (a+b)^2 - (a-b)^2.",
    "Compute the Taylor series of exp(x) around x=0."
  ],
  "latex_only": [
    "\\frac{x^2 - 1}{x + 1}",
    "\\sqrt{\\frac{a^2 + b^2}{2}}",
    "\\int_0^\\infty e^{-x^2} dx",
    "\\sum_{n=0}^{\\infty} \\frac{x^n}{n!}",
    "\\lim_{n \\to \\infty} \\left(1 + \\frac{1}{n}\\right)^n",
    "\\binom{n}{k} = \\frac{n!}{k!(n-k)!}",
    "\\frac{d}{dx}\\left[\\ln(x)\\right] = \\frac{1}{x}",
    "\\nabla^2 f = \\frac{\\partial^2 f}{\\partial x^2} + \\frac{\\partial^2 f}{\\partial y^2}"
  ],
  "ascii_only": [
    "x**2 + 2*x + 1",
    "sin(x)**2 + cos(x)**2",
    "exp(-x**2 / 2) / sqrt(2*pi)",
    "factorial(n) / (factorial(k) * factorial(n - k))",
    "log(x**2) - 2*log(x)",
    "abs(a - b) + abs(b - c)",
    "floor(x/2) * 2",
    "gamma(n + 1) / gamma(n)"
  ],
  "metadata": {
    "version": "1.0",
    "description": "MathTok benchmark dataset: curated expressions for evaluating structural tokenization quality",
    "sources": ["handcrafted", "DeepMind-Mathematics-inspired"],
    "num_expressions": 30,
    "num_equivalent_pairs": 20,
    "num_rewriting_groups": 6,
    "num_mixed": 15
  }
}