Spaces:
Sleeping
Sleeping
| import re | |
| def normalize(value: str) -> str: | |
| """Lowercase, strip, collapse all separators (space/_/-/slash) to one space. | |
| Collapsing ``/`` with the other separators means ``tau-bench-2/airline`` | |
| and ``tau-bench-2_airline`` normalize identically — critical for generalized | |
| ingestion where the same benchmark may appear with slash, underscore, or | |
| hyphen separators across configs. False merges across distinct canonical | |
| IDs are prevented by fuzzy's suffix-stripping being the *only* stem rewrite | |
| we apply (no generic similarity). | |
| Dots between digits are converted to spaces first so that version | |
| numbers like ``4.5`` and ``4-5`` normalize identically (both → ``4 5``). | |
| ``+`` is preserved because it carries identity-relevant meaning for some | |
| benchmarks (``MBPP+`` / ``HumanEval+`` are distinct from their non-``+`` | |
| counterparts; stripping it collides them in the normalized index and | |
| last-write-wins assigns the wrong canonical). | |
| """ | |
| value = value.lower() | |
| value = value.strip() | |
| # Convert dots between digits to spaces (e.g. "4.5" → "4 5") | |
| value = re.sub(r"(?<=\d)\.(?=\d)", " ", value) | |
| value = re.sub(r"[^\w\s\-/+]", "", value) # remove punctuation, keep + | |
| value = re.sub(r"[\s_\-/]+", " ", value).strip() # collapse separators | |
| return value | |