CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves
Abstract
CurveBench presents a benchmark for hierarchical topological reasoning using visual inputs, demonstrating significant challenges in exact topology-aware visual reasoning even with advanced models.
We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves across easy, polygonal, topographic-inspired, maze-like, and dense counting configurations. Each image is annotated with a rooted tree encoding the containment relations between planar regions. We formulate the task as structured prediction: given an image, a model must recover the full rooted containment tree induced by the curves. Despite the visual simplicity of the task, the strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1\% tree-generation accuracy on CurveBench-Easy and 19.1\% on CurveBench-Hard. We further demonstrate benchmark utility through RLVR-style fine-tuning of open-weight vision-language models. Our trained Qwen3-VL-8B model improves over Qwen-3-VL-8B-Thinking from 2.8\% to 33.3\% tree-generation accuracy on CurveBench-Easy, exceeding GPT-5.4 and Claude Opus 4.5 under our evaluation protocol. The remaining gap, especially on CurveBench-Hard, shows that exact topology-aware visual reasoning remains far from solved.
Community
We introduce CurveBench, a benchmark for testing whether vision-language models can recover hierarchical region-containment trees from images of non-intersecting Jordan curves. The task targets visual topology and structured reasoning beyond simple object recognition, counting, or OCR.
The Hugging Face collection includes the paper, the CurveBench and CurveBench-Easy datasets, evaluation code, ground-truth generation resources, and fine-tuning artifacts. Our results show that even strong frontier VLMs struggle substantially on the harder settings, while fine-tuned open models improve but remain far from solving the task.
Get this paper in your agent:
hf papers read 2605.14068 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 2
AmirMohseni/CurveBench-Easy
Spaces citing this paper 0
No Space linking this paper