GeoFocus-3B / README.md
nielsr's picture
nielsr HF Staff
Add model card for GeoFocus-3B
c6542b1 verified
|
raw
history blame
2 kB
metadata
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - geometry
  - math
  - vision-language
  - reasoning

GeoFocus-3B

GeoFocus is a framework designed to enhance multimodal geometry reasoning in Large Multimodal Models (LMMs). It addresses the challenges of geometry problem-solving by focusing on both global shape recognition and intricate local relationships through two core modules:

  1. Critical Local Perceptor: Automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) using theory-based perception templates.
  2. VertexLang: A compact topology formal language that encodes global figures through vertex coordinates and connectivity relations, improving efficiency and accuracy compared to traditional code-based encodings.

Model Details

Evaluation Results

GeoFocus demonstrates significant improvements over specialized models across major geometry benchmarks:

Model Name Geo3K GeoQA Formalgeo7k
GeoFocus-3B 50.4 64.3 55.4
GeoFocus-7B 55.3 71.9 63.5

Environment and Installation

To use this model, ensure you have the following requirements installed:

  • Python 3.9+
  • transformers>=4.51.0
  • flash-attn>=2.4.3
  • vllm>=0.8.3
pip install transformers>=4.51.0 flash-attn>=2.4.3 vllm>=0.8.3

Citation

@article{chen2026geofocus,
  title={GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving},
  author={Jiuhai Chen and Jianwei Yang and Haiping Wu and Dianqi Li and Jianfeng Gao and Tianyi Zhou and Bin Xiao},
  journal={arXiv preprint arXiv:2602.08524},
  year={2026}
}