---
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- geometry
- math
- vision-language
- reasoning
---

# GeoFocus-3B

GeoFocus is a framework designed to enhance multimodal geometry reasoning in Large Multimodal Models (LMMs). It addresses the challenges of geometry problem-solving by focusing on both global shape recognition and intricate local relationships through two core modules:

1.  **Critical Local Perceptor**: Automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) using theory-based perception templates.
2.  **VertexLang**: A compact topology formal language that encodes global figures through vertex coordinates and connectivity relations, improving efficiency and accuracy compared to traditional code-based encodings.

## Model Details
- **Architecture:** Based on Qwen2.5-VL
- **Paper:** [GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving](https://huggingface.co/papers/2602.08524)
- **Repository:** [GitHub - dle666/GeoFocus](https://github.com/dle666/GeoFocus)

## Evaluation Results

GeoFocus demonstrates significant improvements over specialized models across major geometry benchmarks:

| Model Name | Geo3K | GeoQA | Formalgeo7k |
| :---: | :---: | :---: | :---: |
| **GeoFocus-3B** | 50.4 | 64.3 | 55.4 |
| **GeoFocus-7B** | 55.3 | 71.9 | 63.5 |

## Environment and Installation

To use this model, ensure you have the following requirements installed:

- Python 3.9+
- transformers>=4.51.0
- flash-attn>=2.4.3
- vllm>=0.8.3

```bash
pip install transformers>=4.51.0 flash-attn>=2.4.3 vllm>=0.8.3
```

## Citation

```bibtex
@article{chen2026geofocus,
  title={GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving},
  author={Jiuhai Chen and Jianwei Yang and Haiping Wu and Dianqi Li and Jianfeng Gao and Tianyi Zhou and Bin Xiao},
  journal={arXiv preprint arXiv:2602.08524},
  year={2026}
}
```