| | --- |
| | pipeline_tag: image-text-to-text |
| | library_name: transformers |
| | tags: |
| | - geometry |
| | - math |
| | - vision-language |
| | - reasoning |
| | --- |
| | |
| | # GeoFocus-3B |
| |
|
| | GeoFocus is a framework designed to enhance multimodal geometry reasoning in Large Multimodal Models (LMMs). It addresses the challenges of geometry problem-solving by focusing on both global shape recognition and intricate local relationships through two core modules: |
| |
|
| | 1. **Critical Local Perceptor**: Automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) using theory-based perception templates. |
| | 2. **VertexLang**: A compact topology formal language that encodes global figures through vertex coordinates and connectivity relations, improving efficiency and accuracy compared to traditional code-based encodings. |
| |
|
| | ## Model Details |
| | - **Architecture:** Based on Qwen2.5-VL |
| | - **Paper:** [GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving](https://huggingface.co/papers/2602.08524) |
| | - **Repository:** [GitHub - dle666/GeoFocus](https://github.com/dle666/GeoFocus) |
| |
|
| | ## Evaluation Results |
| |
|
| | GeoFocus demonstrates significant improvements over specialized models across major geometry benchmarks: |
| |
|
| | | Model Name | Geo3K | GeoQA | Formalgeo7k | |
| | | :---: | :---: | :---: | :---: | |
| | | **GeoFocus-3B** | 50.4 | 64.3 | 55.4 | |
| | | **GeoFocus-7B** | 55.3 | 71.9 | 63.5 | |
| |
|
| | ## Environment and Installation |
| |
|
| | To use this model, ensure you have the following requirements installed: |
| |
|
| | - Python 3.9+ |
| | - transformers>=4.51.0 |
| | - flash-attn>=2.4.3 |
| | - vllm>=0.8.3 |
| |
|
| | ```bash |
| | pip install transformers>=4.51.0 flash-attn>=2.4.3 vllm>=0.8.3 |
| | ``` |
| |
|
| | ## Citation |
| |
|
| | ```bibtex |
| | @article{chen2026geofocus, |
| | title={GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving}, |
| | author={Jiuhai Chen and Jianwei Yang and Haiping Wu and Dianqi Li and Jianfeng Gao and Tianyi Zhou and Bin Xiao}, |
| | journal={arXiv preprint arXiv:2602.08524}, |
| | year={2026} |
| | } |
| | ``` |