---
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- geometry
- math
- vision-language
- reasoning
---

# GeoFocus-3B

GeoFocus is a framework for improving multimodal geometry reasoning in Large Multimodal Models (LMMs). Geometry problem-solving requires both recognizing the global shape of a figure and resolving its fine-grained local relationships; GeoFocus addresses these with two core modules:

1. **Critical Local Perceptor**: Automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) using theory-based perception templates.
2. **VertexLang**: A compact topology formal language that encodes global figures through vertex coordinates and connectivity relations, improving efficiency and accuracy compared to traditional code-based encodings.

## Model Details

- **Architecture:** Based on Qwen2.5-VL
- **Paper:** [GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving](https://huggingface.co/papers/2602.08524)
- **Repository:** [GitHub - dle666/GeoFocus](https://github.com/dle666/GeoFocus)

## Evaluation Results

GeoFocus demonstrates significant improvements over specialized models across major geometry benchmarks. The table below lists the scores of the two GeoFocus models:

| Model Name | Geo3K | GeoQA | Formalgeo7k |
| :---: | :---: | :---: | :---: |
| **GeoFocus-3B** | 50.4 | 64.3 | 55.4 |
| **GeoFocus-7B** | 55.3 | 71.9 | 63.5 |

## Environment and Installation

To use this model, install the following requirements:

- Python 3.9+
- transformers>=4.51.0
- flash-attn>=2.4.3
- vllm>=0.8.3

```bash
# Quote each specifier so the shell does not treat ">" as an output redirection.
pip install "transformers>=4.51.0" "flash-attn>=2.4.3" "vllm>=0.8.3"
```

## Citation

```bibtex
@article{chen2026geofocus,
  title={GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving},
  author={Jiuhai Chen and Jianwei Yang and Haiping Wu and Dianqi Li and Jianfeng Gao and Tianyi Zhou and Bin Xiao},
  journal={arXiv preprint arXiv:2602.08524},
  year={2026}
}
```
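
## Checking the Environment

The minimum versions listed under Environment and Installation can be verified before loading the model. The following is a minimal sketch, not part of the GeoFocus repository: the helper names `parse_release`, `meets_minimum`, and `check_environment` are hypothetical, and the comparison looks only at the numeric release segment, so suffixes such as `.post1` or `+cu121` are ignored and pre-releases are treated like the matching release.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list in this README.
MINIMUMS = {
    "transformers": "4.51.0",
    "flash-attn": "2.4.3",
    "vllm": "0.8.3",
}


def parse_release(text):
    """Return the leading numeric release of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare numeric release segments, padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old (...)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 9)
    print(f"python: {'ok' if python_ok else 'too old (need 3.9+)'}")
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Running the script prints one status line per requirement; any line other than `ok` points to a package to install or upgrade with the `pip` command above.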