Add model card for GeoFocus

#1
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ library_name: transformers
4
+ pipeline_tag: image-text-to-text
5
+ datasets:
6
+ - dle666/Global_Perceptor
7
+ - dle666/Local_Perceptor
8
+ - dle666/GeoFocus-test
9
+ base_model: Qwen/Qwen2.5-VL-7B-Instruct
10
+ ---
11
+
12
+ # GeoFocus-7B
13
+
14
+ [GeoFocus](https://huggingface.co/papers/2602.08524) is a specialized Large Multimodal Model (LMM) designed for geometry problem-solving. It enhances global-to-local perception by blending global topology recognition with critical local structure perception.
15
+
16
+ ## Model Description
17
+ Geometry problem-solving remains a significant challenge for LMMs, requiring not only global shape recognition but also attention to intricate local relationships. GeoFocus addresses this through two core modules:
18
+
19
+ 1. **Critical Local Perceptor**: Automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) through thirteen theory-based perception templates.
20
+ 2. **VertexLang**: A compact topology formal language that encodes global figures through vertex coordinates and connectivity relations, reducing training time while improving accuracy compared to bulky code-based encodings.
21
+
22
+ GeoFocus-7B is built upon the **Qwen2.5-VL-7B** architecture and achieves significant improvements on geometry benchmarks like Geo3K, GeoQA, and FormalGeo7K.
23
+
24
+ - **Paper:** [GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving](https://huggingface.co/papers/2602.08524)
25
+ - **Repository:** [GitHub - dle666/GeoFocus](https://github.com/dle666/GeoFocus)
26
+
27
+ ## Training Data
28
+ The model utilizes the following datasets:
29
+ - [Global_Perceptor_Data](https://huggingface.co/datasets/dle666/Global_Perceptor)
30
+ - [Local_Perceptor_Data](https://huggingface.co/datasets/dle666/Local_Perceptor)
31
+ - [GeoFocus-test](https://huggingface.co/datasets/dle666/GeoFocus-test)
32
+
33
+ ## Acknowledgement
34
+ This work benefits from several open-source projects including [Qwen2.5](https://github.com/QwenLM/Qwen2.5), [EasyR1](https://github.com/hiyouga/EasyR1), [verl](https://github.com/volcengine/verl), [NoisyRollout](https://github.com/NUS-TRAIL/NoisyRollout), and [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
35
+
36
+ ## Citation
37
+ ```bibtex
38
+ @article{geofocus2026,
39
+ title={GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving},
40
+ author={Jiuhai Chen and Jianwei Yang and Haiping Wu and Dianqi Li and Jianfeng Gao and Tianyi Zhou and Bin Xiao},
41
+ journal={arXiv preprint arXiv:2602.08524},
42
+ year={2026}
43
+ }
44
+ ```