Add model card for GeoFocus-3B

#1
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +54 -0
README.md ADDED
@@ -0,0 +1,54 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ pipeline_tag: image-text-to-text
3
+ library_name: transformers
4
+ tags:
5
+ - geometry
6
+ - math
7
+ - vision-language
8
+ - reasoning
9
+ ---
10
+
11
+ # GeoFocus-3B
12
+
13
+ GeoFocus is a framework designed to enhance multimodal geometry reasoning in Large Multimodal Models (LMMs). It addresses the challenges of geometry problem-solving by focusing on both global shape recognition and intricate local relationships through two core modules:
14
+
15
+ 1. **Critical Local Perceptor**: Automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) using theory-based perception templates.
16
+ 2. **VertexLang**: A compact topology formal language that encodes global figures through vertex coordinates and connectivity relations, improving efficiency and accuracy compared to traditional code-based encodings.
17
+
18
+ ## Model Details
19
+ - **Architecture:** Based on Qwen2.5-VL
20
+ - **Paper:** [GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving](https://huggingface.co/papers/2602.08524)
21
+ - **Repository:** [GitHub - dle666/GeoFocus](https://github.com/dle666/GeoFocus)
22
+
23
+ ## Evaluation Results
24
+
25
+ GeoFocus demonstrates significant improvements over specialized models across major geometry benchmarks:
26
+
27
+ | Model Name | Geo3K | GeoQA | Formalgeo7k |
28
+ | :---: | :---: | :---: | :---: |
29
+ | **GeoFocus-3B** | 50.4 | 64.3 | 55.4 |
30
+ | **GeoFocus-7B** | 55.3 | 71.9 | 63.5 |
31
+
32
+ ## Environment and Installation
33
+
34
+ To use this model, ensure you have the following requirements installed:
35
+
36
+ - Python 3.9+
37
+ - transformers>=4.51.0
38
+ - flash-attn>=2.4.3
39
+ - vllm>=0.8.3
40
+
41
+ ```bash
42
+ pip install transformers>=4.51.0 flash-attn>=2.4.3 vllm>=0.8.3
43
+ ```
44
+
45
+ ## Citation
46
+
47
+ ```bibtex
48
+ @article{chen2026geofocus,
49
+ title={GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving},
50
+ author={Jiuhai Chen and Jianwei Yang and Haiping Wu and Dianqi Li and Jianfeng Gao and Tianyi Zhou and Bin Xiao},
51
+ journal={arXiv preprint arXiv:2602.08524},
52
+ year={2026}
53
+ }
54
+ ```