---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-text-to-text
tags:
- remote-sensing
- land-use
- qwen2.5-vl
- multimodal
- ms-swift
---

# LandAI-L1: Explicit geometric grounding enables data-efficient and interpretable geospatial intelligence

<div align="center">

**[NMI Submission]** &nbsp; | &nbsp; **[Paper (Under Review)]** &nbsp; | &nbsp; **[Dataset]**

</div>

## 📖 Introduction

**LandAI-L1** is a multimodal large language model designed for **verifiable land-use reasoning**. Unlike traditional black-box classification models, LandAI-L1 enforces a strict cognitive path: **Visual Indexing → Geometric Localization → Language Reasoning**.

By compelling the model to explicitly localize visual evidence (bounding boxes) before drawing semantic conclusions, we achieve state-of-the-art accuracy in land-use classification while significantly mitigating multimodal hallucinations.

This model is built upon the **Qwen2.5-VL-7B-Instruct** architecture and trained with the **GRPO-L1** algorithm.

## 🚀 Key Features

- **Explicit Geometric Grounding**: Mitigates "disembodied explanations" by anchoring reasoning steps in verifiable pixel coordinates.
- **Data Efficiency**: Achieves SOTA performance (86.41% accuracy) using only **25%** of the training data required by comparable models (e.g., LandGPT).
- **Hallucination Resistance**: Demonstrates superior resistance to text-based misinformation in visual-linguistic conflict scenarios (37.0% vision adherence vs. 7.3% for the baseline).
- **Standardized Architecture**: Fully follows the **Qwen2.5-VL** inference architecture to minimize version conflicts and maximize ecosystem compatibility.
- **Reproducible Training**: The training phase uses the **[ms-swift](https://github.com/modelscope/swift)** framework, facilitating easy fine-tuning and further research.

## 📊 Performance

LandAI-L1 establishes a new benchmark on the independent CN-MSLU test set, outperforming both open-source baselines and commercial models.

| Model | Architecture | Training Samples | Accuracy (%) | Hallucination Resistance |
| :--- | :--- | :--- | :--- | :--- |
| **LandAI-L1 (Ours)** | **Qwen2.5-VL-7B** | **~20k** | **86.41** | **High** |
| LandAI-L1-Zero (Baseline) | Qwen2.5-VL-7B | ~20k | 72.21 | Low |
| LandGPT | InternVL2 | ~80k | 82.5 (approx.) | Low |
| Gemini 2.5 Pro | Closed | N/A | 52.21 | Medium |

> **Note**: Hallucination resistance refers to the model's ability to reject misleading textual priors in favor of visual evidence (visual-linguistic conflict experiment).

## 🛠️ Quick Start

Since LandAI-L1 strictly follows the **Qwen2.5-VL** architecture, you can load it directly with `transformers` without custom modeling code.

### Installation

```bash
pip install git+https://github.com/huggingface/transformers
pip install qwen-vl-utils
```

## ⚙️ Training & Fine-tuning

The model was trained with **[ms-swift](https://github.com/modelscope/swift)**, a lightweight and extensible framework for LLM/MLLM fine-tuning.

To reproduce the training or fine-tune on your own geospatial data:

1. Clone ms-swift: `git clone https://github.com/modelscope/swift.git`
2. Prepare your dataset in the standard format.
3. Run the ms-swift training script.
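
As an illustrative starting point only (the dataset path and hyperparameters below are placeholders, not the exact GRPO-L1 recipe), an ms-swift supervised fine-tuning launch on the base architecture looks like:

```shell
# Illustrative ms-swift launch; --dataset and --output_dir values are placeholders.
swift sft \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --dataset /path/to/your_geospatial_dataset.jsonl \
    --train_type lora \
    --num_train_epochs 1 \
    --output_dir output/landai-l1
```

See the ms-swift documentation for the full set of supported arguments and for RLHF-style training entry points.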