LandAI-L1: Explicit geometric grounding enables data-efficient and interpretable geospatial reasoning

[Paper (Under Review)] | [Dataset]

📖 Introduction

LandAI-L1 is a multimodal large language model designed for verifiable land-use reasoning. Unlike traditional black-box classification models, LandAI-L1 enforces a strict cognitive path: "Visual Indexing、Geometric Localization and Language Reasoning".

By compelling the model to explicitly localize visual evidence (bounding boxes) before drawing semantic conclusions, we achieve state-of-the-art accuracy in land-use classification while significantly mitigating multimodal hallucinations.

This model is built upon the Qwen2.5-VL-7B-Instruct architecture and trained using the GRPO-L1 algorithm.

🚀 Key Features

Explicit Geometric Grounding: Mitigates "disembodied explanations" by anchoring reasoning steps in verifiable pixel coordinates.
Data Efficiency: Achieves SOTA performance (86.41% accuracy) using only 25% of the training data required by comparable models (e.g., LandGPT).
Hallucination Resistance: Demonstrates superior resistance to text-based misinformation in visual-linguistic conflict scenarios (37.0% vision-adherence vs. 7.3% baseline).
Standardized Architecture: Fully follows the Qwen2.5-VL inference architecture to minimize version conflicts and maximize ecosystem compatibility.
Reproducible Training: The training phase utilizes the ms-swift framework, facilitating easy fine-tuning and further research.

📊 Performance

LandAI-L1 establishes a new benchmark on the independent CN-MSLU test set, outperforming both open-source baselines and commercial models.

Model	Architecture	Training Samples	Accuracy (%)	Hallucination Resistance
LandAI-L1 (Ours)	Qwen2.5-VL-7B	~20k	86.41	High
LandAI-L1-Zero (Baseline)	Qwen2.5-VL-7B	~20k	72.21	Low
LandGPT	InternVL2	~80k	82.5 (approx)	Low
Gemini 2.5 Pro	Closed	N/A	52.21	Medium

Note: Hallucination resistance refers to the model's ability to reject misleading textual priors in favor of visual evidence (Visual-Linguistic Conflict Experiment).

🛠️ Quick Start

Since LandAI-L1 strictly follows the Qwen2.5-VL architecture, you can load it directly using transformers without custom modeling code.

Installation

pip install git+https://github.com/huggingface/transformers
pip install qwen-vl-utils

⚙️ Training & Fine-tuning

The model was trained using ms-swift, a lightweight and extensible framework for LLM/MLLM fine-tuning.

To reproduce the training or fine-tune on your own geospatial data:

Clone ms-swift: git clone https://github.com/modelscope/swift.git

Prepare your dataset in the standard format.

Run the training ms-swift script.

Downloads last month: 2

Safetensors

Model size

8B params

Tensor type

BF16

Model tree for zhou777/LandAI-L1

Quantizations

2 models