---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-text-to-text
tags:
- remote-sensing
- geospatial-reasoning
- qwen2.5-vl
- sft
- chain-of-thought
- ms-swift
---
# LandAI-Base: Activating Geospatial Chain-of-Thought Reasoning
[Paper (Under Review)]
## Introduction
LandAI-Base is the foundational Supervised Fine-Tuned (SFT) model of the LandAI family. Built upon the Qwen2.5-VL-7B-Instruct architecture, it is specifically designed to activate domain-specific logical reasoning in Earth Observation tasks.
Unlike general-purpose multimodal models, LandAI-Base has been fine-tuned on a composite corpus of approximately 334,000 reasoning chains, including the novel Geo-Base-Thinking-14K dataset. This process instills the reasoning patterns of geography experts, enabling the model to decompose complex spatial problems before engaging in visual recognition.
LandAI-Base serves two primary purposes:
- A robust baseline for geospatial reasoning tasks (Q&A, analysis).
- The "Cold Start" initialization for the LandAI-L1 model (trained via GRPO-L1).
## Key Features
- Domain-Specific Cognitive Activation: Fine-tuned to simulate the reasoning patterns of geography experts, moving from rote memorization to logical deduction.
- High-Quality Training Data: Trained on a curated mix of:
- Geo-Base-Thinking-14K: ~14.7k reasoning chains distilled from geography entrance exams and textbooks.
- General Reasoning Corpus: Subsets from OpenR1-Math, OpenThoughts, and Chinese-Data-R1 to enhance mathematical and scientific logic.
- Strong Zero-Shot Performance: Significantly outperforms the vanilla Qwen2.5-VL-7B on geographic benchmark exams.
- MS-Swift Compatibility: Fully compatible with the ms-swift training framework.
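Because the model is fully compatible with ms-swift, continued fine-tuning follows the standard `swift sft` workflow. The command below is an illustrative sketch (ms-swift 3.x CLI assumed); the dataset path and hyperparameters are placeholders to adapt to your own reasoning-chain corpus.

```shell
# Illustrative LoRA fine-tuning run with ms-swift (3.x CLI assumed).
# --dataset points to a placeholder; substitute your own corpus.
swift sft \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --train_type lora \
    --dataset path/to/geo_thinking.jsonl \
    --num_train_epochs 1 \
    --learning_rate 1e-4
```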
## Performance Benchmarks
LandAI-Base demonstrates a substantial leap in reasoning capability over its backbone model. On the GeoTest2025 benchmark (derived from restricted 2025 National Postgraduate Entrance Examination questions), it achieves accuracy comparable to commercial models.
| Model | GeoTest2025 (Geography) | AIME 2024 | HumanEval | MMMU-Pro |
|---|---|---|---|---|
| LandAI-Base-7B (Ours) | 93.3% | 16.7% | 66.4% | 44.7% |
| Qwen2.5-VL-7B (Baseline) | 46.7% | 3.3% | 67.3% | 41.2% |
| GPT-4o | 92.1% | 9.3% | 90.2% | 51.9% |
| Gemini 2.5 Pro | 98.3% | 92.0% | - | 71.2% |
## Dataset Composition
The explicit reasoning capability of LandAI-Base stems from its training data distribution:
| Dataset Source | Samples | Purpose |
|---|---|---|
| Geo-Base-Thinking-14K | ~14.7k | Domain-specific geospatial logic & knowledge |
| OpenR1-Math | ~96k | Mathematical reasoning infrastructure |
| OpenThoughts | ~114k | General scientific literacy (Physics/Chem/Bio) |
| Chinese-Data-R1 | ~110k | Linguistic nuance and logic bridging |
## Quick Start
LandAI-Base follows the standard Qwen2.5-VL architecture. You can use it for geospatial Question Answering or as a base for further RL training.
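Since the model follows the Qwen2.5-VL architecture, inference can be sketched with the standard Qwen2.5-VL recipe in `transformers` (together with the `qwen_vl_utils` helper package). The Hub repo id `LandAI/LandAI-Base-7B` and the example image path below are placeholders, not confirmed by this card.

```python
# Minimal inference sketch for LandAI-Base, following the standard
# Qwen2.5-VL recipe in `transformers` (requires `qwen_vl_utils`).
# NOTE: the repo id and image path are placeholder assumptions.
MODEL_ID = "LandAI/LandAI-Base-7B"  # hypothetical Hub path

def build_messages(image_path: str, question: str) -> list:
    """Format a single-image geospatial question in Qwen2.5-VL chat format."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": question},
        ],
    }]

def run_inference(image_path: str, question: str) -> str:
    """Load the model and answer one image-grounded question."""
    import torch
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    messages = build_messages(image_path, question)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text], images=image_inputs, videos=video_inputs,
        padding=True, return_tensors="pt",
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=1024)
    # Strip the prompt tokens so only the generated answer remains.
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]

if __name__ == "__main__":
    # Uncomment once the weights are available locally or on the Hub:
    # print(run_inference("scene.png", "What landform dominates this image?"))
    pass
```

The same checkpoint can serve as the starting policy for further RL training (e.g. the GRPO-L1 stage mentioned above) instead of being used for inference directly.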