---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-text-to-text
tags:
- remote-sensing
- geospatial-reasoning
- qwen2.5-vl
- sft
- chain-of-thought
- ms-swift
---

# LandAI-Base: Activating Geospatial Chain-of-Thought Reasoning

<div align="center">

**[Paper (Under Review)]**

</div>

## Introduction

**LandAI-Base** is the foundational **supervised fine-tuned (SFT)** model of the LandAI family. Built on the **Qwen2.5-VL-7B-Instruct** architecture, it is designed to activate domain-specific logical reasoning for Earth Observation tasks.

Unlike general-purpose multimodal models, LandAI-Base is fine-tuned on a composite corpus of approximately **334,000 reasoning chains**, including the novel **Geo-Base-Thinking-14K** dataset. This process instills the reasoning habits (the "epistemological authority") of geography experts, enabling the model to decompose complex spatial problems before engaging in visual recognition.

**LandAI-Base serves two primary purposes:**

1. **A robust baseline** for geospatial reasoning tasks (Q&A, analysis).
2. **The "cold start" initialization** for the [LandAI-L1](https://huggingface.co/zhou777/LandAI-L1) model (trained via GRPO-L1).

## Key Features

* **Domain-Specific Cognitive Activation**: Fine-tuned to emulate the reasoning patterns of geography experts, moving from rote memorization to logical deduction.
* **High-Quality Training Data**: Trained on a curated mix of:
  * **Geo-Base-Thinking-14K**: ~14.7k reasoning chains distilled from geography entrance exams and textbooks.
  * **General Reasoning Corpus**: Subsets of OpenR1-Math, OpenThoughts, and Chinese-Data-R1 to strengthen mathematical and scientific logic.
* **Strong Zero-Shot Performance**: Significantly outperforms the vanilla Qwen2.5-VL-7B on geographic benchmark exams.
* **MS-Swift Compatibility**: Fully compatible with the [ms-swift](https://github.com/modelscope/swift) training framework.
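Because the model is ms-swift compatible, further fine-tuning can follow the framework's standard `swift sft` workflow. The command below is a hypothetical sketch, not an official recipe from this card: the dataset path, hyperparameters, and the repo id `zhou777/LandAI-Base` are placeholders, and flag values should be checked against the ms-swift documentation.

```shell
# Hypothetical ms-swift LoRA fine-tuning sketch.
# Dataset path, repo id, and hyperparameters are placeholders.
swift sft \
    --model zhou777/LandAI-Base \
    --train_type lora \
    --dataset /path/to/your_geospatial_cot.jsonl \
    --num_train_epochs 1 \
    --output_dir ./landai-sft-output
```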

## Performance Benchmarks

LandAI-Base demonstrates a substantial leap in reasoning capability over its backbone model. On the **GeoTest2025** benchmark (derived from restricted 2025 National Postgraduate Entrance Examination questions), it approaches the performance of commercial models.

| Model | GeoTest2025 (Geography) | AIME 2024 | HumanEval | MMMU-Pro |
| :--- | :---: | :---: | :---: | :---: |
| **LandAI-Base-7B (Ours)** | **93.3%** | **16.7%** | **66.4%** | **44.7%** |
| Qwen2.5-VL-7B (Baseline) | 46.7% | 3.3% | 67.3% | 41.2% |
| GPT-4o | 92.1% | 9.3% | 90.2% | 51.9% |
| Gemini 2.5 Pro | 98.3% | 92.0% | - | 71.2% |

## Dataset Composition

The explicit reasoning capability of LandAI-Base stems from its training-data distribution:

| Dataset Source | Samples | Purpose |
| :--- | :--- | :--- |
| **Geo-Base-Thinking-14K** | ~14.7k | Domain-specific geospatial logic & knowledge |
| **OpenR1-Math** | ~96k | Mathematical reasoning infrastructure |
| **OpenThoughts** | ~114k | General scientific literacy (physics/chemistry/biology) |
| **Chinese-Data-R1** | ~110k | Linguistic nuance and logic bridging |

## Quick Start

LandAI-Base follows the standard **Qwen2.5-VL** architecture, so any toolchain that supports Qwen2.5-VL can load it. You can use it for geospatial question answering or as a base for further RL training.
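Since the model follows the standard Qwen2.5-VL architecture, inference can use the usual Hugging Face `transformers` path for that family. The sketch below is an assumption, not code from this card: the repo id `zhou777/LandAI-Base`, the file names, and the `build_messages`/`answer` helpers are hypothetical, and it requires `transformers>=4.49` plus the `qwen_vl_utils` package.

```python
# Minimal inference sketch for LandAI-Base (standard Qwen2.5-VL usage).
# Repo id, file names, and helper names are assumptions, not from the card.

def build_messages(image_path: str, question: str) -> list:
    """Assemble a single-turn Qwen2.5-VL chat message: one image plus a question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

def answer(image_path: str, question: str) -> str:
    """Run one round of geospatial Q&A (downloads the 7B checkpoint on first call)."""
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model_id = "zhou777/LandAI-Base"  # assumed repo id; adjust to the actual Hub path
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_messages(image_path, question)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)

    # Generate and strip the prompt tokens from the output before decoding.
    output_ids = model.generate(**inputs, max_new_tokens=512)
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

For example, `answer("scene.png", "Which land-cover classes dominate this tile?")` would return the model's chain-of-thought answer as a string.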