# LandAI-L1-Zero: Ungrounded Baseline for Geospatial Reasoning

[Experimental Baseline] | [Paper (Under Review)] | Main Model: LandAI-L1
## ⚠️ Warning: Experimental Research Artifact

This model (LandAI-L1-Zero) is the ungrounded baseline referenced in our NMI submission. It was trained with standard reinforcement learning (GRPO) without explicit geometric constraints.

Expect instability: this model is prone to multimodal hallucinations and may prioritize textual priors over visual evidence. For the stable, state-of-the-art version, please use LandAI-L1.
## 📖 Introduction
LandAI-L1-Zero serves as the scientific control group for the LandAI research project. It was developed to rigorously evaluate the impact of the "Geometric Grounding" mechanism introduced in our main model.
Unlike the final LandAI-L1 model, this version:
- Lacks Geometric Constraints: It attempts to reason about land use without first identifying/bounding visual features.
- Uses Standard GRPO: It is trained with the standard Group Relative Policy Optimization algorithm, optimizing primarily for textual fluency and semantic correctness.
Research Purpose: This model is released to ensure reproducibility and to allow the community to analyze the "what-if" scenario: What happens when we train a geospatial reasoning model without forcing it to "look" before it "thinks"?
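The group-relative reward normalization at the core of standard GRPO can be sketched as follows. This is a minimal illustration of the advantage computation only; the reward values, group size, and function name are illustrative and not taken from the paper.

```python
# Minimal sketch of the group-relative advantage step in standard GRPO:
# each sampled response is scored, then normalized against its own group.
import numpy as np

def grpo_advantages(rewards):
    """A_i = (r_i - mean(r)) / std(r) over one group of sampled responses."""
    r = np.asarray(rewards, dtype=np.float64)
    std = r.std()
    if std == 0:
        # All responses scored identically: no learning signal for this group.
        return np.zeros_like(r)
    return (r - r.mean()) / std

# Example: four sampled answers to one prompt, scored (here, hypothetically)
# for fluency/correctness only -- no geometric grounding term.
advs = grpo_advantages([0.9, 0.4, 0.4, 0.1])
```

Because the reward here contains no grounding term, a fluent but visually wrong answer can still receive a positive advantage, which is the failure mode this baseline is designed to expose.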
## 📊 Performance Comparison
As detailed in our paper, the lack of geometric grounding leads to a significant performance gap compared to the main model.
| Model | Variant | Training Strategy | Accuracy (Test Set) | Hallucination Risk |
|---|---|---|---|---|
| LandAI-L1 | Main Model | GRPO-L1 (Grounded) | 86.41% | Low |
| LandAI-L1-Zero | Baseline (This Model) | Standard GRPO (Ungrounded) | 72.21% | High |
Note: The performance gap (approx. 14 percentage points) highlights the critical role of the "Visual Indexing -> Geometric Localization -> Language Reasoning" cognitive path.
## 🔧 Known Limitations & Instability
Users should be aware of the following behaviors, which are expected in this experimental version:
- Multimodal Hallucination: The model may generate plausible-sounding reasoning that contradicts the actual satellite imagery (e.g., describing a "Commercial Zone" based on text prompts even when the image clearly shows "Agriculture").
- Unstable Reasoning Chains: The Chain-of-Thought (CoT) generation may become verbose, circular, or drift into irrelevant topics, known as the "unstable plateau" phenomenon described in our paper.
- Textual Bias: It exhibits a high susceptibility to misleading textual priors (e.g., incorrect POI data).
## 🛠️ Usage (Research Only)
This model shares the same Qwen2.5-VL-7B-Instruct architecture as the main model, ensuring compatibility with standard inference pipelines.
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch

model_path = "Your-HF-Username/LandAI-L1-Zero"

# Load the model in bfloat16 and shard it across available devices
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_path)

# Note: Outputs from this model should be treated as experimental baselines.
```
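Since the model follows the Qwen2.5-VL chat interface, a single-image query is expressed as a standard multimodal message list. The image path and question below are placeholders for illustration:

```python
# Hypothetical single-image prompt in the Qwen2.5-VL chat message format.
# The image path and question text are placeholders, not from the paper.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/satellite_tile.png"},
            {"type": "text", "text": "What is the dominant land-use class in this tile?"},
        ],
    }
]

# These messages would then be rendered with processor.apply_chat_template(...)
# and passed to model.generate(...). Given this baseline's textual bias,
# answers should always be cross-checked against the imagery itself.
```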