Improve model card with metadata and links

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +32 -3
README.md CHANGED
@@ -1,3 +1,32 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ ---
+
+ # EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models
+
+ This repository contains the EarthMind-4B model, built on EarthMind, a vision-language framework for multi-granular and multi-sensor Earth Observation (EO) data understanding, presented in the paper [EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models](https://huggingface.co/papers/2506.01667). EarthMind features Spatial Attention Prompting (SAP) and cross-modal fusion.
+
+ **Code:** [https://github.com/sy1998/EarthMind](https://github.com/sy1998/EarthMind)
+
+ **Sample Usage:** (see the GitHub README for detailed instructions)
+
+ ```python
+ import argparse
+ import os
+
+ from PIL import Image
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ import cv2
+
+ try:
+     from mmengine.visualization import Visualizer
+ except ImportError:
+     Visualizer = None
+     print("Warning: mmengine is not installed, visualization is disabled.")
+
+ # ... (rest of the sample code from GitHub README) ...
+ ```
+
+ **Tags:** image-text-to-text, earth-observation, multi-modal, vision-language