# Earthmind-R1

EarthMind-4B fine-tuned with GRPO (Group Relative Policy Optimization) for geospatial visual question answering.

## Model Details

- **Base Model**: EarthMind-4B (InternVL-based architecture)
- **Training Method**: GRPO with LoRA adapters
- **Training Data**: Geospatial instruction dataset
- **Output Format**: Chain-of-thought reasoning with `<think>` and `<answer>` tags
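
Since responses wrap the reasoning in `<think>` tags and the final answer in `<answer>` tags, downstream code will usually want to separate the two. The following is a minimal sketch, not part of the model's own code: the `parse_response` helper and the `ParsedResponse` type are hypothetical names, and the example string is made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    """Hypothetical container for the two parts of a model response."""
    thinking: Optional[str]
    answer: Optional[str]


def parse_response(response: str) -> ParsedResponse:
    """Split a response into its <think> and <answer> parts.

    A part is None when its tag pair is missing from the response.
    """
    think = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return ParsedResponse(
        thinking=think.group(1).strip() if think else None,
        answer=answer.group(1).strip() if answer else None,
    )


# Made-up response string, used only to show the expected tag layout.
example = "<think>Runways and terminals are visible.</think> <answer>An airport.</answer>"
parsed = parse_response(example)
print(parsed.answer)
```

Returning `None` for a missing tag pair lets callers detect malformed outputs instead of silently treating the whole response as the answer.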
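
## Usage

```python
import torch
import torchvision.transforms as T
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "aadex/Earthmind-R1",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("aadex/Earthmind-R1", trust_remote_code=True)

# Prepare for generation
model.preparing_for_generation(tokenizer=tokenizer, max_new_tokens=512, torch_dtype=torch.bfloat16)

# Load your image
image = Image.open("your_image.jpg").convert("RGB")

# Assumption: InternVL-style preprocessing (one 448x448 tile, ImageNet normalization).
# Adjust this if the model's remote code expects different inputs.
transform = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
pixel_values = transform(image).unsqueeze(0).to(device=model.device, dtype=torch.bfloat16)
generation_config = dict(max_new_tokens=512, do_sample=False)  # assumed settings

# Create prompt
question = "Describe what you see in this satellite image."
format_hint = (
    "First output the thinking process in <think> </think> tags "
    "and then output the final answer in <answer> </answer> tags."
)
# Full prompt, for manual generation without the chat method
prompt = f"""User: <image>
{question} {format_hint}
Assistant:"""

# Generate with the model's chat method, which is assumed to apply its own
# chat template, so it receives the question and format hint rather than `prompt`
response = model.chat(tokenizer, pixel_values, f"{question} {format_hint}", generation_config)
print(response)
```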

## Training

Trained using GRPO with:

- LoRA rank: 16
- LoRA alpha: 32
- Learning rate: 5e-6
- Epochs: 3
- Reward functions: accuracy, format
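
To make the two reward signals and the "group relative" part of GRPO more concrete, here is a rough sketch in plain Python. It is not the training code used for this model: the function names, the exact-match accuracy rule, the regex patterns, and the sample completions are all assumptions chosen for illustration. The advantage step follows the usual GRPO idea of normalizing each completion's reward by the mean and standard deviation of its sampling group.

```python
import re
import statistics
from typing import List

# Assumed format rule: the whole completion is <think>...</think> then <answer>...</answer>.
FORMAT_RE = re.compile(r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the think/answer layout, else 0.0."""
    return 1.0 if FORMAT_RE.fullmatch(completion) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the text inside <answer> matches the reference (case-insensitive), else 0.0."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == reference.strip().lower() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up group of three completions for a single prompt.
completions = [
    "<think>Many parallel runways.</think><answer>airport</answer>",
    "<think>Looks like water.</think><answer>harbor</answer>",
    "airport",  # no tags, so it earns neither reward
]
rewards = [format_reward(c) + accuracy_reward(c, "airport") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.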

## License

Please refer to the base EarthMind-4B model license.