EO-Gym-4B

EO-Gym-4B is a LoRA adapter for Qwen/Qwen3-VL-4B-Instruct, fine-tuned on the EO-Gym dataset for Earth-observation visual question answering and tool-use-style reasoning.

This repository contains adapter weights only. Load it with the base model Qwen/Qwen3-VL-4B-Instruct; it is not a standalone full checkpoint.

Model Details

  • Model type: PEFT LoRA adapter for a multimodal causal language model.
  • Base model: Qwen/Qwen3-VL-4B-Instruct.
  • Adapter method: LoRA.
  • LoRA rank: 16.
  • LoRA alpha: 32.
  • LoRA dropout: 0.05.
  • Target modules: Qwen3-VL language-model projection layers matching q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj.
  • Training dtype: bfloat16.
  • PEFT version: 0.18.0.
  • Primary dataset: paperuploadacount/EO-Gym.
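
The hyperparameters above correspond to a PEFT adapter configuration along these lines. This is an illustrative sketch only; the authoritative values live in the repository's adapter_config.json:

```json
{
  "peft_type": "LORA",
  "base_model_name_or_path": "Qwen/Qwen3-VL-4B-Instruct",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
  "bias": "none",
  "task_type": "CAUSAL_LM"
}
```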

Intended Use

The adapter is intended for research on Earth-observation multimodal question answering, remote-sensing image interpretation, and EO-Gym tool-augmented reasoning workflows.

It is not intended for safety-critical geospatial decisions, emergency response, legal determinations, or fully automated operational monitoring without human review and task-specific validation.

Quick Start

Install current Hugging Face and PEFT packages compatible with Qwen3-VL:

pip install "transformers>=4.57" "peft>=0.18" qwen-vl-utils decord accelerate

Load the adapter with the base model:

import torch
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

base_model_id = "Qwen/Qwen3-VL-4B-Instruct"
adapter_id = "paperuploadacount/EO-Gym-4B"

processor = AutoProcessor.from_pretrained(base_model_id)
model = AutoModelForImageTextToText.from_pretrained(
    base_model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

For EO-Gym evaluation, use this adapter with the EO-Gym tool server and the dataset examples in paperuploadacount/EO-Gym.
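
As a minimal inference sketch: Qwen3-VL expects chat messages with interleaved image and text content parts. The image path and question below are placeholders, and the commented generation call assumes the model and processor loaded as above:

```python
def build_messages(image_path: str, question: str) -> list:
    """Build a Qwen3-VL-style chat message list for one image and one question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

# Placeholder inputs for illustration:
messages = build_messages("scene.png", "How many ships are visible in this scene?")

# With model and processor loaded as above, generation looks like:
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt",
# ).to(model.device)
# output_ids = model.generate(**inputs, max_new_tokens=256)
# print(processor.batch_decode(
#     output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
# )[0])
```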

Training Data

The adapter was fine-tuned on EO-Gym, a dataset of Earth-observation visual question-answering and tool-use interactions, hosted at paperuploadacount/EO-Gym.

The dataset card defines two primary JSONL splits:

  • datasets/eo_gym_train_set.jsonl: training split.
  • datasets/eo_gym_test_set.jsonl: test split.
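
Each line of these JSONL files is a standalone JSON record, so they can be inspected without any dataset library. A minimal reader, demonstrated on an in-memory stand-in (the record schema shown here is illustrative, not the actual EO-Gym schema):

```python
import io
import json

def read_jsonl(fp) -> list:
    """Parse a JSON-Lines stream into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in fp if line.strip()]

# In-memory stand-in for datasets/eo_gym_train_set.jsonl (illustrative schema):
sample = io.StringIO(
    '{"id": "ex-0", "question": "What land-cover class dominates?", "answer": "forest"}\n'
    '{"id": "ex-1", "question": "Is a runway visible?", "answer": "yes"}\n'
)
records = read_jsonl(sample)
print(len(records))  # 2
```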

Training Procedure

Key training settings recorded from the local training run:

  • Training type: LoRA supervised fine-tuning.
  • Epochs: 3.
  • Per-device train batch size: 4.
  • Gradient accumulation steps: 16.
  • Learning rate: 1e-4.
  • LR scheduler: cosine.
  • Weight decay: 0.1.
  • Max sequence length: 8192.
  • Max image pixels: 1,003,520.
  • Optimizer: adamw_torch_fused.
  • Vision tower: frozen.
  • Aligner: frozen.
  • Seed: 42.
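
Two derived quantities follow from these settings: the effective batch size is the per-device batch times the gradient-accumulation steps, and a LoRA update on a d_in × d_out linear layer adds r × (d_in + d_out) trainable parameters (an r × d_in matrix A plus a d_out × r matrix B). A back-of-the-envelope sketch; the layer dimension below is a placeholder, not the actual Qwen3-VL-4B hidden size:

```python
# Effective batch size per GPU from the recorded settings.
per_device_batch = 4
grad_accum_steps = 16
effective_batch = per_device_batch * grad_accum_steps
print(effective_batch)  # 64

def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable parameters LoRA adds to one linear layer: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)

# Placeholder dimension for illustration only:
print(lora_params(d_in=2560, d_out=2560))  # 81920
```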

The model card intentionally omits local filesystem paths and private training checkpoint locations.

Evaluation

The final validation record from the training run reports:

  • Step: 273.
  • Epoch: 3.0.
  • Validation loss: 0.16282493.
  • Validation token accuracy: 0.96108994.

These numbers are training-run validation metrics, not a full external benchmark. For reproducible downstream reporting, evaluate against the committed EO-Gym test split with the EO-Gym tool server and your selected inference backend.
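
For rough intuition, the reported validation loss implies a token-level perplexity of exp(loss) ≈ 1.18. This is a derived convenience figure, not a metric reported by the training run:

```python
import math

# Validation loss reported above; perplexity = exp(mean cross-entropy loss).
val_loss = 0.16282493
perplexity = math.exp(val_loss)
print(round(perplexity, 4))  # 1.1768
```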

Limitations

  • The adapter inherits the capabilities and limitations of Qwen/Qwen3-VL-4B-Instruct.
  • Performance depends on prompt format, image preprocessing, and availability of the EO-Gym tool server for tool-augmented workflows.
  • Remote-sensing tasks can be sensitive to resolution, projection, bands, cloud cover, and temporal coverage; validate outputs against authoritative sources.
  • The adapter should be treated as a research artifact until independently evaluated for a specific deployment setting.

Files

  • adapter_model.safetensors: LoRA adapter weights.
  • adapter_config.json: PEFT adapter configuration.
  • additional_config.json: training framework adapter metadata.
  • .gitattributes: LFS hints for binary model artifacts.
  • .hfignore: excludes optimizer and trainer state from Hub uploads.

Citation

No paper citation is provided for this checkpoint. If you use the adapter, please cite the EO-Gym dataset and the Qwen3-VL base model according to their respective citation guidance.
