InternVL3-8B-RoboVQA-Stage1

QLoRA adapter for InternVL3-8B fine-tuned on RoboVQA for robotics visual grounding.

Overview

Base Model OpenGVLab/InternVL3-8B
Training Data 722,979 single-QA samples from RoboVQA
Task Visual grounding (Stage 1 of 2-stage curriculum)
Eval Loss 0.125
Perplexity 1.133

Training

This is Stage 1 of a two-stage curriculum learning approach:

  • Stage 1 (this model): Single QA pairs → visual grounding
  • Stage 2: Multi-turn conversations → reasoning chains

Trained on DGX Spark (GB10 Blackwell, 128GB unified memory) for ~26 days.

Usage

from peft import PeftModel
from transformers import AutoModel

base_model = AutoModel.from_pretrained("OpenGVLab/InternVL3-8B", trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, "agiri123/internvl3-8b-robovqa-stage1")

Links

See the repository README for full training details, hyperparameters, and curriculum validation experiments.

Downloads last month
-
Video Preview
loading

Model tree for agiri123/internvl3-8b-robovqa-stage1