InternVL3-8B-RoboVQA-Stage1
QLoRA adapter for InternVL3-8B fine-tuned on RoboVQA for robotics visual grounding.
Overview
| Field | Value |
|---|---|
| Base Model | OpenGVLab/InternVL3-8B |
| Training Data | 722,979 single-QA samples from RoboVQA |
| Task | Visual grounding (Stage 1 of 2-stage curriculum) |
| Eval Loss | 0.125 |
| Perplexity | 1.133 |
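
The eval loss and perplexity reported above appear consistent with the usual definition of perplexity as the exponential of the mean cross-entropy loss (in nats). A quick check, assuming that definition is the one used for the reported numbers:

```python
import math

eval_loss = 0.125  # eval loss reported in the Overview table

# Standard definition: perplexity = exp(mean cross-entropy loss in nats).
perplexity = math.exp(eval_loss)

print(f"{perplexity:.3f}")  # 1.133
```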
Training
This is Stage 1 of a two-stage curriculum learning approach:
- Stage 1 (this model): Single QA pairs → visual grounding
- Stage 2: Multi-turn conversations → reasoning chains
Trained on an NVIDIA DGX Spark (GB10 Blackwell, 128 GB unified memory) for ~26 days.
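
The difference between the two curriculum stages is mainly one of sample shape: Stage 1 sees independent question-answer pairs, while Stage 2 sees whole conversations. The sketch below illustrates that distinction only. The record layout, the field names (`video`, `turns`, `question`, `answer`), the example text, and the helper `to_single_qa` are all hypothetical and are not taken from RoboVQA or from the training repository, whose actual preprocessing may differ.

```python
from typing import Dict, List

# Hypothetical multi-turn sample; field names and text are illustrative only.
conversation = {
    "video": "episode_0001.mp4",
    "turns": [
        {"question": "What is the robot holding?", "answer": "A cup."},
        {"question": "What should it do next?", "answer": "Place the cup on the table."},
    ],
}


def to_single_qa(sample: Dict) -> List[Dict]:
    """Split one multi-turn sample into independent single-QA samples."""
    return [
        {"video": sample["video"], "question": t["question"], "answer": t["answer"]}
        for t in sample["turns"]
    ]


stage1_samples = to_single_qa(conversation)  # Stage 1 shape: one record per QA pair
stage2_sample = conversation                 # Stage 2 shape: turns kept together, in order

print(len(stage1_samples))
```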
Usage
from peft import PeftModel
from transformers import AutoModel
base_model = AutoModel.from_pretrained("OpenGVLab/InternVL3-8B", trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, "agiri123/internvl3-8b-robovqa-stage1")
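
For inference it can be convenient to load the base model in reduced precision and fold the adapter weights into it. The following is a minimal sketch rather than the author's documented procedure. It assumes that bfloat16 is supported on the target device and that the adapter is a standard LoRA adapter that PEFT can merge. The helper name `load_merged_model` is introduced here for illustration.

```python
BASE_ID = "OpenGVLab/InternVL3-8B"
ADAPTER_ID = "agiri123/internvl3-8b-robovqa-stage1"


def load_merged_model(base_id: str = BASE_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model in bfloat16, attach the adapter, and merge it."""
    # Imports live inside the function so the heavy dependencies are only
    # required when a model is actually loaded.
    import torch
    from peft import PeftModel
    from transformers import AutoModel

    base_model = AutoModel.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 works on the target device
        trust_remote_code=True,
    )
    model = PeftModel.from_pretrained(base_model, adapter_id)
    # Assumption: a standard LoRA adapter, so its weights can be folded into
    # the base model for adapter-free inference.
    return model.merge_and_unload().eval()
```

Calling `load_merged_model()` downloads both repositories on first use, so it needs network access and enough memory for the 8B base model.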
Links
- Repository: github.com/giricme/vlm-ft
- Stage 2 Model: agiri123/internvl3-8b-robovqa-stage2
See the repository README for full training details, hyperparameters, and curriculum validation experiments.