InternVL 3.5 v4 Fine-tuned (internvl_35_v4_20260303)

A full-weight model produced by LoRA fine-tuning on top of InternVL3-8B and then merging the adapter into the base weights.

Model Info

Item	Value
Architecture	InternVLChatModel (InternViT-300M + Qwen3-8B)
Base Model	OpenGVLab/InternVL3-8B
Fine-tuning	LoRA merged (full weight)
Precision	bfloat16
Template	internvl2_5

Benchmark Results

Benchmark	v4 Fine-tuned	InternVL3-8B (Baseline)	Diff
DocVQA_VAL	91.49	91.6	-0.11
ScienceQA_TEST	94.70	93.8	+0.90
ChartQA_TEST	83.48	84.9	-1.42
OCRBench	781	794	-13
MMBench_DEV_EN	81.01	81.8	-0.79
SEEDBench_IMG	75.48	76.1	-0.62
POPE	86.03	88.3	-2.27
TextVQA_VAL	74.51	79.1	-4.59
GQA	58.51	62.2	-3.69
MME (Total)	2026.7	2229.0	-202.3

Baseline numbers are taken from the official OpenGVLab report for InternVL3-8B. Diff is the v4 fine-tuned score minus the baseline.
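The Diff column can be recomputed directly from the two score columns. The following is a minimal sketch using the values from the table above; the names `results`, `diff`, and `diffs` are illustrative and not part of any released tooling.

```python
# Scores copied from the benchmark table: (v4 fine-tuned, InternVL3-8B baseline).
results = {
    "DocVQA_VAL": (91.49, 91.6),
    "ScienceQA_TEST": (94.70, 93.8),
    "ChartQA_TEST": (83.48, 84.9),
    "OCRBench": (781.0, 794.0),
    "MMBench_DEV_EN": (81.01, 81.8),
    "SEEDBench_IMG": (75.48, 76.1),
    "POPE": (86.03, 88.3),
    "TextVQA_VAL": (74.51, 79.1),
    "GQA": (58.51, 62.2),
    "MME (Total)": (2026.7, 2229.0),
}


def diff(finetuned: float, baseline: float) -> float:
    """Return fine-tuned minus baseline, rounded to two decimals."""
    return round(finetuned - baseline, 2)


diffs = {name: diff(ft, base) for name, (ft, base) in results.items()}

# Print benchmarks from largest regression to largest gain.
for name, d in sorted(diffs.items(), key=lambda kv: kv[1]):
    print(f"{name:<16} {d:+.2f}")
```

Note that the benchmarks use different scales (percent accuracy, OCRBench points out of 1000, MME points), so the raw differences are not comparable across rows.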

OCRBench Detail

Category	Score
Text Recognition	226
Scene Text-centric VQA	173
Doc-oriented VQA	164
Key Information Extraction	181
Handwritten Math Expression Recognition	37
Final Score	781

ChartQA Detail

Split	Accuracy
test_human	75.28
test_augmented	91.68
Overall	83.48

MME Detail

Split	Score
Perception	1376.7
Cognition	650.0
Total	2026.7
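The three detail tables above are consistent with the headline numbers. The sketch below checks this, assuming that the OCRBench final score is the sum of its categories, that the MME total is Perception plus Cognition, and that the ChartQA overall accuracy is the unweighted mean of the two splits (which would hold if both splits contain the same number of questions; that is an assumption here, not something stated in this card). All variable names are illustrative.

```python
import math

# OCRBench: per-category scores from the detail table.
ocrbench = {
    "Text Recognition": 226,
    "Scene Text-centric VQA": 173,
    "Doc-oriented VQA": 164,
    "Key Information Extraction": 181,
    "Handwritten Math Expression Recognition": 37,
}
ocrbench_total = sum(ocrbench.values())

# ChartQA: unweighted mean of the two test splits (assumes equal split sizes).
chartqa = {"test_human": 75.28, "test_augmented": 91.68}
chartqa_overall = round(sum(chartqa.values()) / len(chartqa), 2)

# MME: total is the sum of the perception and cognition scores.
mme = {"Perception": 1376.7, "Cognition": 650.0}
mme_total = round(sum(mme.values()), 1)

print("OCRBench:", ocrbench_total)
print("ChartQA overall:", chartqa_overall)
print("MME total:", mme_total)
```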

Quick Start

from transformers import AutoModel, AutoTokenizer
import torch

model_name = "yujuyeon/internvl_35_v4_20260303"
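# trust_remote_code=True is needed because the InternVL model class is loaded from the checkpoint repository.
model = AutoModel.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # matches the precision listed under Model Info
    trust_remote_code=True
).cuda().eval()  # move to the GPU and switch to inference mode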
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
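Once the model and tokenizer are loaded, a text-only query might look like the sketch below. It assumes this checkpoint keeps the `chat(tokenizer, pixel_values, question, generation_config)` method that the upstream InternVL remote code provides; the helper name `ask` and the generation settings are illustrative choices, not values documented for this model.

```python
def ask(model, tokenizer, question: str, max_new_tokens: int = 256) -> str:
    """Run a single text-only turn through the model's `chat` method.

    Assumes `model` exposes the InternVL-style `chat` API (an assumption
    about this checkpoint, inherited from the upstream remote code).
    """
    # Greedy decoding with a capped output length (illustrative settings).
    generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)
    # Passing None for pixel_values requests a conversation without an image.
    return model.chat(tokenizer, None, question, generation_config)
```

With the objects from the Quick Start code, this would be called as `ask(model, tokenizer, "Hello, who are you?")`. Image inputs additionally require preprocessing into `pixel_values`, which is not covered here.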

VLMEvalKit Evaluation

# Add to vlmeval/config.py (as an entry in one of the model dictionaries)
from functools import partial
from vlmeval.vlm import InternVLChat

"InternVL35-v4-HF": partial(
    InternVLChat,
    model_path="yujuyeon/internvl_35_v4_20260303",
    version="V2.0",
)

cd /path/to/VLMEvalKit

python run.py \
    --model InternVL35-v4-HF \
    --data ChartQA_TEST DocVQA_VAL OCRBench ScienceQA_TEST \
    --work-dir ./outputs/v4_benchmark \
    --verbose