InternVL 3.5 v4 Fine-tuned (internvl_35_v4_20260303)
A full-weight model produced by LoRA fine-tuning InternVL3-8B and merging the adapter weights back into the base model.
Model Info
| Item | Value |
|---|---|
| Architecture | InternVLChatModel (InternViT-300M + Qwen3-8B) |
| Base Model | OpenGVLab/InternVL3-8B |
| Fine-tuning | LoRA merged (full weight) |
| Precision | bfloat16 |
| Template | internvl2_5 |
Benchmark Results
| Benchmark | v4 Fine-tuned | InternVL3-8B (Baseline) | Diff (v4 - Baseline) |
|---|---|---|---|
| DocVQA_VAL | 91.49 | 91.6 | -0.11 |
| ScienceQA_TEST | 94.70 | 93.8 | +0.90 |
| ChartQA_TEST | 83.48 | 84.9 | -1.42 |
| OCRBench | 781 | 794 | -13 |
| MMBench_DEV_EN | 81.01 | 81.8 | -0.79 |
| SEEDBench_IMG | 75.48 | 76.1 | -0.62 |
| POPE | 86.03 | 88.3 | -2.27 |
| TextVQA_VAL | 74.51 | 79.1 | -4.59 |
| GQA | 58.51 | 62.2 | -3.69 |
| MME (Total) | 2026.7 | 2229.0 | -202.3 |
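Baseline numbers are taken from OpenGVLab's official InternVL3-8B report. OCRBench is reported in points out of 1,000 and MME as the sum of its subtask scores; the other benchmarks are percentages.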
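The Diff column is the v4 score minus the baseline. Because OCRBench and MME use much larger scales than the percentage benchmarks, a relative change is easier to compare across rows. A minimal Python sketch that recomputes both from the table values (`RESULTS` and `score_diffs` are illustrative names, not part of any released code):

```python
# Scores copied from the benchmark table: (v4 fine-tuned, InternVL3-8B baseline).
RESULTS = {
    "DocVQA_VAL": (91.49, 91.6),
    "ScienceQA_TEST": (94.70, 93.8),
    "ChartQA_TEST": (83.48, 84.9),
    "OCRBench": (781, 794),
    "MMBench_DEV_EN": (81.01, 81.8),
    "SEEDBench_IMG": (75.48, 76.1),
    "POPE": (86.03, 88.3),
    "TextVQA_VAL": (74.51, 79.1),
    "GQA": (58.51, 62.2),
    "MME (Total)": (2026.7, 2229.0),
}


def score_diffs(results):
    """Return {benchmark: (absolute diff, relative diff in %)} for v4 vs. baseline."""
    out = {}
    for name, (v4, base) in results.items():
        diff = round(v4 - base, 2)                   # same quantity as the Diff column
        rel = round(100.0 * (v4 - base) / base, 2)   # scale-free change in percent
        out[name] = (diff, rel)
    return out


if __name__ == "__main__":
    for name, (diff, rel) in score_diffs(RESULTS).items():
        print(f"{name:16s} {diff:+8.2f} ({rel:+.2f}%)")
```

On these numbers, MME shows the largest relative drop (about -9.1%), followed by GQA and TextVQA_VAL (about -5.9% and -5.8%); ScienceQA_TEST is the only benchmark that improves.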
OCRBench Detail
| Category | Score |
|---|---|
| Text Recognition | 226 |
| Scene Text-centric VQA | 173 |
| Doc-oriented VQA | 164 |
| Key Information Extraction | 181 |
| Handwritten Math Expression Recognition | 37 |
| Final Score | 781 |
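The Final Score is the sum of the five category scores. Converting each category to an accuracy shows where the points are lost. The sketch below assumes the standard OCRBench split of 1,000 questions (300 / 200 / 200 / 200 / 100, one point per correct answer); those sizes are an assumption not stated in this card, and the names are illustrative.

```python
# OCRBench category scores copied from the table.
OCRBENCH_SCORES = {
    "Text Recognition": 226,
    "Scene Text-centric VQA": 173,
    "Doc-oriented VQA": 164,
    "Key Information Extraction": 181,
    "Handwritten Math Expression Recognition": 37,
}

# Assumption: standard OCRBench category sizes (1,000 questions in total).
OCRBENCH_SIZES = {
    "Text Recognition": 300,
    "Scene Text-centric VQA": 200,
    "Doc-oriented VQA": 200,
    "Key Information Extraction": 200,
    "Handwritten Math Expression Recognition": 100,
}


def ocrbench_summary(scores, sizes):
    """Return (final score, {category: accuracy in %})."""
    final = sum(scores.values())  # Final Score = sum of category scores
    acc = {name: round(100.0 * scores[name] / sizes[name], 1) for name in scores}
    return final, acc


if __name__ == "__main__":
    final, acc = ocrbench_summary(OCRBENCH_SCORES, OCRBENCH_SIZES)
    print(f"Final Score: {final}")
    for name, pct in acc.items():
        print(f"  {name}: {pct:.1f}%")
```

Under that assumption, Key Information Extraction is the strongest category (90.5%) and Handwritten Math Expression Recognition the weakest (37.0%).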
ChartQA Detail
| Split | Accuracy |
|---|---|
| test_human | 75.28 |
| test_augmented | 91.68 |
| Overall | 83.48 |
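The Overall ChartQA figure matches the unweighted mean of the two test splits, and the same numbers give the gap between human-written and augmented questions. A small sketch (`chartqa_summary` is an illustrative helper):

```python
def chartqa_summary(test_human, test_augmented):
    """Return (overall, gap) in percentage points.

    overall: unweighted mean of the two ChartQA test splits
    gap:     augmented minus human accuracy
    """
    overall = (test_human + test_augmented) / 2
    gap = test_augmented - test_human
    return round(overall, 2), round(gap, 2)


if __name__ == "__main__":
    overall, gap = chartqa_summary(test_human=75.28, test_augmented=91.68)
    print(f"Overall: {overall:.2f}  augmented - human gap: {gap:.2f}")
```

The human-written split is about 16.4 points harder than the augmented one.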
MME Detail
| Split | Score |
|---|---|
| Perception | 1376.7 |
| Cognition | 650.0 |
| Total | 2026.7 |
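MME's Total is Perception plus Cognition. Because the two parts have different maxima, expressing each as a share of its maximum is more informative than the raw sum. The sketch below assumes the standard MME layout of 10 perception and 4 cognition subtasks at up to 200 points each (2,000 + 800 = 2,800); those maxima are an assumption not stated in this card.

```python
# MME scores copied from the table.
MME_SCORES = {"Perception": 1376.7, "Cognition": 650.0}

# Assumption: standard MME maxima, 200 points per subtask,
# 10 perception subtasks and 4 cognition subtasks.
MME_MAX = {"Perception": 10 * 200, "Cognition": 4 * 200}


def mme_summary(scores, maxima):
    """Return (total score, {split: percent of that split's maximum})."""
    total = sum(scores.values())  # Total = Perception + Cognition
    pct = {split: 100.0 * scores[split] / maxima[split] for split in scores}
    return total, pct


if __name__ == "__main__":
    total, pct = mme_summary(MME_SCORES, MME_MAX)
    print(f"Total: {total:.1f} / {sum(MME_MAX.values())}")
    for split, share in pct.items():
        print(f"  {split}: {share:.2f}% of max")
```

Under those maxima, the model reaches about 68.8% of the perception maximum and 81.25% of the cognition maximum.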
Quick Start
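```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "yujuyeon/internvl_35_v4_20260303"

# Load the merged full-weight checkpoint in bfloat16. trust_remote_code=True is
# needed because InternVLChatModel's modeling code ships with the checkpoint.
model = AutoModel.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval()  # move to GPU and switch to inference mode

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
```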
VLMEvalKit Evaluation
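```python
# Add to vlmeval/config.py
from functools import partial

from vlmeval.vlm import InternVLChat

# This key/value pair goes inside one of the model dictionaries
# already defined in config.py (the surrounding dict is not shown).
"InternVL35-v4-HF": partial(
    InternVLChat,
    model_path="yujuyeon/internvl_35_v4_20260303",
    version="V2.0",
),
```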
```bash
cd /path/to/VLMEvalKit
python run.py \
    --model InternVL35-v4-HF \
    --data ChartQA_TEST DocVQA_VAL OCRBench ScienceQA_TEST \
    --work-dir ./outputs/v4_benchmark \
    --verbose
```
Environment
- GPU: NVIDIA A100 40GB/80GB
- transformers >= 4.51.3 (Qwen3 support required)
- Framework: VLMEvalKit
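Since loading the model fails on transformers versions without Qwen3 support, it can help to check the installed version first. A minimal stdlib-only sketch: `parse_version`, `meets_minimum`, and `check_transformers` are hypothetical helpers, and the parser is deliberately simplified (it keeps only the leading numeric components and ignores pre-release tags). Because transformers itself depends on `packaging`, `packaging.version.Version` is the more robust choice in a real setup.

```python
from importlib import metadata


def parse_version(version):
    """Parse the leading numeric components, e.g. '4.52.0.dev0' -> (4, 52, 0).

    Simplified on purpose: non-numeric suffixes such as 'dev0' or 'rc1' are dropped.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="4.51.3"):
    """True if the installed version is at least the minimum (tuple comparison)."""
    return parse_version(installed) >= parse_version(minimum)


def check_transformers(minimum="4.51.3"):
    """Return (ok, installed_version); installed_version is None if not installed."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False, None
    return meets_minimum(installed, minimum), installed


if __name__ == "__main__":
    ok, installed = check_transformers()
    if installed is None:
        print("transformers is not installed")
    else:
        print(f"transformers {installed}: {'OK' if ok else 'too old, need >= 4.51.3'}")
```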
Model tree for yujuyeon/internvl_35_v4_20260303
- Base model: OpenGVLab/InternVL3-8B-Pretrained
- Fine-tuned from the above: OpenGVLab/InternVL3-8B-Instruct
- Fine-tuned from the above: OpenGVLab/InternVL3-8B (base model of this checkpoint)