# SenseNova-SI-InternVL3-8B-mindcube-lora-r16-e1
LoRA fine-tune of sensenova/SenseNova-SI-1.3-InternVL3-8B on the MindCube train split (10,000 examples), merged back into the base weights for easy deployment.
## Results on MindCube tinybench (1,050 questions)
| Bucket (n) | Base (zero-shot) | This checkpoint |
|---|---|---|
| Overall (1,050) | 85.52% | 93.62% |
| among (600) | 92.33% | 94.50% |
| around (250) | 84.00% | 92.80% |
| rotation (200) | 67.00% | 92.00% |
| linear | 84.00% | 92.80% |
| perpendicular | 86.00% | 93.87% |
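The Overall row appears to be the question-weighted mean of the three disjoint settings in the table (among, around, rotation: 600 + 250 + 200 = 1,050). The linear and perpendicular rows list no question counts and look like overlapping sub-buckets, so this sketch leaves them out; that reading is an assumption, not something the card states. Recomputing the weighted mean reproduces both Overall figures:

```python
# Per-bucket (question count, base accuracy %, fine-tuned accuracy %) from the table.
# Assumption: "linear" and "perpendicular" overlap the other buckets, so they are
# excluded; among + around + rotation already account for all 1,050 questions.
BUCKETS = {
    "among": (600, 92.33, 94.50),
    "around": (250, 84.00, 92.80),
    "rotation": (200, 67.00, 92.00),
}


def weighted_accuracy(column: int) -> float:
    """Question-weighted mean accuracy; column 1 = base, column 2 = fine-tuned."""
    total = sum(row[0] for row in BUCKETS.values())
    return sum(row[0] * row[column] for row in BUCKETS.values()) / total


base = weighted_accuracy(1)
tuned = weighted_accuracy(2)
print(f"base {base:.2f}%  tuned {tuned:.2f}%  gain {tuned - base:+.2f} pts")
```

This prints `base 85.52%  tuned 93.62%  gain +8.10 pts`. Most of the overall gain comes from the rotation bucket, which rises by 25 points.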
## Training recipe
- Base: `sensenova/SenseNova-SI-1.3-InternVL3-8B` (InternVL3-8B with a Qwen2.5-7B LLM backbone)
- Dataset: MindCube train split, 10,000 multiple-choice spatial-reasoning questions with 2-4 images each
- LoRA: r=16, alpha=32, dropout=0.05; target modules are all 196 linear layers of the LLM (q/k/v/o/gate/up/down_proj × 28 Qwen2 decoder layers); vision encoder and connector frozen
- Optim: AdamW, cosine lr=1e-4, warmup 3%, 1 epoch, eff. batch 8 (bs=1, grad_accum=8), BF16, grad-ckpt
- Compute: 1× H200 on Modal, ~3h13min for 1,250 optimizer steps
- Max tiles per image: 4; max_seq_len: 7168
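The recipe's numbers can be cross-checked with a little arithmetic. This sketch assumes the LLM uses the public Qwen2.5-7B dimensions (hidden size 3584, MLP size 18944, 4 key/value heads of size 128, 28 layers). The card does not state these dimensions. The sketch counts LoRA parameters as r × (in + out) per adapted linear and reproduces the 1,250-step count. The ceil rounding of the 3% warmup and the linear-warmup-then-cosine shape follow the Hugging Face `get_cosine_schedule_with_warmup` / `warmup_ratio` convention. Whether the actual training script used that convention is an assumption.

```python
import math

# --- LoRA size (assumed Qwen2.5-7B dimensions; not stated in this card) ---
HIDDEN, MLP, KV_DIM, LAYERS = 3584, 18944, 4 * 128, 28
R, ALPHA = 16, 32

# (in_features, out_features) of each adapted linear in one decoder layer
LINEARS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}
num_adapted = len(LINEARS) * LAYERS  # 7 x 28 = 196
# Each adapter adds A (r x in) and B (out x r), i.e. r * (in + out) parameters
lora_params = LAYERS * sum(R * (i + o) for i, o in LINEARS.values())
scale = ALPHA / R  # merged weight is W + (alpha / r) * B @ A

# --- Optimizer schedule (assumes HF-style warmup rounding and cosine decay) ---
EXAMPLES, BATCH, GRAD_ACCUM, PEAK_LR, WARMUP_PCT = 10_000, 1, 8, 1e-4, 3
total_steps = math.ceil(EXAMPLES / (BATCH * GRAD_ACCUM))  # one epoch
warmup_steps = math.ceil(total_steps * WARMUP_PCT / 100)  # ceil(37.5)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Wall-clock time per optimizer step implied by ~3h13min for the whole run
seconds_per_step = (3 * 3600 + 13 * 60) / total_steps

print(num_adapted, f"{lora_params:,}", scale, total_steps, warmup_steps)
```

Under these assumptions the script prints `196 40,370,176 2.0 1250 38`: about 40.4M trainable LoRA parameters, a LoRA scale of 2, and roughly 9.3 s per optimizer step (8 sequences of up to 7,168 tokens each).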
## Inference
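```python
import torch
from transformers import AutoModel, AutoTokenizer

repo = "gdgc-mindcube/SenseNova-SI-InternVL3-8B-mindcube-lora-r16-e1"

# trust_remote_code is required because InternVL3 ships its own modeling code.
# The LoRA weights are already merged, so no PEFT adapter needs to be loaded.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True, use_fast=False)
model = AutoModel.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()

# Use the InternVL3 multi-image chat format: 'Image-1: <image>\nImage-2: <image>\n{question}'
# Image tiling and the model.chat(...) call follow the base model card's end-to-end
# multi-image example; training used at most 4 tiles per image.
```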
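The multi-image prompt format in the snippet above can be built with a small helper. This is a sketch: `build_mindcube_prompt` and `check_patches` are hypothetical helpers that are not part of the repo or of InternVL. They produce one numbered `<image>` placeholder per view, followed by the question. They also check that a per-image tile-count list, such as the `num_patches_list` that InternVL's `model.chat` accepts for multi-image input, has one entry per placeholder and respects the 4-tile training budget. The base card's image loader may add a thumbnail tile on top of the tiles, so treat the tile limit as an assumption to adjust.

```python
def build_mindcube_prompt(question: str, num_images: int) -> str:
    """Hypothetical helper: 'Image-1: <image>\\n...Image-N: <image>\\n{question}'."""
    if num_images < 1:
        raise ValueError("need at least one image")
    header = "".join(f"Image-{i}: <image>\n" for i in range(1, num_images + 1))
    return header + question


def check_patches(prompt: str, num_patches_list: list[int], max_tiles: int = 4) -> None:
    """Hypothetical sanity check: one tile count per <image>, each within budget."""
    if prompt.count("<image>") != len(num_patches_list):
        raise ValueError("num_patches_list must have one entry per <image> placeholder")
    if any(n < 1 or n > max_tiles for n in num_patches_list):
        raise ValueError(f"each image must use between 1 and {max_tiles} tiles")


# Illustrative (made-up) two-view question
prompt = build_mindcube_prompt(
    "Which object is closest to the camera in Image-2?\nA. sofa\nB. lamp", 2
)
check_patches(prompt, [4, 4])
print(prompt)
```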
## License
Apache-2.0 (inherits from base).
## Model tree
OpenGVLab/InternVL3-8B-Pretrained → OpenGVLab/InternVL3-8B-Instruct → OpenGVLab/InternVL3-8B → sensenova/SenseNova-SI-1.3-InternVL3-8B → gdgc-mindcube/SenseNova-SI-InternVL3-8B-mindcube-lora-r16-e1 (each arrow is a fine-tune).