SenseNova-SI-InternVL3-8B-mindcube-lora-r16-e1

LoRA fine-tune of sensenova/SenseNova-SI-1.3-InternVL3-8B on the MindCube train split (10,000 examples), merged back into the base weights for easy deployment.

Results on MindCube tinybench (1,050 questions)

Bucket	Base (zero-shot)	This checkpoint
Overall	85.52%	93.62%
among (600)	92.33%	94.50%
around (250)	84.00%	92.80%
rotation (200)	67.00%	92.00%
linear	84.00%	92.80%
perpendicular	86.00%	93.87%
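
As a quick consistency check, the Overall row can be recomputed as a question-weighted average of the three buckets that list a count. The sketch below uses only the figures from the table; the `linear` and `perpendicular` rows are left out because the table gives no counts for them, and the helper name `weighted_overall` is illustrative.

```python
# Per-bucket (question count, base accuracy %, fine-tuned accuracy %),
# copied from the results table above.
buckets = {
    "among": (600, 92.33, 94.50),
    "around": (250, 84.00, 92.80),
    "rotation": (200, 67.00, 92.00),
}


def weighted_overall(column):
    """Return (total questions, question-weighted accuracy %) for one column.

    column 0 is the base model, column 1 is this checkpoint.
    """
    total = sum(n for n, *_ in buckets.values())
    correct = sum(n * accs[column] / 100 for n, *accs in buckets.values())
    return total, 100 * correct / total


total, base_overall = weighted_overall(0)
_, tuned_overall = weighted_overall(1)
print(total, round(base_overall, 2), round(tuned_overall, 2))
```

Both recomputed values agree with the reported Overall row (85.52% and 93.62%) to two decimal places, and the three bucket counts sum to the 1,050 tinybench questions.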

Training recipe

Base: sensenova/SenseNova-SI-1.3-InternVL3-8B (InternVL3-8B with Qwen2.5-7B backbone)
Dataset: MindCube train split, 10,000 multiple-choice spatial-reasoning questions with 2-4 images each
LoRA: r=16, alpha=32, dropout=0.05, targets = all 196 LLM linear layers (q/k/v/o/gate/up/down_proj, 7 per layer × 28 Qwen2 layers); vision encoder and connector frozen
Optimizer: AdamW, cosine schedule with lr=1e-4, 3% warmup, 1 epoch, effective batch size 8 (bs=1, grad_accum=8), BF16, gradient checkpointing
Compute: 1× H200 on Modal, ~3h13min for 1,250 optimizer steps
Max tiles per image: 4; max_seq_len: 7168
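
The numbers in the recipe can be cross-checked with a few lines of arithmetic. This is a minimal sketch, not the training script: the constant names are illustrative, and `lora_kwargs` is a hypothetical dictionary shaped like the keyword arguments of `peft.LoraConfig`, assuming the adapter was configured through PEFT (the card does not say which library was used). The `alpha / r` scaling is the standard LoRA convention and is likewise an assumption about this run.

```python
import math

# Values copied from the training recipe above.
PROJECTIONS = [
    "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
    "gate_proj", "up_proj", "down_proj",      # MLP projections
]
NUM_LLM_LAYERS = 28
TRAIN_EXAMPLES = 10_000
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 8
EPOCHS = 1
WALL_CLOCK_SECONDS = 3 * 3600 + 13 * 60  # ~3h13min

# 7 projections per layer × 28 layers.
adapted_linears = len(PROJECTIONS) * NUM_LLM_LAYERS

# One optimizer step consumes bs × grad_accum examples.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
optimizer_steps = math.ceil(TRAIN_EXAMPLES / effective_batch) * EPOCHS
seconds_per_step = WALL_CLOCK_SECONDS / optimizer_steps

# Hypothetical adapter settings (assumption: PEFT-style config keys).
lora_kwargs = dict(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=PROJECTIONS,
)
# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

print(adapted_linears, optimizer_steps, round(seconds_per_step, 2), lora_scaling)
```

The computed counts match the card: 196 adapted linear layers and 1,250 optimizer steps, which works out to a little over 9 seconds per step on the single H200.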

Inference

import torch
from transformers import AutoModel, AutoTokenizer

repo = "gdgc-mindcube/SenseNova-SI-InternVL3-8B-mindcube-lora-r16-e1"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True, use_fast=False)
model = AutoModel.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()

# Prompts follow the InternVL3 multi-image chat format: 'Image-1: <image>\nImage-2: <image>\n{question}'
# See the base model card for an end-to-end multi-image example.
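
The prompt format in the comment above can be assembled programmatically for any number of images. The helper below is a small sketch; `build_multi_image_prompt` is an illustrative name that is not part of the model's API, and the question text is a placeholder.

```python
def build_multi_image_prompt(question: str, num_images: int) -> str:
    """Prefix the question with one numbered '<image>' placeholder per image."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    image_lines = "".join(
        f"Image-{i}: <image>\n" for i in range(1, num_images + 1)
    )
    return image_lines + question


# Placeholder question; MindCube items come with 2-4 images each.
prompt = build_multi_image_prompt("Which object is to the left of the chair?", 2)
print(prompt)
```

The resulting string still has to be paired with the preprocessed image tiles when calling the model; the base model card shows the exact call.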