SenseNova-SI-InternVL3-8B-mindcube-lora-r16-e1

LoRA fine-tune of sensenova/SenseNova-SI-1.3-InternVL3-8B on the MindCube train split (10,000 examples). The adapter is merged back into the base weights, so the checkpoint loads as a standalone model with no separate adapter step.

Results on MindCube tinybench (1,050 questions)

Bucket            Base (zero-shot)    This checkpoint
Overall           85.52%              93.62%
among (600)       92.33%              94.50%
around (250)      84.00%              92.80%
rotation (200)    67.00%              92.00%
linear            84.00%              92.80%
perpendicular     86.00%              93.87%
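
As a quick consistency check, the overall score can be recomputed from the three buckets that list a question count. The sketch below assumes among, around, and rotation partition the 1,050 questions (600 + 250 + 200 = 1,050 is consistent with that). The linear and perpendicular rows carry no counts in the table and are left out. The helper name `weighted_accuracy` is illustrative only.

```python
# Rows copied from the table: (bucket, questions, base %, this checkpoint %).
ROWS = [
    ("among", 600, 92.33, 94.50),
    ("around", 250, 84.00, 92.80),
    ("rotation", 200, 67.00, 92.00),
]

BASE, TUNED = 2, 3  # column indices into each row


def weighted_accuracy(rows, column):
    """Question-weighted average accuracy (in percent) over the given rows."""
    total = sum(row[1] for row in rows)
    correct = sum(row[1] * row[column] / 100.0 for row in rows)
    return 100.0 * correct / total


base_overall = weighted_accuracy(ROWS, BASE)
tuned_overall = weighted_accuracy(ROWS, TUNED)

print(f"base overall:  {base_overall:.2f}%")
print(f"tuned overall: {tuned_overall:.2f}%")

# Per-bucket gain in percentage points.
for name, _, base, tuned in ROWS:
    print(f"{name:<9} +{tuned - base:.2f} points")
```

Both recomputed values round to the Overall row, and the largest gain by a wide margin is in the rotation bucket.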

Training recipe

  • Base: sensenova/SenseNova-SI-1.3-InternVL3-8B (InternVL3-8B with Qwen2.5-7B backbone)
  • Dataset: MindCube train split, 10,000 multiple-choice spatial-reasoning questions with 2-4 images each
  • LoRA: r=16, alpha=32, dropout=0.05, targets = all 196 LLM linears (q/k/v/o/gate/up/down_proj × 28 Qwen2 layers); vision + connector frozen
  • Optim: AdamW, peak lr 1e-4 with cosine decay, 3% warmup, 1 epoch, effective batch size 8 (per-device batch 1 × 8 gradient-accumulation steps), BF16, gradient checkpointing
  • Compute: 1× H200 on Modal, ~3h13min for 1,250 optimizer steps
  • Max tiles per image: 4; max_seq_len: 7168
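
The numbers in the recipe can be cross-checked with a little arithmetic. The sketch below derives the optimizer step count, the number of adapted linear layers, and the LoRA scaling factor from values stated above. It also estimates the adapter parameter count, which relies on Qwen2.5-7B layer shapes (hidden size 3584, MLP size 18944, key/value projection width 512) that are not stated in this card and should be treated as assumptions.

```python
import math

# Values stated in the training recipe.
num_examples = 10_000
per_device_batch = 1
grad_accum = 8
rank, alpha = 16, 32
num_layers = 28
targets = ["q_proj", "k_proj", "v_proj", "o_proj",
           "gate_proj", "up_proj", "down_proj"]

effective_batch = per_device_batch * grad_accum              # single GPU
optimizer_steps = math.ceil(num_examples / effective_batch)  # one epoch
adapted_linears = len(targets) * num_layers
lora_scale = alpha / rank  # factor applied to the low-rank update

# ASSUMPTION: Qwen2.5-7B projection shapes (in_features, out_features).
# These are not given in this card.
hidden, intermediate, kv_dim = 3584, 18944, 512
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# Each adapted linear adds A (rank x in) and B (out x rank) matrices.
lora_params = num_layers * sum(
    rank * (n_in + n_out) for n_in, n_out in shapes.values()
)

print("optimizer steps:", optimizer_steps)
print("adapted linears:", adapted_linears)
print("LoRA scale:", lora_scale)
print(f"LoRA parameters (assumed shapes): {lora_params:,}")
```

The step count and module count match the 1,250 steps and 196 linears quoted in the recipe. Under the assumed shapes the adapter holds roughly 40M trainable parameters, a small fraction of the 8B-parameter model.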

Inference

import torch
from transformers import AutoModel, AutoTokenizer

repo = "gdgc-mindcube/SenseNova-SI-InternVL3-8B-mindcube-lora-r16-e1"

# trust_remote_code=True loads the custom InternVL modeling code shipped with the repo.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True, use_fast=False)

# The LoRA weights are already merged, so no separate adapter loading is needed.
model = AutoModel.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()

# Use the InternVL3 chat format: 'Image-1: <image>\nImage-2: <image>\n{question}'
# See the base model card for an end-to-end multi-image example.
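
The multi-image prompt format quoted in the comment above can be assembled with a small helper. This is a minimal sketch: the function name `build_multi_image_prompt` is hypothetical and not part of the model's API. Image preprocessing and the actual generation call are not shown here and should follow the base model card.

```python
def build_multi_image_prompt(question: str, num_images: int) -> str:
    """Build a prompt of the form
    'Image-1: <image>\\nImage-2: <image>\\n{question}'.

    Hypothetical helper for illustration; it only formats text and does
    not load or preprocess any images.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    # One numbered placeholder line per image, starting at 1.
    image_lines = [f"Image-{i}: <image>" for i in range(1, num_images + 1)]
    return "\n".join(image_lines + [question])


prompt = build_multi_image_prompt(
    "Which object is to the left of the chair?", num_images=2
)
print(prompt)
```

MindCube questions come with 2-4 images each, so `num_images` would typically fall in that range.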

License

Apache-2.0 (inherited from the base model).
