4DThinker Model Checkpoints

This repository contains the trained model checkpoints from Qwen2.5-VL-3B for 4DThinker, a framework that enables VLMs to "think with 4D" through dynamic latent mental imagery.

Model Structure

model/
β”œβ”€β”€ dift/
β”‚   β”œβ”€β”€ checkpoints/          # DIFT-stage model weights
β”‚   β”‚   β”œβ”€β”€ model-00001-of-00002.safetensors
β”‚   β”‚   β”œβ”€β”€ model-00002-of-00002.safetensors
β”‚   β”‚   β”œβ”€β”€ config.json
β”‚   β”‚   β”œβ”€β”€ tokenizer.json
β”‚   β”‚   └── ...
β”‚   └── tensorboard/          # DIFT training logs
└── 4drl/
    β”œβ”€β”€ model-00001-of-00002.safetensors
    β”œβ”€β”€ model-00002-of-00002.safetensors
    β”œβ”€β”€ config.json
    β”œβ”€β”€ tokenizer.json
    β”œβ”€β”€ trainer_state.json
    └── ...

Models

Model Stage Base Model Description
dift/checkpoints/ DIFT Qwen2.5-VL-3B-Instruct Supervised with cosine similarity loss on latent visual tokens
4drl/ 4DRL (GRPO) DIFT checkpoint Reinforced with answer-based rewards

Special Tokens

Three special tokens are added to the Qwen2.5-VL vocabulary:

Token Description
<|latent_pad|> Padding within latent sequences
<|latent_start|> Marks start of latent visual token block
<|latent_end|> Marks end of latent visual token block

Usage

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "./model/4drl",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("./model/4drl")

License

Apache License 2.0

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support