4DThinker Model Checkpoints

This repository contains the model checkpoints for 4DThinker, a framework that enables vision-language models (VLMs) to "think with 4D" through dynamic latent mental imagery. Both checkpoints are trained starting from Qwen2.5-VL-3B-Instruct.

Model Structure

model/
├── dift/
│   ├── checkpoints/          # DIFT-stage model weights
│   │   ├── model-00001-of-00002.safetensors
│   │   ├── model-00002-of-00002.safetensors
│   │   ├── config.json
│   │   ├── tokenizer.json
│   │   └── ...
│   └── tensorboard/          # DIFT training logs
└── 4drl/                     # 4DRL-stage (GRPO) model weights
    ├── model-00001-of-00002.safetensors
    ├── model-00002-of-00002.safetensors
    ├── config.json
    ├── tokenizer.json
    ├── trainer_state.json
    └── ...
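
Because the weights are split into numbered shards, a partial download leaves a directory that looks populated but cannot be loaded. The following is a minimal sketch of a completeness check, assuming only the `model-XXXXX-of-YYYYY.safetensors` naming shown in the tree above. The helper name `missing_shards` is hypothetical and not part of this repository.

```python
import re
from pathlib import Path

# Shard naming as it appears in the directory tree, e.g. model-00001-of-00002.safetensors
SHARD_RE = re.compile(r"^model-(\d{5})-of-(\d{5})\.safetensors$")


def missing_shards(checkpoint_dir):
    """Return the shard file names implied by the directory's own shards but absent from it.

    Hypothetical helper: it infers the expected shard count from the
    "-of-YYYYY" part of the shard names that are present.
    """
    present = {}
    for path in Path(checkpoint_dir).iterdir():
        match = SHARD_RE.match(path.name)
        if match:
            present[int(match.group(1))] = int(match.group(2))
    if not present:
        raise FileNotFoundError(f"no weight shards found in {checkpoint_dir}")
    total = max(present.values())
    return [
        f"model-{index:05d}-of-{total:05d}.safetensors"
        for index in range(1, total + 1)
        if index not in present
    ]
```

Calling `missing_shards("./model/4drl")` on a complete download should return an empty list. The check only covers weight shards, not `config.json` or the tokenizer files.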

Models

| Model | Stage | Base Model | Description |
| --- | --- | --- | --- |
| `dift/checkpoints/` | DIFT | Qwen2.5-VL-3B-Instruct | Supervised with cosine similarity loss on latent visual tokens |
| `4drl/` | 4DRL (GRPO) | DIFT checkpoint | Reinforced with answer-based rewards |
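
To make the DIFT objective in the table concrete, here is a small sketch of a cosine similarity loss over paired latent vectors. The exact form used in training is not stated here. This sketch assumes the loss is one minus the mean cosine similarity between each predicted latent vector and its target, and the function names are illustrative only. It uses plain Python lists so it runs without any ML framework.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors
    return dot / max(norm_u * norm_v, eps)


def latent_cosine_loss(predicted, target):
    """Assumed loss form: 1 - mean cosine similarity over paired latent vectors.

    `predicted` and `target` are sequences of vectors, one per latent token.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("predicted and target must be non-empty and equally long")
    sims = [cosine_similarity(p, t) for p, t in zip(predicted, target)]
    return 1.0 - sum(sims) / len(sims)
```

Under this assumed form the loss is 0 when every predicted vector points in the same direction as its target, regardless of magnitude, and 2 when every pair points in opposite directions.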

Special Tokens

Three special tokens are added to the Qwen2.5-VL vocabulary:

| Token | Description |
| --- | --- |
| `<\|latent_pad\|>` | Padding within latent sequences |
| `<\|latent_start\|>` | Marks start of latent visual token block |
| `<\|latent_end\|>` | Marks end of latent visual token block |
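
When working with these tokens it can help to confirm they are present in the loaded vocabulary and to locate latent blocks in a sequence of token ids. The sketch below is illustrative: `latent_token_ids` and `latent_spans` are hypothetical helpers, not part of this repository. The first relies only on the standard tokenizer method `get_vocab()`, and the second assumes a latent block is whatever lies between a start marker and the next end marker.

```python
LATENT_PAD = "<|latent_pad|>"
LATENT_START = "<|latent_start|>"
LATENT_END = "<|latent_end|>"
LATENT_TOKENS = (LATENT_PAD, LATENT_START, LATENT_END)


def latent_token_ids(tokenizer):
    """Map each latent special token to its id, raising if any is absent."""
    vocab = tokenizer.get_vocab()  # includes added special tokens
    missing = [tok for tok in LATENT_TOKENS if tok not in vocab]
    if missing:
        raise KeyError(f"tokens not in vocabulary: {missing}")
    return {tok: vocab[tok] for tok in LATENT_TOKENS}


def latent_spans(token_ids, start_id, end_id):
    """Return (start_index, end_index) pairs for each latent block.

    Indices are inclusive of the start and end markers. A start marker
    without a matching end marker is ignored.
    """
    spans = []
    open_at = None
    for index, token_id in enumerate(token_ids):
        if token_id == start_id:
            open_at = index
        elif token_id == end_id and open_at is not None:
            spans.append((open_at, index))
            open_at = None
    return spans
```

With a real checkpoint, the tokenizer could be obtained via `AutoTokenizer.from_pretrained("./model/4drl")` and passed to `latent_token_ids`. The resulting ids for the start and end tokens can then be given to `latent_spans` together with any list of token ids.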

Usage

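from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Load the final 4DRL checkpoint; point to "./model/dift/checkpoints" for the DIFT-stage weights instead.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "./model/4drl",
    torch_dtype="auto",  # pick up the dtype recorded with the checkpoint
    device_map="auto",   # place weights on the available devices (requires accelerate)
)
# The processor bundles the tokenizer and the image/video preprocessor.
processor = AutoProcessor.from_pretrained("./model/4drl")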

License

Apache License 2.0
