4DThinker Model Checkpoints
This repository contains the 4DThinker model checkpoints, fine-tuned from Qwen2.5-VL-3B-Instruct. 4DThinker is a framework that enables vision-language models (VLMs) to "think with 4D" through dynamic latent mental imagery.
Model Structure
model/
├── dift/
│   ├── checkpoints/    # DIFT-stage model weights
│   │   ├── model-00001-of-00002.safetensors
│   │   ├── model-00002-of-00002.safetensors
│   │   ├── config.json
│   │   ├── tokenizer.json
│   │   └── ...
│   └── tensorboard/    # DIFT training logs
└── 4drl/
    ├── model-00001-of-00002.safetensors
    ├── model-00002-of-00002.safetensors
    ├── config.json
    ├── tokenizer.json
    ├── trainer_state.json
    └── ...
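Before loading, it can help to confirm that a download is complete. The following is a minimal sketch that checks a checkpoint directory for the files named in the tree above. The helper name `missing_checkpoint_files` and the `EXPECTED_FILES` constant are hypothetical, and files hidden behind the `...` entries are not checked.

```python
from pathlib import Path

# Files listed explicitly in the directory tree; entries elided as "..." are not covered.
EXPECTED_FILES = (
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "config.json",
    "tokenizer.json",
)


def missing_checkpoint_files(checkpoint_dir, extra=()):
    """Return the expected file names that are absent from checkpoint_dir.

    `extra` adds stage-specific names, e.g. ("trainer_state.json",) for 4drl/.
    """
    root = Path(checkpoint_dir)
    return [name for name in (*EXPECTED_FILES, *extra) if not (root / name).is_file()]
```

For example, `missing_checkpoint_files("./model/4drl", extra=("trainer_state.json",))` returns an empty list when every listed file is present.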
Models
| Model | Stage | Base Model | Description |
|---|---|---|---|
| `dift/checkpoints/` | DIFT | Qwen2.5-VL-3B-Instruct | Supervised with cosine similarity loss on latent visual tokens |
| `4drl/` | 4DRL (GRPO) | DIFT checkpoint | Reinforced with answer-based rewards |
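The two training signals named in the table can be illustrated in a few lines. This is a rough sketch, not the 4DThinker training code. It assumes the cosine similarity loss is the mean of `1 - cos(pred, target)` over paired latent vectors, and that the GRPO stage standardizes rewards within each sampled group using the population standard deviation. Both function names are hypothetical, and the actual implementation may differ in normalization and reduction.

```python
import math
from statistics import mean, pstdev


def cosine_similarity_loss(pred, target, eps=1e-8):
    """Mean of (1 - cosine similarity) over paired latent vectors (assumed form)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of vectors")
    losses = []
    for p, t in zip(pred, target):
        dot = sum(a * b for a, b in zip(p, t))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        # eps guards against division by zero for all-zero vectors.
        losses.append(1.0 - dot / max(norm, eps))
    return mean(losses)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: rewards standardized within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std is an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Under these assumptions the loss is 0 when predicted and target latents point in the same direction and 2 when they point in opposite directions. A group in which every response earns the same answer reward yields all-zero advantages.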
Special Tokens
Three special tokens are added to the Qwen2.5-VL vocabulary:
| Token | Description |
|---|---|
| `<\|latent_pad\|>` | Padding within latent sequences |
| `<\|latent_start\|>` | Marks start of latent visual token block |
| `<\|latent_end\|>` | Marks end of latent visual token block |
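A quick way to confirm that a loaded tokenizer carries these tokens is to look them up in its vocabulary. The sketch below works on any token-to-id mapping, such as the dictionary returned by `processor.tokenizer.get_vocab()`. The helper names are hypothetical, and the token ids are whatever the checkpoint assigns.

```python
LATENT_TOKENS = ("<|latent_pad|>", "<|latent_start|>", "<|latent_end|>")


def missing_latent_tokens(vocab):
    """Return the latent special tokens absent from a token -> id mapping."""
    return [tok for tok in LATENT_TOKENS if tok not in vocab]


def latent_token_ids(vocab):
    """Map each latent special token to its id, failing loudly if any is missing."""
    missing = missing_latent_tokens(vocab)
    if missing:
        raise KeyError(f"tokenizer vocabulary lacks: {missing}")
    return {tok: vocab[tok] for tok in LATENT_TOKENS}
```

With a checkpoint loaded, `latent_token_ids(processor.tokenizer.get_vocab())` should return three ids. A `KeyError` suggests the tokenizer was loaded from the base model rather than from this repository.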
Usage
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "./model/4drl",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("./model/4drl")
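The snippet above loads the 4DRL checkpoint. The DIFT-stage weights sit one level deeper, under `dift/checkpoints/`. The sketch below maps a stage name to its directory and loads it the same way. `checkpoint_path` and `load_stage` are hypothetical helpers, and the sketch assumes the DIFT directory also contains the processor files that `AutoProcessor` needs.

```python
from pathlib import Path

# Stage name -> directory relative to the repository's model/ root.
STAGE_DIRS = {
    "dift": Path("dift") / "checkpoints",
    "4drl": Path("4drl"),
}


def checkpoint_path(stage, root="./model"):
    """Return the checkpoint directory for a training stage ("dift" or "4drl")."""
    if stage not in STAGE_DIRS:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(STAGE_DIRS)}")
    return Path(root) / STAGE_DIRS[stage]


def load_stage(stage, root="./model"):
    """Load the model and processor for a stage, mirroring the usage snippet."""
    # Imported lazily so the path helper works without transformers installed.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    path = str(checkpoint_path(stage, root))
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        path, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(path)
    return model, processor
```

Calling `load_stage("dift")` would then load the supervised checkpoint, and `load_stage("4drl")` the reinforced one.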
License
Apache License 2.0