rp-yu/Dimple-7B

Image-Text-to-Text

feature-extraction

Diffusion_Multimodal_Large_Language_Model

Discrete_Diffusion

rp-yu committed on May 19, 2025

Commit 5264308 · verified · 1 Parent(s): 471de27

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -21,7 +21,7 @@ pipeline_tag: image-text-to-text
 # Dimple-7B 💧
-**Dimple** is the first Discrete Diffusion Multimodal Large Language Model (DMLLM) that leverages a hybrid training paradigm combining autoregressive and diffusion-based instruction tuning. The model architecture is similar to Qwen and LLaVA, while introducing a novel **autoregressive-then-diffusion** training strategy:
+**Dimple** is the first Discrete Diffusion Multimodal Large Language Model (DMLLM) that leverages a hybrid training paradigm combining autoregressive and diffusion-based instruction tuning. The model architecture is similar to Qwen and LLaVA, while introducing an **autoregressive-then-diffusion** training strategy:
 * **Stage 1**: Autoregressive fine-tuning for alignment and initial instruction tuning.
 * **Stage 2**: Diffusion-based fine-tuning for enhanced instruction-following capabilities.