metadata
pipeline_tag: image-text-to-text
library_name: transformers

From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

This repository contains the official implementation of ReDiff, a refining-enhanced vision-language diffusion model, as presented in the paper From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model.

ReDiff addresses the train-inference discrepancy in discrete diffusion models, which often leads to catastrophic error cascades: in standard denoising, tokens committed early are not revisited, so a single wrong token can condition and corrupt later steps. ReDiff reframes generation from passive denoising to active refining, teaching the model to identify and correct its own errors. Training proceeds in two stages. First, the model acquires foundational revision capabilities by learning to revise synthetic errors. Second, an online self-correction loop trains the model to refine its own flawed drafts using corrections provided by an expert. This mistake-driven learning significantly improves the coherence and factual accuracy of generated content and enables more stable and efficient parallel generation than traditional denoising methods.
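The two training stages can be illustrated with a toy data-construction sketch. This is not ReDiff's code: `MASK_ID`, `VOCAB_SIZE`, the corruption rates, and the helpers `inject_synthetic_errors` and `build_refine_pair` are hypothetical, and the sketch assumes the expert's correction is aligned token-for-token with the model's draft. It only shows the shape of the data: stage-1 inputs mix masked positions with wrong-but-visible tokens, while stage-2 inputs are the model's own drafts paired with corrected targets.

```python
import random
from typing import List, Tuple

MASK_ID = 0         # hypothetical [MASK] token id
VOCAB_SIZE = 1000   # hypothetical vocabulary size; real token ids are 1..VOCAB_SIZE-1


def inject_synthetic_errors(tokens: List[int], mask_rate: float, noise_rate: float,
                            rng: random.Random) -> Tuple[List[int], List[int]]:
    """Stage-1 style corruption (illustrative only).

    Each position is masked with probability `mask_rate`, replaced by a random
    wrong token with probability `noise_rate`, and kept otherwise. The target
    at every position is the clean token, so the model must both fill masks
    and revise visible mistakes.
    """
    corrupted = []
    for tok in tokens:
        r = rng.random()
        if r < mask_rate:
            corrupted.append(MASK_ID)
        elif r < mask_rate + noise_rate:
            # Draw a token that differs from the true one: a "synthetic error".
            wrong = rng.randrange(1, VOCAB_SIZE)
            while wrong == tok:
                wrong = rng.randrange(1, VOCAB_SIZE)
            corrupted.append(wrong)
        else:
            corrupted.append(tok)
    return corrupted, list(tokens)


def build_refine_pair(draft: List[int],
                      expert_fix: List[int]) -> Tuple[List[int], List[int], List[bool]]:
    """Stage-2 style training pair (illustrative only).

    Input is the model's own draft, target is the expert-corrected sequence,
    and `changed` flags the positions the expert revised.
    """
    if len(draft) != len(expert_fix):
        raise ValueError("draft and correction must be aligned token-for-token")
    changed = [d != e for d, e in zip(draft, expert_fix)]
    return list(draft), list(expert_fix), changed


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [5, 17, 42, 8, 99, 3]
    corrupted, target = inject_synthetic_errors(clean, mask_rate=0.3, noise_rate=0.2, rng=rng)
    print("stage-1 input: ", corrupted)
    print("stage-1 target:", target)
    _, _, changed = build_refine_pair([5, 17, 7, 8], [5, 17, 42, 8])
    print("positions revised by expert:", changed)
```

In this sketch every position, not only the masked ones, carries a clean target, so a model trained on such pairs is pushed to overwrite visible mistakes instead of treating them as fixed. That is the behavior the refining framing depends on.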

[Figure: ReDiff teaser]

Quick Inference Demo

The ReDiff model is designed for vision-language tasks. To quickly test it with a visual instruction demo, follow these steps:

  1. Clone the repository:
    git clone https://github.com/jiyt17/ReDiff
    cd ReDiff/train
    
  2. Initialize the environment. Run the setup script to install the required dependencies (including transformers):
    bash init_env.sh
    
  3. Run the demo. Execute the demo script to test ReDiff on an example image:
    python generate_demo.py
    

For more detailed usage, training, and evaluation instructions, please refer to the GitHub repository.

Citation

If you find our work helpful or inspiring, please consider citing it:

@article{ji2025denoising,
  title={From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model},
  author={Ji, Yatai and Wang, Teng and Ge, Yuying and Liu, Zhiheng and Yang, Sidi and Shan, Ying and Luo, Ping},
  journal={arXiv preprint arXiv:2510.19871},
  year={2025}
}