GrayShine committed on
Commit 601c3f8 · verified · 1 Parent(s): 6b48f37

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -39,7 +39,10 @@ Video-GPT is a video self-supervised generative pre-trained model which treats v

  Previous works on visual generation relies heavily on supervisory signals from textual modalities (such as Sora, WanX, HunyuanVideo, MovieGen). However, vision, as a natural ability of human beings, was formed even earlier than language. Therefore, we believe that the information of the visual modality itself is sufficient to support the model to model the world.

- ![demo](https://github.com/zhuangshaobin/Video-GPT/tree/main/imgs/teaser.png)
+ <!-- ![demo](https://github.com/zhuangshaobin/Video-GPT/tree/main/imgs/teaser.png) -->
+ <p align="left">
+ <img src="https://github.com/zhuangshaobin/Video-GPT/tree/main/imgs/teaser.png" alt="demo" width="640"/>
+ </p>

  In addition, compared with the previous model architecture with many special designs for diffusion model (e.g., UNet, DiT, MM-DiT), we adopted the simplest vanilla transformer architecture. On the one hand, it is more conducive to the exploration of scaling law in the future. On the other hand, it is also more convenient for the community to follow up.