Instructions to use Emma02/LVM_ckpts with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Emma02/LVM_ckpts with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Emma02/LVM_ckpts")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Emma02/LVM_ckpts")
model = AutoModelForCausalLM.from_pretrained("Emma02/LVM_ckpts")
```
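A minimal sketch of running generation with the loaded checkpoint, assuming it behaves as a standard causal LM as in the auto-generated snippet above; the prompt string and decoding parameters are placeholders, not recommended settings:

```python
# Hypothetical generation sketch: assumes the checkpoint exposes a causal-LM head.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Emma02/LVM_ckpts")
model = AutoModelForCausalLM.from_pretrained("Emma02/LVM_ckpts")

# Placeholder prompt and sampling values, mirroring the serving examples below.
inputs = tokenizer("Once upon a time,", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```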
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Emma02/LVM_ckpts with vLLM:
Install from pip and serve the model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Emma02/LVM_ckpts"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Emma02/LVM_ckpts",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```bash
docker model run hf.co/Emma02/LVM_ckpts
```
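Because the vLLM server exposes an OpenAI-compatible API, it can also be called from Python. A minimal sketch using the `openai` client; the base URL, dummy API key, prompt, and sampling values simply mirror the curl example above:

```python
# Sketch: query the locally running vLLM server through its OpenAI-compatible API.
from openai import OpenAI

# The API key is unused by a local server but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="Emma02/LVM_ckpts",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(response.choices[0].text)
```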
- SGLang
How to use Emma02/LVM_ckpts with SGLang:
Install from pip and serve the model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Emma02/LVM_ckpts" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Emma02/LVM_ckpts",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Emma02/LVM_ckpts" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Emma02/LVM_ckpts",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
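The same completions request can be issued from Python rather than curl. A minimal sketch with `requests` against the SGLang server started above, reusing the port, prompt, and sampling values from the curl example:

```python
# Sketch: POST to the SGLang server's OpenAI-compatible completions endpoint.
import requests

payload = {
    "model": "Emma02/LVM_ckpts",
    "prompt": "Once upon a time,",   # placeholder prompt from the examples above
    "max_tokens": 512,
    "temperature": 0.5,
}
response = requests.post("http://localhost:30000/v1/completions", json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```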
- Docker Model Runner
How to use Emma02/LVM_ckpts with Docker Model Runner:
```bash
docker model run hf.co/Emma02/LVM_ckpts
```
Update README.md #1
opened by VictorSanh
README.md CHANGED

````diff
@@ -1,9 +1,18 @@
+---
+license: apache-2.0
+tags:
+- image
+- video
+inference: false
+---
 # LVM
 
 This is the model implementation of the CVPR 2024 'Sequential Modeling Enables Scalable Learning for Large Vision Models'. (https://arxiv.org/abs/2312.00785)
 
 LVM is a vision pretraining model that converts various kinds of visual data into visual sentences and performs next-token prediction autoregressively. It is compatible with both GPU and TPU.
 
+You can try out the demo [here](https://huggingface.co/spaces/Emma02/LVM).
+
 LVM is built on top of [OpenLLaMA](https://github.com/openlm-research/open_llama) (an autoregressive model) and [OpenMuse](https://github.com/huggingface/open-muse) (a VQGAN that converts images into visual tokens).
 
 This was trained in collaboration with HuggingFace. Thanks [Victor Sanh](https://huggingface.co/VictorSanh) for the support in this project.
@@ -27,4 +36,4 @@ If you found LVM useful in your research or applications, please cite our work u
 journal={arXiv preprint arXiv:2312.00785},
 year={2023}
 }
-```
+```
````