Improve model card #2
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,21 +1,22 @@
 ---
-license: apache-2.0
 base_model:
 - google/siglip-so400m-patch14-384
-pipeline_tag: image-classification
 language:
 - en
 - zh
+license: apache-2.0
+pipeline_tag: image-feature-extraction
 ---
+
 # Oryx-ViT
 
 ## Model Summary
 
-The Oryx-ViT model is trained on 200M data and can seamlessly and efficiently process visual inputs with arbitrary spatial sizes and temporal lengths.
+The Oryx-ViT model is trained on 200M data and can seamlessly and efficiently process visual inputs with arbitrary spatial sizes and temporal lengths. It is described in the paper [Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution](https://arxiv.org/abs/2409.12961).
 
 - **Repository:** https://github.com/Oryx-mllm/Oryx
+- **Project Page:** https://oryx-mllm.github.io
 - **Languages:** English, Chinese
-- **Paper:** https://arxiv.org/abs/2409.12961
 
 
 ### Model Architecture
@@ -30,4 +31,13 @@ The Oryx-ViT model is trained on 200M data and can seamlessly and efficiently pr
 - **Orchestration:** HuggingFace Trainer
 - **Code:** Pytorch
 
-## Citation
+## Citation
+
+```bibtex
+@article{liu2024oryx,
+  title={Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution},
+  author={Liu, Zuyan and Dong, Yuhao and Liu, Ziwei and Hu, Winston and Lu, Jiwen and Rao, Yongming},
+  journal={arXiv preprint arXiv:2409.12961},
+  year={2024}
+}
+```
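For context on the retagging above: `pipeline_tag: image-feature-extraction` means the checkpoint is intended to be used as a standalone vision encoder that returns patch features rather than class labels. Below is a minimal usage sketch, assuming the weights load through `transformers`' SigLIP vision classes (suggested by the `google/siglip-so400m-patch14-384` base model, but not confirmed by this card); the repo id, image path, and pooling step are placeholders, not part of the PR.

```python
# Hedged sketch, not an official example from the Oryx model card.
# Assumptions: the Oryx-ViT weights load via transformers' SigLIP vision
# classes, and "ORG/Oryx-ViT" stands in for the real Hub repository id.
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipVisionModel

repo_id = "ORG/Oryx-ViT"  # placeholder: replace with the actual repo id
processor = AutoImageProcessor.from_pretrained(repo_id)
model = SiglipVisionModel.from_pretrained(repo_id)
model.eval()

image = Image.open("example.jpg").convert("RGB")  # any local image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Patch-level embeddings, e.g. for an MLLM connector or simple pooling.
patch_features = outputs.last_hidden_state  # (batch, num_patches, hidden_dim)
pooled = patch_features.mean(dim=1)         # mean-pooled image feature
print(patch_features.shape, pooled.shape)
```

Note that the arbitrary-resolution behaviour described in the summary likely depends on Oryx's own preprocessing in the linked GitHub repository; a stock SigLIP processor resizes inputs to a fixed 384x384.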