zR committed 2e5cbb2 ("readme") · 1 parent: 4c3068a
Files changed: README.md (+25 −8), README_zh.md (+10 −13)
README.md CHANGED

````diff
@@ -1,12 +1,28 @@
+---
+license: other
+language:
+- en
+base_model:
+- meta-llama/Meta-Llama-3.1-8B-Instruct
+pipeline_tag: video-text-to-text
+inference: false
+---
+
+[中文阅读](README_zh.md)
+
 # CogVLM2-Llama3-Caption
 
 <div align="center">
 <img src=https://raw.githubusercontent.com/THUDM/CogVLM2/cf9cb3c60a871e0c8e5bde7feaf642e3021153e6/resources/logo.svg>
 </div>
 
-
+# Introduction
+
+Typically, most video data does not come with corresponding descriptive text, so it is necessary to convert the video
+data into textual descriptions to provide the essential training data for text-to-video models.
+
+## Usage
 
-## 使用方式
 ```python
 import io
 import numpy as np
@@ -119,12 +135,14 @@ if __name__ == '__main__':
 
 ```
 
-##
+## License
 
-
+This model is released under the
+CogVLM2 [LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LICENSE&status=0).
+For models built with Meta Llama 3, please also adhere to
+the [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0).
-[
 
-##
+## Citation
 
 🌟 If you find our work helpful, please leave us a star and cite our paper.
 
@@ -134,5 +152,4 @@ if __name__ == '__main__':
 author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
 journal={arXiv preprint arXiv:2408.06072},
 year={2024}
-}
-```
+}
````
README_zh.md CHANGED

````diff
@@ -1,16 +1,14 @@
+[Read This in English](README_en.md)
+
 # CogVLM2-Llama3-Caption
 
 <div align="center">
 <img src=https://raw.githubusercontent.com/THUDM/CogVLM2/cf9cb3c60a871e0c8e5bde7feaf642e3021153e6/resources/logo.svg>
 </div>
 
-
-
-Typically, most video data does not come with corresponding descriptive text, so it is necessary to convert the video
-data into textual descriptions to provide the essential training data for text-to-video models.
-
-## Usage
+通常情况下，大部分视频数据并没有附带相应的描述性文本，因此有必要将视频数据转换成文本描述，以提供文本到视频模型所需的必要训练数据。
 
+## 使用方式
 ```python
 import io
 import numpy as np
@@ -123,14 +121,12 @@ if __name__ == '__main__':
 
 ```
 
-##
+## 模型协议
 
-
-
-For models built with Meta Llama 3, please also adhere to
-the [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0).
+此模型根据 CogVLM2 [LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LICENSE&status=0) 发布。对于使用 Meta Llama 3 构建的模型，还请遵守
+[LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0)。
 
-##
+## 引用
 
 🌟 If you find our work helpful, please leave us a star and cite our paper.
 
@@ -140,4 +136,5 @@ the [LLAMA3_LICENSE](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-b
 author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
 journal={arXiv preprint arXiv:2408.06072},
 year={2024}
-}
+}
+```
````
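
Both READMEs open their usage snippet with `import io` / `import numpy as np`, but the rest of the code is truncated in this diff view. As a hedged illustration of the kind of preprocessing a video-captioning pipeline like this typically performs — uniformly sampling frame indices before decoding and feeding frames to the model — here is a minimal numpy-only sketch; `sample_frame_indices` is a hypothetical helper name, not taken from the CogVLM2 code:

```python
import numpy as np

def sample_frame_indices(num_total_frames: int, num_samples: int) -> np.ndarray:
    # Hypothetical helper (not from the CogVLM2 repo): pick `num_samples`
    # evenly spaced frame indices covering the whole clip, endpoints included.
    return np.linspace(0, num_total_frames - 1, num=num_samples).round().astype(int)

# e.g. choose 8 frames out of a 300-frame clip
print(sample_frame_indices(300, 8).tolist())  # → [0, 43, 85, 128, 171, 214, 256, 299]
```

In the actual pipeline the selected frames would then be decoded with a video reader and passed to the model as pixel inputs.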