feat: update README.md, add updated structure image

Files changed (3)

.gitattributes +2 -0
README.md +10 -12
valley_structure.jpeg → valley_structure.png +2 -2

.gitattributes CHANGED

@@ -34,3 +34,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 valley_structure.jpeg filter=lfs diff=lfs merge=lfs -text
+valley_structure.png filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
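Every LFS rule in this .gitattributes hunk carries the same attributes (`filter=lfs diff=lfs merge=lfs -text`), so the new `*.png` line already covers `valley_structure.png`. The Python sketch below checks which paths these rules would route through Git LFS and flags literal rules that a wildcard rule already covers. It is a simplification, not Git's own matcher: it assumes slash-free patterns are compared against the file's basename at any depth and ignores `**`, patterns containing a slash, and macro attributes. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# LFS patterns taken from the .gitattributes lines shown in this hunk (not the whole file).
LFS_PATTERNS = ["*.zst", "*tfevents*", "valley_structure.jpeg", "valley_structure.png", "*.png"]

WILDCARD_CHARS = set("*?[")


def is_lfs_tracked(path: str, patterns: list[str] = LFS_PATTERNS) -> bool:
    """Approximate check: slash-free patterns are matched against the basename only."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def redundant_literal_rules(patterns: list[str]) -> list[str]:
    """Literal (wildcard-free) patterns already matched by some wildcard pattern in the list."""
    wildcards = [p for p in patterns if WILDCARD_CHARS & set(p)]
    literals = [p for p in patterns if not WILDCARD_CHARS & set(p)]
    return [lit for lit in literals if any(fnmatchcase(lit, w) for w in wildcards)]


if __name__ == "__main__":
    for path in ["valley_structure.png", "images/example.png", "README.md"]:
        print(path, is_lfs_tracked(path))
    print(redundant_literal_rules(LFS_PATTERNS))
```

In a real checkout, `git check-attr filter -- <path>` reports the effective attribute and is the authoritative check.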

README.md CHANGED

@@ -3,7 +3,7 @@ license: apache-2.0
 base_model:
 - Qwen/Qwen2.5-7B-Instruct
 ---
-# Valley 2.0
+# Valley2
 <p align="center">
     <img src="https://raw.githubusercontent.com/bytedance/Valley/refs/heads/main/assets/valley_logo.jpg" width="500"/>
@@ -17,25 +17,23 @@ base_model:
 Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data, which is developed by ByteDance. Our model not only
 - Achieved the best results in the inhouse e-commerce and short-video benchmarks
-- Demonstrated comparatively outstanding performance in the OpenCompass (average scores > 67) tests
-when evaluated against models of the same scale.
+- Demonstrated comparatively outstanding performance in the OpenCompass leaderboard when evaluated against models of the same scale.
 ## Release
-- [02/15] 🔥 Update Valley-Eagle-DPO, achieve 69.6 on OpenCompass and update AutoModel usage for checkpoints.
-- [01/13] 🔥 Release TechReport. [Valley2: Exploring Multimodal Models with Scalable Vision-Language Design](https://arxiv.org/abs/2501.05901)
-- [12/23] Announcing [Valley-Qwen2.5-7B](https://huggingface.co/ByteDance)!
-## Valley-Eagle
-The foundational version of Valley is a multimodal large model aligned with Siglip and Qwen2.5, incorporating LargeMLP and ConvAdapter to construct the projector.
+- [2025/02/15] 🔥 Update Valley2-DPO, achieve 69.6 on OpenCompass and update AutoModel usage for checkpoints.
+- [2025/01/13] 🔥 Release TechReport. [Valley2: Exploring Multimodal Models with Scalable Vision-Language Design](https://arxiv.org/abs/2501.05901)
+- [2024/12/23] 🔥 Announcing [Valley2](https://huggingface.co/ByteDance) (Valley-Eagle-7B)  !
+## Architecture
+The foundational version of Valley2 is a multimodal large model aligned with Siglip and Qwen2.5, incorporating LargeMLP and ConvAdapter to construct the projector.
 - In the final version, we also referenced Eagle, introducing an additional VisionEncoder that can flexibly adjust the number of tokens and is parallelized with the original visual tokens.
 - This enhancement supplements the model’s performance in extreme scenarios, and we chose the Qwen2vl VisionEncoder for this purpose.
-and the model structure is shown as follows:
-<div style="display:flex;">
-  <img src="valley_structure.jpeg" alt="opencompass" style="height:600px;" />
+The model structure is shown as follows:
+<div style="display: flex;">
+  <img src="valley_structure.png" alt="opencompass" style="width: 100%; height: auto;" />
 </div>
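The README's architecture text says the extra Qwen2vl VisionEncoder "can flexibly adjust the number of tokens" and runs in parallel with the original SigLIP visual tokens. The Python sketch below illustrates what that could mean for the visual sequence length. It is an assumption-laden illustration, not Valley2's code: the rounding rule is a simplified version of Qwen2-VL's dynamic-resolution resize (sides snapped to multiples of 28 px, i.e. 14 px patches merged 2x2, then shrunk to fit a pixel budget), and the helper names, the fixed-size branch token count, and the side-by-side (sequence-wise) placement of the two branches are all hypothetical.

```python
import math


def dynamic_token_count(height: int, width: int, max_pixels: int, factor: int = 28) -> int:
    """Visual tokens for one image under a simplified Qwen2-VL-style resize.

    factor=28 assumes 14-px patches merged 2x2, so each token covers 28x28 px.
    """
    # Snap each side to the nearest multiple of `factor` (at least one token per side).
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    # If the snapped image exceeds the pixel budget, shrink both sides by the same ratio.
    if h * w > max_pixels:
        scale = math.sqrt(height * width / max_pixels)
        h = max(factor, math.floor(height / scale / factor) * factor)
        w = max(factor, math.floor(width / scale / factor) * factor)
    return (h // factor) * (w // factor)


def visual_sequence_length(fixed_branch_tokens: int, height: int, width: int, max_pixels: int) -> int:
    """Hypothetical: tokens from a fixed-size branch plus the dynamic branch, placed side by side."""
    return fixed_branch_tokens + dynamic_token_count(height, width, max_pixels)


if __name__ == "__main__":
    # Raising the pixel budget raises the dynamic branch's token count for the same image.
    for budget in (224 * 224, 448 * 448, 896 * 896):
        print(budget, dynamic_token_count(896, 896, budget))
```

The fixed-branch count is left as a parameter because this commit does not state how many tokens the SigLIP plus projector path produces.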

valley_structure.jpeg → valley_structure.png RENAMED

File without changes