Update README.md
README.md
CHANGED
@@ -41,7 +41,7 @@ Specifically, the pretraining pipeline is structured into the following four pro
 ### ✨ Highlights
 
 - 📐 **Native Dynamic Resolution** MOSS-VL-Base-0408 natively processes images and video frames at their original aspect ratios and resolutions. By preserving the raw spatial layout, it faithfully captures fine visual details across diverse formats—from high-resolution photographs and dense document scans to ultra-wide screenshots.
-- 🎞️ **Native Interleaved Image & Video Inputs** The model accepts arbitrary combinations of images and videos within a single sequence. Through a unified end-to-end pipeline, it seamlessly handles complex mixed-modality prompts, multi-image comparisons, and interleaved visual narratives without requiring modality-specific pre-processing
+- 🎞️ **Native Interleaved Image & Video Inputs** The model accepts arbitrary combinations of images and videos within a single sequence. Through a unified end-to-end pipeline, it seamlessly handles complex mixed-modality prompts, multi-image comparisons, and interleaved visual narratives without requiring modality-specific pre-processing.
 
 
 ## 🏗 Model Architecture