Upload moss-video-preview-base
README.md
CHANGED
@@ -20,12 +20,12 @@ tags:
 We introduce **MOSS-Video-Preview-Base**, the pretrained foundation checkpoint in the MOSS-Video-Preview series.

 > [!Important]
-> This is a **pretrained** model checkpoint **without** supervised instruction tuning (no offline SFT / no
+> This is a **pretrained** model checkpoint **without** supervised instruction tuning (no offline SFT / no Real-Time SFT).

 This repo contains the **pretrained weights** that are intended to serve as the starting point for downstream:

 - **Offline SFT**: instruction-following and reasoning on full video segments
-- **
+- **Real-Time SFT**: low-latency streaming video understanding and response



@@ -152,14 +152,14 @@ print(processor.decode(output_ids[0], skip_special_tokens=True))

 ## ✅ Intended use

-- **Foundation checkpoint**: continue pretraining, run domain adaptation, or perform supervised fine-tuning (offline SFT /
+- **Foundation checkpoint**: continue pretraining, run domain adaptation, or perform supervised fine-tuning (offline SFT / Real-Time SFT).
 - **System plumbing validation**: test multimodal IO, temporal position encoding, and long-context behavior.
 - **If you want instruction-following quality**: use `models/moss-video-sft` or `models/moss-video-realtime-sft` instead of this base checkpoint.

 ## ⚠️ Limitations

 - **Not instruction-tuned**: as a pretrain-only checkpoint, responses may be less aligned/helpful than SFT variants.
-- **Realtime streaming not supported by default**: streaming generation APIs are typically provided by
+- **Realtime streaming not supported by default**: streaming generation APIs are typically provided by Real-Time SFT checkpoints.
 - **Performance is hardware/config dependent**: enabling FlashAttention 2 and using `bfloat16` on modern GPUs generally improves throughput and memory efficiency.

 ## 🧩 Requirements