OpenMOSS-Team/moss-video-preview-base

Video-Text-to-Text

text-generation

vision-language

text-generation-inference

findcard12138 committed about 1 month ago

Commit eeebae6 · verified · 1 parent: 32d2053

Upload moss-video-preview-base

Files changed (1)

README.md +4 -4

@@ -20,12 +20,12 @@ tags:
 We introduce **MOSS-Video-Preview-Base**, the pretrained foundation checkpoint in the MOSS-Video-Preview series.
 > [!Important]
-> This is a **pretrained** model checkpoint **without** supervised instruction tuning (no offline SFT / no realtime SFT).
+> This is a **pretrained** model checkpoint **without** supervised instruction tuning (no offline SFT / no Real-Time SFT).
 This repo contains the **pretrained weights** that are intended to serve as the starting point for downstream:
 - **Offline SFT**: instruction-following and reasoning on full video segments
-- **Realtime SFT**: low-latency streaming video understanding and response
+- **Real-Time SFT**: low-latency streaming video understanding and response
@@ -152,14 +152,14 @@ print(processor.decode(output_ids[0], skip_special_tokens=True))
 ## ✅ Intended use
-- **Foundation checkpoint**: continue pretraining, run domain adaptation, or perform supervised fine-tuning (offline SFT / realtime SFT).
+- **Foundation checkpoint**: continue pretraining, run domain adaptation, or perform supervised fine-tuning (offline SFT / Real-Time SFT).
 - **System plumbing validation**: test multimodal IO, temporal position encoding, and long-context behavior.
 - **If you want instruction-following quality**: use `models/moss-video-sft` or `models/moss-video-realtime-sft` instead of this base checkpoint.
 ## ⚠️ Limitations
 - **Not instruction-tuned**: as a pretrain-only checkpoint, responses may be less aligned/helpful than SFT variants.
-- **Realtime streaming not supported by default**: streaming generation APIs are typically provided by realtime-SFT checkpoints.
+- **Realtime streaming not supported by default**: streaming generation APIs are typically provided by Real-Time SFT checkpoints.
 - **Performance is hardware/config dependent**: enabling FlashAttention 2 and using `bfloat16` on modern GPUs generally improves throughput and memory efficiency.
 ## 🧩 Requirements