OpenMOSS-Team/moss-video-preview-base

findcard12138 committed 28 days ago

Commit abc5575 · verified · 1 Parent(s): 51907cf

Upload README.md with huggingface_hub

Files changed (1)

README.md +9 -5

@@ -25,11 +25,6 @@ This repo contains the **pretrained weights** that are intended to serve as the
 - **Offline SFT**: instruction-following and reasoning on full video segments
 - **Realtime SFT**: low-latency streaming video understanding and response
-> [!IMPORTANT]
-> ### 🌟 Our Mission & Community Invitation
-> **We have filled the gap in cross-attention-based foundation models for video understanding.**
->
-> We warmly welcome experts in **Representation Learning** and **Model Efficiency** to explore, experiment, and innovate on top of our architecture. Let's push the boundaries of video intelligence and advance the open-source community together!
 #### Model Architecture
@@ -189,6 +184,15 @@ For full environment setup (including optional FlashAttention2 extras), see the
 - This is a **base** model directory. Quality/latency characteristics (offline SFT, real-time streaming, etc.) depend on the specific fine-tuned checkpoints and inference pipeline.
 - The Python source files in this directory are referenced via `auto_map` in `config.json`, so `trust_remote_code=True` is typically required when loading from this local folder.
+> [!IMPORTANT]
+> ### 🌟 Our Mission & Community Invitation
+> **We have filled the gap in cross-attention-based foundation models for video understanding.**
+>
+> We warmly welcome experts in **Representation Learning** and **Model Efficiency** to explore, experiment, and innovate on top of our architecture. Let's push the boundaries of video intelligence and advance the open-source community together!
 ## Citation
 ```bibtex