findcard12138 committed on
Commit a71b33d · verified · 1 Parent(s): c3d29c3

Upload moss-video-preview-base

Files changed (1): README.md (+10 −11)
README.md CHANGED

@@ -2,7 +2,7 @@
 language:
 - en
 library_name: transformers
-pipeline_tag: image-text-to-text
+pipeline_tag: video-text-to-text
 license: apache-2.0
 model_type: video_mllama
 tags:
@@ -10,10 +10,10 @@ tags:
 - video
 - vision-language
 - mllama
-- streaming
+- video-text-to-text
 ---
 
-# MOSS-Video-Preview-Base 🤗
+# MOSS-Video-Preview-Base
 
 ## Introduction
 
@@ -159,13 +159,15 @@ print(processor.decode(output_ids[0], skip_special_tokens=True))
 ## ⚠️ Limitations
 
 - **Not instruction-tuned**: as a pretrain-only checkpoint, responses may be less aligned/helpful than SFT variants.
-- **Realtime streaming not supported by default**: streaming generation APIs are typically provided by Real-Time SFT checkpoints.
+- **Real-Time streaming not supported by default**: streaming generation APIs are typically provided by Real-Time SFT checkpoints.
 - **Performance is hardware/config dependent**: enabling FlashAttention 2 and using `bfloat16` on modern GPUs generally improves throughput and memory efficiency.
 
 ## 🧩 Requirements
 
 - **Python**: 3.10+
 - **PyTorch**: 1.13.1+ (GPU strongly recommended)
+- **Tested setup**: Python 3.12.4 + PyTorch 2.4.0 (CUDA 12.1) + DeepSpeed 0.16.1
+- **CPU-only**: PyTorch 2.4.0
 - **Transformers**: required with `trust_remote_code=True` for this model family (due to `auto_map` custom code)
 - **Optional (recommended)**: FlashAttention 2 (`attn_implementation="flash_attention_2"`)
 - **Video decode**:
@@ -180,7 +182,7 @@ For full environment setup (including optional FlashAttention2 extras), see the
 ## ⚠️ Notes
 
 - This is a **base** model directory. Quality/latency characteristics (offline SFT, real-time streaming, etc.) depend on the specific fine-tuned checkpoints and inference pipeline.
-- The Python source files in this directory are referenced via `auto_map` in `config.json`, so `trust_remote_code=True` is typically required when loading from this local folder.
+- The Python source files in this directory are referenced via `auto_map` in `config.json`.
 
 
 > [!IMPORTANT]
@@ -190,17 +192,14 @@ For full environment setup (including optional FlashAttention2 extras), see the
 > We warmly welcome experts in **Representation Learning** and **Model Efficiency** to explore, experiment, and innovate on top of our architecture. Let's push the boundaries of video intelligence and advance the open-source community together!
 
 
-
 ## Citation
-
 ```bibtex
 @misc{moss_video_2026,
-  title = {MOSS-Video-Preview: Towards Real-Time Video Understanding},
+  title = {{MOSS-Video-Preview: Next-Generation Real-Time Video Understanding}},
   author = {OpenMOSS Team},
   year = {2026},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/fnlp-vision/MOSS-Video-Preview}}
+  howpublished = {\url{https://github.com/fnlp-vision/MOSS-Video-Preview}},
+  note = {GitHub repository}
 }
 ```
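The metadata changes above (the corrected `pipeline_tag` and the `streaming` → `video-text-to-text` tag swap) can be sanity-checked programmatically before pushing a card like this. Below is a minimal sketch, assuming the front matter is the simple key/value-plus-list subset of YAML shown in the diff; the inlined `front_matter` string and the `parse_front_matter` helper are illustrative, not part of the repository — in practice you would read the text between the `---` markers of README.md (or use a full YAML parser).

```python
# Hypothetical front-matter text, matching the post-commit README in the diff.
front_matter = """\
language:
- en
library_name: transformers
pipeline_tag: video-text-to-text
license: apache-2.0
model_type: video_mllama
tags:
- video
- vision-language
- mllama
- video-text-to-text
"""

def parse_front_matter(text: str) -> dict:
    """Parse the scalar and list entries of a simple YAML-like front matter."""
    data, current_key = {}, None
    for line in text.splitlines():
        if line.startswith("- ") and current_key:
            # List item belonging to the most recent key with no inline value.
            data.setdefault(current_key, []).append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key, value = key.strip(), value.strip()
            if value:
                data[current_key] = value
                current_key = None  # scalar entry, not a list header
    return data

meta = parse_front_matter(front_matter)
print(meta["pipeline_tag"])                   # video-text-to-text
print("video-text-to-text" in meta["tags"])   # True
```

A check like this catches the exact class of error the commit fixes: a `pipeline_tag` left over from an image model on a video model card.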