OpenMOSS-Team/moss-video-preview-base


findcard12138 committed on Mar 18

Commit 32d2053 · verified · 1 Parent(s): ad70ed1

Upload README.md with huggingface_hub

Files changed (1)

README.md +9 -15


@@ -44,19 +44,12 @@ For architecture diagrams and full system details, see the top-level repository:
 ## 🚀 Quickstart
-### Offline video inference (works with base/SFT checkpoints)
-Use this to sanity-check **loading**, **video ingestion**, and **end-to-end generation**.
-#### Video inference (Python, recommended)
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoProcessor
-# Use local path like: "models/moss-video-preview-base"
-# Or use Hugging Face model id like: "fnlp-vision/moss-video-preview-base"
 checkpoint = "fnlp-vision/moss-video-preview-base"
 video_path = "data/example_video.mp4"
 prompt = "" # For base model, prompt is set to empty to perform completion task.
@@ -99,20 +92,20 @@ with torch.no_grad():
     output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
 print(processor.decode(output_ids[0], skip_special_tokens=True))
-# Tip: set skip_special_tokens=False only when debugging special tokens / chat template formatting.
 ```
-#### Image inference (Python)
 ```python
 import torch
 from PIL import Image
 from transformers import AutoModelForCausalLM, AutoProcessor
-# Use local path like: "models/moss-video-preview-base"
-# Or use Hugging Face model id like: "fnlp-vision/moss-video-preview-base"
 checkpoint = "fnlp-vision/moss-video-preview-base"
 image_path = "data/example_image.jpg"
 prompt = "" # For base model, prompt is set to empty to perform completion task.
@@ -153,9 +146,10 @@ with torch.no_grad():
     output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
 print(processor.decode(output_ids[0], skip_special_tokens=True))
-# Tip: set skip_special_tokens=False only when debugging special tokens / chat template formatting.
 ```
 ## ✅ Intended use
 - **Foundation checkpoint**: continue pretraining, run domain adaptation, or perform supervised fine-tuning (offline SFT / realtime SFT).

 ## 🚀 Quickstart
+<details>
+<summary><strong>Video inference</strong></summary>
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoProcessor
 checkpoint = "fnlp-vision/moss-video-preview-base"
 video_path = "data/example_video.mp4"
 prompt = "" # For base model, prompt is set to empty to perform completion task.
     output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
 print(processor.decode(output_ids[0], skip_special_tokens=True))
 ```
+ </details>
+<details>
+<summary><strong>Image inference</strong></summary>
 ```python
 import torch
 from PIL import Image
 from transformers import AutoModelForCausalLM, AutoProcessor
 checkpoint = "fnlp-vision/moss-video-preview-base"
 image_path = "data/example_image.jpg"
 prompt = "" # For base model, prompt is set to empty to perform completion task.
     output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
 print(processor.decode(output_ids[0], skip_special_tokens=True))
 ```
+</details>
 ## ✅ Intended use
 - **Foundation checkpoint**: continue pretraining, run domain adaptation, or perform supervised fine-tuning (offline SFT / realtime SFT).