Commit e29cd8d by findcard12138 · verified
1 parent: 03be5cf

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +1 -3
README.md CHANGED
@@ -296,9 +296,7 @@ MOSS-VL-Instruct-0408 represents an early milestone in the MOSS-VL roadmap, and
  - 🧮 **Math & Code Reasoning**: While the current checkpoint already exhibits solid general reasoning, we plan to substantially strengthen its mathematical reasoning and code reasoning capabilities, especially in multimodal contexts.
  - ⚡ **Real-Time Streaming Variant**: The upcoming **MOSS-VL-RealTime** will extend MOSS-VL to low-latency, streaming video understanding, enabling interactive applications such as live video chat, real-time event detection, and online assistants.
  - 🎯 **RL Post-Training**: We are working on a reinforcement learning post-training stage to further align the model with human preferences and to unlock stronger multi-step reasoning behaviors on top of the SFT foundation.
- - ⏳ **Longer Context for Hour-Scale Video**: Continuing to push context scaling so the model can comfortably handle hour-scale and multi-hour videos with consistent temporal grounding.
- - 🔊 **Audio Modality Integration**: Bringing audio understanding into the pipeline, so MOSS-VL can jointly reason over the visual and acoustic streams of a video (speech, ambient sound, music, and their interaction with on-screen events).
- - 📐 **Parameter Scaling**: Releasing additional model sizes across the MOSS-VL series to cover a wider range of compute budgets and deployment scenarios.
+
 
  > [!NOTE]
  > We welcome community feedback and contributions on any of these directions.