Update README.md

README.md

@@ -290,16 +290,14 @@ texts = [item["text"] for item in result["results"]]
 ## 🚧 Limitations and Future Work
-MOSS-VL-Base-0408 is a pretrained base checkpoint intended primarily as a foundation model, and several release items are still being finalized:
-- realtime usage is not documented here
-- benchmark, metric, and training details are still blank
-- some sections are intentionally placeholders until release information is finalized
-- batch calls currently require shared `generate_kwargs` and shared `media_kwargs` within one call
-- batch streaming and batch cancel / stop protocol are not part of `offline_batch_generate(...)`
 > [!NOTE]
-> We expect future releases to expand public evaluation coverage and provide stronger downstream aligned variants built on top of this base checkpoint.
 ## 📜 Citation
 ```bibtex

 ## 🚧 Limitations and Future Work
+MOSS-VL-Base-0408 is a pretrained base checkpoint, and we are actively improving several core capabilities for future iterations:
+- 📄 **Stronger OCR, Especially for Long Documents**: We plan to further improve text recognition, document parsing, and long-document understanding, with a particular focus on maintaining accuracy and consistency over lengthy structured inputs.
+- 🎬 **Expanded Long-Video Understanding**: We aim to extend the model's long-form video comprehension, including stronger temporal reasoning, better event tracking across extended durations, and more robust long-context video understanding.
+- 🌍 **Richer World Knowledge**: We will continue to enhance the model's general world knowledge so that it can provide better-grounded multimodal understanding and stronger performance on knowledge-intensive vision-language tasks.
 > [!NOTE]
+> We expect future releases to continue strengthening the base model itself while also enabling stronger downstream aligned variants built on top of it.
 ## 📜 Citation
 ```bibtex