Update README.md
README.md
CHANGED
@@ -290,16 +290,14 @@ texts = [item["text"] for item in result["results"]]
## Limitations and Future Work
-MOSS-VL-Base-0408 is a pretrained base checkpoint
--
--
--
-- batch calls currently require shared `generate_kwargs` and shared `media_kwargs` within one call
-- batch streaming and batch cancel / stop protocol are not part of `offline_batch_generate(...)`
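The removed lines above state that a single batch call must share `generate_kwargs` and `media_kwargs`. A minimal sketch of one client-side workaround under that constraint, assuming each request is a dict carrying its own kwargs: group requests by identical kwargs, issue one batch call per group, and scatter the outputs back into the original order. The request field names and the `run_batch` callable are hypothetical; `run_batch` only stands in for `offline_batch_generate(...)`, whose real signature is not shown in this diff.

```python
import json
from typing import Any, Callable

# Illustrative request shape (field names are assumptions, not the MOSS-VL schema):
#   {"prompt": ..., "generate_kwargs": {...}, "media_kwargs": {...}}
Request = dict[str, Any]
# Hypothetical batch runner: (requests, generate_kwargs, media_kwargs) -> outputs.
BatchRunner = Callable[[list[Request], dict, dict], list[Any]]


def _kwargs_key(req: Request) -> str:
    """Hashable key for the kwargs that must be identical within one batch call.

    sort_keys=True makes the key independent of dict insertion order;
    this assumes the kwargs values are JSON-serializable.
    """
    return json.dumps(
        [req.get("generate_kwargs", {}), req.get("media_kwargs", {})],
        sort_keys=True,
    )


def batch_by_shared_kwargs(requests: list[Request], run_batch: BatchRunner) -> list[Any]:
    """Run one batch call per group of requests with identical kwargs and
    return outputs in the original request order."""
    # Map each distinct kwargs key to the indices of the requests that use it.
    groups: dict[str, list[int]] = {}
    for i, req in enumerate(requests):
        groups.setdefault(_kwargs_key(req), []).append(i)

    results: list[Any] = [None] * len(requests)
    for indices in groups.values():
        shared = requests[indices[0]]  # every request in the group has the same kwargs
        outputs = run_batch(
            [requests[i] for i in indices],
            shared.get("generate_kwargs", {}),
            shared.get("media_kwargs", {}),
        )
        # Scatter the group's outputs back to their original positions.
        for i, out in zip(indices, outputs):
            results[i] = out
    return results


if __name__ == "__main__":
    # Stand-in runner that echoes each prompt with the group's max_new_tokens.
    def fake_run_batch(batch, generate_kwargs, media_kwargs):
        return [f"{r['prompt']}:{generate_kwargs.get('max_new_tokens')}" for r in batch]

    reqs = [
        {"prompt": "a", "generate_kwargs": {"max_new_tokens": 64}},
        {"prompt": "b", "generate_kwargs": {"max_new_tokens": 128}},
        {"prompt": "c", "generate_kwargs": {"max_new_tokens": 64}},
    ]
    print(batch_by_shared_kwargs(reqs, fake_run_batch))  # ['a:64', 'b:128', 'c:64']
```

This issues one call per distinct kwargs combination, so it batches most effectively when requests share a small number of common settings.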
> [!NOTE]
-> We expect future releases to
## Citation
```bibtex
## Limitations and Future Work
+MOSS-VL-Base-0408 is a pretrained base checkpoint, and we are actively improving several core capabilities for future iterations:
+- **Stronger OCR, Especially for Long Documents**: We plan to further improve text recognition, document parsing, and long-document understanding, with a particular focus on maintaining accuracy and consistency over lengthy structured inputs.
+- **Expanded Long-Video Understanding**: We aim to extend the model's long-form video comprehension, including stronger temporal reasoning, better event tracking across extended durations, and more robust handling of long video contexts.
+- **Richer World Knowledge**: We will continue to enhance the model's general world knowledge so that it provides better-grounded multimodal understanding and stronger performance on knowledge-intensive vision-language tasks.
> [!NOTE]
+> We expect future releases to continue strengthening the base model itself while also enabling stronger downstream aligned variants built on top of it.
## Citation
```bibtex