Video-Text-to-Text
Transformers
Safetensors
English
qwen3_vl
image-text-to-text
video
long-video
reasoning
tool-calling
agentic-rl
grpo
multimodal
Instructions to use ParaVT/ParaVT-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ParaVT/ParaVT-8B with Transformers:
    # Load model directly
    from transformers import AutoProcessor, AutoModelForImageTextToText

    processor = AutoProcessor.from_pretrained("ParaVT/ParaVT-8B")
    model = AutoModelForImageTextToText.from_pretrained("ParaVT/ParaVT-8B")
- Notebooks
- Google Colab
- Kaggle
Committed by mwxely
Commit d1d66d5 · 1 parent: e399087
docs: update BibTeX to @article with arXiv journal and Last, First author format
README.md
CHANGED
@@ -72,13 +72,11 @@ For inference outside the eval driver, treat the model exactly like `Qwen/Qwen3-
 If you find ParaVT useful for your research and applications, please cite:
 
 ```bibtex
-@
-title={
-author={
-
-
-archivePrefix={arXiv},
-primaryClass={cs.CV}
+@article{yang2026paravt,
+title={ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning},
+author={Yang, Zuhao and Zhang, Kaichen and Wang, Sudong and Wu, Keming and Yang, Zhongyu and Li, Bo and Qi, Xiaojuan and Lu, Shijian and Li, Xingxuan and Bing, Lidong},
+journal={arXiv preprint arXiv:2605.20342},
+year={2026}
 }
 ```
 
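The commit message mentions moving the author list to the "Last, First" format. As an illustration only, here is a minimal Python sketch of how such a field could be produced from "First Last" names. The helper names `to_last_first` and `bibtex_author_field` are hypothetical and not part of the repository. The sketch naively assumes the final whitespace-separated token is the family name, which does not hold for multi-word surnames.

```python
from typing import Iterable


def to_last_first(name: str) -> str:
    """Convert 'First [Middle] Last' to 'Last, First [Middle]'.

    Naive assumption: the final token is the family name.
    Names that already contain a comma are returned unchanged.
    """
    name = name.strip()
    if "," in name:
        return name
    parts = name.split()
    if len(parts) < 2:
        # Single-token names have nothing to reorder.
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def bibtex_author_field(names: Iterable[str]) -> str:
    """Join names with ' and ', the separator BibTeX expects between authors."""
    return " and ".join(to_last_first(n) for n in names)


if __name__ == "__main__":
    authors = ["Zuhao Yang", "Kaichen Zhang", "Sudong Wang"]
    print("author={" + bibtex_author_field(authors) + "},")
```

Names with particles or compound surnames would need manual review. In BibTeX, bracing a multi-word surname (for example `{van der Berg}, Anna`) keeps it together.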