---
base_model:
- Qwen/Qwen2-0.5B-Instruct
- google/siglip-so400m-patch14-384
datasets:
- liuhaotian/LLaVA-Pretrain
- lmms-lab/LLaVA-ReCap-558K
- lmms-lab/LLaVA-ReCap-118K
- lmms-lab/LLaVA-ReCap-CC3M
- lmms-lab/LLaVA-OneVision-Mid-Data
- lmms-lab/LLaVA-OneVision-Data
- Zhiqiang007/MathV360K
language:
- en
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- LLaVA-OneVision-Manager
- LLaVA-OV-Manager
- Manager
---

This repository contains the model weights for our submission to TCSVT (IEEE Transactions on Circuits and Systems for Video Technology), titled "Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs".

Related materials: [Paper](https://huggingface.co/papers/2506.11515), [Code](https://github.com/LooperXX/LLaVA-OV-Manager), [Homepage](https://looperxx.github.io/).