[ICLR 2026] Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs

[Figure: teaser]
TL;DR: This paper presents a systematic analysis of where and how information flows in VideoLLMs for temporal reasoning in VideoQA, revealing key patterns and effective pathways.
If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.

Introduction

This is LLaVA-NeXT-13B-Video-FT, a video-language model fine-tuned for our ICLR 2026 paper Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs.

We fine-tuned llava-hf/llava-v1.6-vicuna-13b-hf for one epoch on the video portion of VideoChat2-IT, using our cleaned annotations (VideoChat2-IT-clean), to study how video instruction tuning shapes information flow in VideoLLMs. We use this model to analyze temporal reasoning patterns with causal intervention tools such as Attention Knockout and Logit Lens.
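To make the idea behind Attention Knockout concrete, here is a minimal toy sketch in plain Python. It is not the code used in the paper: the function names, the four-token sequence, and the uniform scores are all assumptions made for illustration. It shows only the core mechanism, which is setting the pre-softmax score of selected (query, key) edges to negative infinity so that the query position can no longer read from the key position.

```python
import math

def softmax(row):
    """Numerically stable softmax; entries equal to -inf receive zero weight."""
    peak = max(row)
    if peak == float("-inf"):
        raise ValueError("every edge of this query was knocked out")
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(scores, blocked=frozenset()):
    """Causal attention weights with optional knocked-out (query, key) edges.

    `scores` is a square list of pre-softmax attention scores.
    `blocked` is a hypothetical set of (query, key) pairs to disconnect.
    """
    n = len(scores)
    weights = []
    for q in range(n):
        row = [
            scores[q][k] if k <= q and (q, k) not in blocked else float("-inf")
            for k in range(n)
        ]
        weights.append(softmax(row))
    return weights

# Toy sequence (assumed layout): positions 0-1 are video tokens,
# position 2 is a question token, position 3 is the last token.
scores = [[0.0] * 4 for _ in range(4)]

baseline = attention_weights(scores)
# Knock out the edges from the last token to both video tokens.
knocked = attention_weights(scores, blocked={(3, 0), (3, 1)})

print(baseline[3])
print(knocked[3])
```

In an actual analysis, the same masking would be applied inside the model's attention layers, and the change in the probability of the correct answer would indicate how much the blocked edges matter.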

Model Zoo

Results

We identify effective information pathways in VideoLLMs and show that these sparse pathways are sufficient for solving VideoQA tasks. In LLaVA-NeXT-13B-Video-FT, the effective pathways comprise only 37% of attention edges, yet the model retains its VideoQA performance when using them alone.
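As a rough illustration of what a figure like "37% of attention edges" measures, the sketch below counts the share of causal attention edges that remain after some are blocked in each layer. The counting convention (one edge per causal query-key pair per layer), the function name, and the toy numbers are assumptions for illustration; the paper may count edges differently.

```python
def kept_edge_fraction(num_tokens, blocked_per_layer):
    """Fraction of causal attention edges left after blocking.

    `blocked_per_layer` holds one set of (query, key) pairs per layer.
    Pairs that are not valid causal edges are ignored.
    """
    edges_per_layer = num_tokens * (num_tokens + 1) // 2
    total = edges_per_layer * len(blocked_per_layer)
    removed = 0
    for blocked in blocked_per_layer:
        removed += sum(1 for (q, k) in set(blocked) if 0 <= k <= q < num_tokens)
    return (total - removed) / total

# Toy setup (hypothetical): 4 tokens, 2 layers, a few edges blocked per layer.
blocked_per_layer = [
    {(3, 0), (3, 1)},
    {(3, 0), (3, 1), (2, 0), (2, 1)},
]
fraction = kept_edge_fraction(4, blocked_per_layer)
print(f"{fraction:.0%} of causal attention edges kept")
```

Here 6 of the 20 causal edges are blocked, so 70% remain.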

[Figure: main results]

Citation

If you find our paper useful in your research, please consider citing:

@inproceedings{kim2026map,
  author    = {Kim, Minji and Kim, Taekyung and Han, Bohyung},
  title     = {Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026},
}

@article{kim2025map,
  author    = {Kim, Minji and Kim, Taekyung and Han, Bohyung},
  title     = {Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs},
  journal   = {arXiv preprint arXiv:2510.13251},
  year      = {2025},
}

Model size: 13B params (Safetensors). Tensor type: BF16.