arxiv:2508.21094

EMCompress: Video-LLMs with Endomorphic Multimodal Compression

Published on Apr 24

Authors:

Abstract

A new cognitively-inspired approach called Endomorphic Multimodal Compression addresses long-video reasoning challenges by compressing multimodal inputs while preserving answer-invariant information, improving video-language understanding efficiency.

AI-generated summary

Video-LLMs face a fundamental tension in long-video reasoning: static, sparse frame sampling either dilutes evidence across task-irrelevant segments at significant cost or misses fine-grained temporal semantics altogether. We propose a novel, cognitively-inspired task -- Endomorphic Multimodal Compression (EMC) -- as a structurally-constrained sufficient-statistic problem for VideoQA, and formulate it as an endomorphic transformation F_EMC : (V, Q) -> (v, q) that compresses the multimodal input while preserving answer invariance across reasonable downstream models. The endomorphic form keeps the compressed output in the downstream pipeline's native task space -- a structural mirror of the filter-then-reason mechanism in the cognitive literature motivating EMC -- distinguishing it from latent-code compression (IB / VIB) and making the formulation extensible to other multimodal settings. Under the Markov chain A -> (V, Q) -> (v, q), EMC realizes the classical sufficiency condition I((v, q); A) = I((V, Q); A) in its VideoQA-natural form. As a modular front-end, EMC plugs into both Video Instruction Tuning and Video Question Answering pipelines. We release the first dedicated benchmark and propose ReSimplifyIt, an EMC baseline surpassing prior methods by 0.40 F-1 with competitive query rewriting. Integrating EMC yields relative gains of 7.33% in training and 33.7% in inference for video-language understanding.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2508.21094

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2508.21094 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2508.21094 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2508.21094 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.