---
license: apache-2.0
language:
- en
pipeline_tag: video-classification
tags:
- video-classification
- concept-bottleneck-model
- interpretability
---
# MoTIF — Concepts in Motion

[**Read the Paper (arXiv)**](https://arxiv.org/pdf/2509.20899)

[**GitHub Repo**](https://github.com/patrick-knab/MoTIF)

## Abstract

Concept Bottleneck Models (CBMs) enable interpretable image classification by structuring predictions around human-understandable concepts, but extending this paradigm to video remains challenging due to the difficulty of extracting concepts and modeling them over time. In this paper, we introduce MoTIF (Moving Temporal Interpretable Framework), a transformer-based concept architecture that operates on sequences of temporally grounded concept activations, employing per-concept temporal self-attention to model when individual concepts recur and how their temporal patterns contribute to predictions. Central to the framework is an agentic concept discovery module that automatically extracts object- and action-centric textual concepts from videos, yielding temporally expressive concept sets without manual supervision. Across multiple video benchmarks, this combination substantially narrows the performance gap between interpretable and black-box video models while maintaining faithful and temporally grounded concept explanations.
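
The core mechanism described above, per-concept temporal self-attention over temporally grounded concept activations, can be sketched in a few lines of PyTorch. This is a minimal illustration of the idea, not the released implementation: the class name, dimensions, mean-pooling, and linear head are all assumptions made here for demonstration; see the GitHub repo for the actual code.

```python
import torch
import torch.nn as nn

class PerConceptTemporalAttention(nn.Module):
    """Toy sketch (not the official MoTIF code): run temporal self-attention
    independently over each concept's activation track, pool over time,
    then classify."""

    def __init__(self, num_concepts: int, num_classes: int,
                 d_model: int = 32, n_heads: int = 4):
        super().__init__()
        # Embed each scalar concept activation into a d_model-dim token.
        self.proj = nn.Linear(1, d_model)
        self.encoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        # A linear head over pooled per-concept summaries keeps the
        # concept-to-class contributions inspectable.
        self.head = nn.Linear(num_concepts * d_model, num_classes)

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        # acts: (batch, time, num_concepts) temporally grounded activations
        B, T, C = acts.shape
        x = acts.permute(0, 2, 1).reshape(B * C, T, 1)  # one sequence per concept
        x = self.encoder(self.proj(x))       # temporal self-attention per concept
        x = x.mean(dim=1).reshape(B, C, -1)  # pool each concept over time
        return self.head(x.flatten(1))

model = PerConceptTemporalAttention(num_concepts=50, num_classes=10)
logits = model(torch.randn(2, 16, 50))  # 2 videos, 16 timesteps, 50 concepts
print(logits.shape)                     # torch.Size([2, 10])
```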

---