Add pipeline tag, paper link, and GitHub repository link

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +20 -2
README.md CHANGED
@@ -1,11 +1,12 @@
  ---
  language:
  - en
+ license: mit
+ pipeline_tag: image-text-to-text
  tags:
  - vision-language
  - mdetr
  - xai
- license: mit
  model_index:
  - name: mdetr-gridvqa-pure
  task: visual-question-answering
@@ -17,6 +18,9 @@ model_index:

  This repository contains two paired reference models, **M_pure** and **M_spur**, built on identical transformer architectures (**MDETR**). These models, coupled with their corresponding datasets, together form a diagnostic framework to evaluate if Multimodal Explainable AI (MxAI) methods genuinely capture cross-modal synergy or simply report shallow feature correlations.

+ This model is presented in the paper [GridVQA-X: A Framework for Evaluating Multimodal Explainability Methods](https://huggingface.co/papers/2606.14740).
+ The official training and evaluation code can be found in the [GitHub Repository](https://github.com/AikyamLab/grid-vqax).
+
  ## Model Descriptions

  ### 1. M_pure (The Faithful Spatial Reasoner)
@@ -39,4 +43,18 @@ These models are released explicitly to stress-test vision-language explainabili
  | Evaluation Metric | M_pure on D_pure | M_spur on D_spur | M_spur on D_pure |
  | :--- | :---: | :---: | :---: |
  | **Global Accuracy** | >99% | 100% | **Catastrophic Failure** (8%-14% on multi-hop) |
- | **Causal Pathway** | True Spatial Relations | Bag-of-Words Shortcut | Unimodal Feature Collapse |
+ | **Causal Pathway** | True Spatial Relations | Bag-of-Words Shortcut | Unimodal Feature Collapse |
+
+ ## Citation
+
+ ```bibtex
+ @misc{belsare2026gridvqaxframeworkevaluatingmultimodal,
+ title={GridVQA-X: A Framework for Evaluating Multimodal Explainability Methods},
+ author={Sujay Belsare and Sudarshan Nikhil and Sushant Kumar and Ponnurangam Kumaraguru and Chirag Agarwal},
+ year={2026},
+ eprint={2606.14740},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2606.14740},
+ }
+ ```
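
A quick way to sanity-check the metadata part of this change is to read the top-level fields from the README front matter and confirm that `license` and `pipeline_tag` are present. The following is a minimal sketch that uses only the Python standard library. The helper name `read_front_matter` and the `sample` text are hypothetical and introduced for illustration; the parser handles only top-level `key: value` lines and is not a full YAML parser.

```python
def read_front_matter(readme_text):
    """Return top-level `key: value` pairs from a README's front matter.

    Hypothetical helper: list items and nested keys are skipped, and an
    unterminated front matter block yields an empty dict.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        is_top_level = bool(line) and not line[0].isspace() and not line.startswith("-")
        if is_top_level and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter found


# Sample front matter mirroring the fields touched by this change.
sample = """---
language:
- en
license: mit
pipeline_tag: image-text-to-text
tags:
- vision-language
- mdetr
- xai
---
# Model card body
"""

meta = read_front_matter(sample)
print(meta["license"], meta["pipeline_tag"])
```

For anything beyond a quick check, a real YAML parser is the safer choice, since this sketch ignores list values such as `tags`.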