arxiv:2603.15818

Conflict-Aware Multimodal Fusion for Ambivalence and Hesitancy Recognition

Published on Mar 16

AI-generated summary

A multimodal framework called ConflictAwareAH uses pairwise conflict features between video, audio, and text modalities to detect ambivalence and hesitancy states more accurately than text-only approaches.

Abstract

Ambivalence and hesitancy (A/H) are subtle affective states where a person shows conflicting signals through different channels -- saying one thing while their face or voice tells another story. Recognising these states automatically is valuable in clinical settings, but it is hard for machines because the key evidence lives in the disagreements between what is said, how it sounds, and what the face shows. We present ConflictAwareAH, a multimodal framework built for this problem. Three pre-trained encoders extract video, audio, and text representations. Pairwise conflict features -- element-wise absolute differences between modality embeddings -- serve as bidirectional cues: large cross-modal differences flag A/H, while small differences confirm behavioural consistency and anchor the negative class. This conflict-aware design addresses a key limitation of text-dominant approaches, which tend to over-detect A/H (high F1-AH) while struggling to confirm its absence: our multimodal model improves F1-NoAH by +4.6 points over text alone and halves the class-performance gap. A complementary text-guided late fusion strategy blends a text-only auxiliary head with the full model at inference, adding +4.1 Macro F1. On the BAH dataset from the ABAW10 Ambivalence/Hesitancy Challenge, our method reaches 0.694 Macro F1 on the labelled test split and 0.715 on the private leaderboard, outperforming published multimodal baselines by over 10 points -- all on a single GPU in under 25 minutes of training.
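The abstract describes the conflict features as element-wise absolute differences between modality embeddings, combined with a text-guided late fusion that blends a text-only auxiliary head with the full model at inference. The following is a minimal PyTorch sketch of that idea only; the embedding dimension, projection layers, concatenation order, and blending weight alpha are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ConflictAwareFusion(nn.Module):
    """Hypothetical sketch of conflict-aware fusion with a text-only auxiliary head."""

    def __init__(self, dim: int = 256, num_classes: int = 2):
        super().__init__()
        # Project each modality's pre-trained encoder output into a shared space
        # (dimension and projection choice are assumptions, not the paper's).
        self.proj_v = nn.Linear(dim, dim)
        self.proj_a = nn.Linear(dim, dim)
        self.proj_t = nn.Linear(dim, dim)
        # Classifier over the three embeddings plus three pairwise conflict vectors.
        self.classifier = nn.Linear(dim * 6, num_classes)
        # Text-only auxiliary head used for late fusion at inference.
        self.text_head = nn.Linear(dim, num_classes)

    def forward(self, v, a, t, alpha: float = 0.5):
        v, a, t = self.proj_v(v), self.proj_a(a), self.proj_t(t)
        # Pairwise conflict features: element-wise absolute differences.
        # Large values flag cross-modal disagreement (an A/H cue);
        # small values indicate behavioural consistency (a No-A/H cue).
        c_vt = torch.abs(v - t)
        c_at = torch.abs(a - t)
        c_va = torch.abs(v - a)
        fused = torch.cat([v, a, t, c_vt, c_at, c_va], dim=-1)
        logits_full = self.classifier(fused)
        logits_text = self.text_head(t)
        # Text-guided late fusion: blend the text-only head with the full model.
        # The blending weight alpha here is a placeholder value.
        return alpha * logits_full + (1.0 - alpha) * logits_text


# Example with random embeddings standing in for the pre-trained encoder outputs.
model = ConflictAwareFusion()
v = torch.randn(4, 256)  # video embeddings
a = torch.randn(4, 256)  # audio embeddings
t = torch.randn(4, 256)  # text embeddings
logits = model(v, a, t)  # shape: (4, 2)
```

The sketch only illustrates the two mechanisms named in the abstract (conflict features as bidirectional cues and text-guided late fusion); encoder choices, training objectives, and the actual fusion architecture are not specified here.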
