---
arxiv: 2603.15818
license: mit
tags:
- multimodal
- emotion-recognition
- ambivalence
- hesitancy
- ABAW10
---

# ConflictAwareAH — Ambivalence/Hesitancy Recognition

Pre-trained weights for the Conflict-Aware Multimodal Fusion model (ABAW10 Challenge, AVGF1 0.715).
## Usage

GitHub: https://github.com/Bekhouche/ConflictAwareAH
```python
import torch
from bah.models import ConflictAwareAHModel
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="Bekhouche/ConflictAwareAH", filename="best_model.pt")
ckpt = torch.load(ckpt_path, map_location="cpu")
args = ckpt["args"]

# Infer fusion_type from checkpoint keys
state_keys = set(ckpt["model"].keys())
fusion_type = args.get("fusion_type") or ("6token" if any("fusion_transformer" in k for k in state_keys) else "concat")

model = ConflictAwareAHModel(
    video_model=args["video_model"],
    audio_model=args["audio_model"],
    text_model=args["text_model"],
    dropout=0.0,
    freeze_encoders=args.get("freeze_encoders", True),
    unfreeze_top_k=args.get("unfreeze_top_k", 0),
    num_transformer_layers=args.get("num_layers", 2),
    fusion_type=fusion_type,
)
model.load_state_dict(ckpt["model"], strict=True)
model.eval()

text_blend = ckpt.get("text_blend", args.get("text_blend", 0.5))
```
## Config

- Encoders: VideoMAE-Base, HuBERT-Base, RoBERTa-GoEmotions (frozen)
- Dropout: 0.4
- Text blend (inference): 0.5
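
To illustrate what a text blend of 0.5 means, here is a minimal sketch of a convex combination of two branch outputs. This is only an illustration of the arithmetic: the model's actual blending point (logits vs. probabilities, and which branches are combined) is an assumption, and the tensors below are dummy values.

```python
import torch

text_blend = 0.5  # inference-time weight from the config above

av_probs = torch.tensor([0.2, 0.8])    # dummy audio-visual branch output (assumption)
text_probs = torch.tensor([0.6, 0.4])  # dummy text branch output (assumption)

# Convex combination: text_blend = 0 ignores text, 1 uses text only.
blended = (1 - text_blend) * av_probs + text_blend * text_probs
```

With `text_blend = 0.5` the two branches contribute equally, which matches the checkpoint's default retrieved in the usage snippet.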