---
title: Multimodal Coherence AI
emoji: 🎨
colorFrom: purple
colorTo: pink
sdk: streamlit
sdk_version: 1.41.0
app_file: app.py
pinned: false
license: mit
short_description: Coherent text + image + audio with MSCI
---
# Multimodal Coherence AI
Generate semantically coherent text + image + audio bundles and evaluate cross-modal alignment using the Multimodal Semantic Coherence Index (MSCI).
## How it works
- Text — generated via the HF Inference API
- Image — retrieved from a curated index using CLIP (ViT-B/32) embeddings
- Audio — retrieved from a curated index using CLAP (HTSAT-unfused) embeddings
- MSCI — computed as `0.45 * cos_sim(text, image) + 0.45 * cos_sim(text, audio)`
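The MSCI formula above can be sketched as a small function over precomputed embedding vectors. This is an illustrative sketch, not the app's actual implementation: the function and argument names (`cos_sim`, `msci`, `text_emb`, etc.) are assumptions, and it presumes the text/image pair comes from CLIP and the text/audio pair from CLAP, so each similarity is taken within its own embedding space.

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def msci(text_emb: np.ndarray,
         image_emb: np.ndarray,
         audio_emb: np.ndarray,
         w_img: float = 0.45,
         w_aud: float = 0.45) -> float:
    """Multimodal Semantic Coherence Index: a weighted sum of the
    text-image and text-audio cosine similarities."""
    return (w_img * cos_sim(text_emb, image_emb)
            + w_aud * cos_sim(text_emb, audio_emb))
```

With the 0.45/0.45 weights, a perfectly aligned bundle (both similarities equal to 1.0) scores 0.9, so MSCI ranges over [-0.9, 0.9] rather than [-1, 1].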
## Research
This demo accompanies a study evaluating multimodal semantic coherence across three research questions:
- RQ1: Is MSCI sensitive to controlled semantic perturbations? (Supported, d > 2.0)
- RQ2: Does structured planning improve cross-modal alignment? (Not supported)
- RQ3: Does MSCI correlate with human coherence judgments? (Supported, rho = 0.379)