---
title: Multimodal Coherence AI
emoji: 🎨
colorFrom: purple
colorTo: pink
sdk: streamlit
sdk_version: "1.41.0"
app_file: app.py
pinned: false
license: mit
short_description: Coherent text + image + audio with MSCI
---
# Multimodal Coherence AI

Generate semantically coherent **text + image + audio** bundles and evaluate
cross-modal alignment using the **Multimodal Semantic Coherence Index (MSCI)**.
## How it works

1. **Text**: generated via the HF Inference API
2. **Image**: retrieved from a curated index using CLIP (ViT-B/32) embeddings
3. **Audio**: retrieved from a curated index using CLAP (HTSAT-unfused) embeddings
4. **MSCI**: computed as `0.45 * cos_sim(text, image) + 0.45 * cos_sim(text, audio)`
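The MSCI formula above can be sketched as follows. This is a minimal illustration, not the app's actual code: the function names and toy vectors are placeholders, and in practice the text is embedded separately with CLIP (for the image pair) and CLAP (for the audio pair), since the two models live in different embedding spaces.

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity between two embedding vectors.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def msci(clip_text_emb, clip_image_emb, clap_text_emb, clap_audio_emb,
         w_image=0.45, w_audio=0.45):
    # MSCI as defined in the list above: a weighted sum of the
    # text-image and text-audio cosine similarities (weights 0.45 each,
    # as stated in the README).
    return (w_image * cos_sim(clip_text_emb, clip_image_emb)
            + w_audio * cos_sim(clap_text_emb, clap_audio_emb))

# Toy vectors standing in for CLIP/CLAP embeddings (illustrative only).
t_img, img = [1.0, 0.0], [1.0, 0.0]   # perfectly aligned -> cos_sim = 1
t_aud, aud = [1.0, 0.0], [0.0, 1.0]   # orthogonal -> cos_sim = 0
print(msci(t_img, img, t_aud, aud))   # 0.45 * 1 + 0.45 * 0 = 0.45
```

Note that, as given, the two weights sum to 0.9 rather than 1.0, so a fully aligned bundle scores 0.9 under this formula.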
## Research

This demo accompanies a study evaluating multimodal semantic coherence across
three research questions:

- **RQ1**: Is MSCI sensitive to controlled semantic perturbations? (Supported, d > 2.0)
- **RQ2**: Does structured planning improve cross-modal alignment? (Not supported)
- **RQ3**: Does MSCI correlate with human coherence judgments? (Supported, rho = 0.379)
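For reference, the effect size reported for RQ1 is a Cohen's d computed between MSCI scores of coherent and perturbed bundles. A minimal sketch of that computation follows; the toy score lists are illustrative and not the study's data:

```python
import statistics

def cohens_d(coherent, perturbed):
    # Cohen's d between two groups of MSCI scores, using the
    # pooled-standard-deviation formulation.
    n1, n2 = len(coherent), len(perturbed)
    m1, m2 = statistics.fmean(coherent), statistics.fmean(perturbed)
    v1, v2 = statistics.variance(coherent), statistics.variance(perturbed)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Toy MSCI scores for intact vs. semantically perturbed bundles.
coherent  = [0.71, 0.68, 0.74, 0.70]
perturbed = [0.42, 0.39, 0.45, 0.41]
print(cohens_d(coherent, perturbed))  # large positive effect size
```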