SpeechGuard Fusion — DCA + FiLM Module

Part of the SpeechGuard AI system submitted to Samsung EnnovateX AX Hackathon 2026.

Model Description

A Deep Cross-Attention (DCA) + FiLM (Feature-wise Linear Modulation) fusion module that jointly processes keyword spotting (KWS) features and speaker verification (SV) embeddings.

It is the primary architectural contribution of SpeechGuard AI.

Architecture

Input Q: KWS features (B, T, 16) from BC-ResNet-8
Input K,V: SV embedding (B, 192) from ECAPA-TDNN
Cross-attention: 4 heads, 64-dim attention space
FiLM conditioning: speaker d-vector modulates KWS features
Output: fused score in [0, 1]
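
The data flow listed above can be sketched in PyTorch roughly as follows. This is a minimal illustration under stated assumptions, not the SpeechGuard implementation: the class name FusionSketch, the order of the FiLM and attention steps, the splitting of the 192-dim SV embedding into 12 key/value tokens (a single token would make the attention softmax trivial), the mean pooling over time, and the way the cosine similarity enters the scoring head are all hypothetical choices. Its parameter count is not expected to match the figure reported below.

```python
import torch
import torch.nn as nn


class FusionSketch(nn.Module):
    """Illustrative cross-attention + FiLM fusion; NOT the SpeechGuard code."""

    def __init__(self, d_kws=16, d_sv=192, d_attn=64, n_heads=4, n_sv_tokens=12):
        super().__init__()
        assert d_sv % n_sv_tokens == 0, "d_sv must split evenly into tokens"
        self.n_sv_tokens = n_sv_tokens  # hypothetical: number of K/V tokens
        # FiLM: the speaker embedding predicts a per-channel scale and shift.
        self.film = nn.Linear(d_sv, 2 * d_kws)
        # Projections into the shared attention space.
        self.q_proj = nn.Linear(d_kws, d_attn)
        self.kv_proj = nn.Linear(d_sv // n_sv_tokens, d_attn)
        self.attn = nn.MultiheadAttention(d_attn, n_heads, batch_first=True)
        # Hypothetical head: pooled features plus the raw cosine similarity.
        self.head = nn.Linear(d_attn + 1, 1)

    def forward(self, kws, sv, cosine_sim):
        # kws: (B, T, d_kws), sv: (B, d_sv), cosine_sim: (B,)
        gamma, beta = self.film(sv).chunk(2, dim=-1)        # (B, d_kws) each
        kws = gamma.unsqueeze(1) * kws + beta.unsqueeze(1)  # FiLM modulation
        q = self.q_proj(kws)                                # (B, T, d_attn)
        tokens = sv.reshape(sv.size(0), self.n_sv_tokens, -1)
        kv = self.kv_proj(tokens)                           # (B, n_tokens, d_attn)
        attended, _ = self.attn(q, kv, kv)                  # (B, T, d_attn)
        pooled = attended.mean(dim=1)                       # (B, d_attn)
        features = torch.cat([pooled, cosine_sim.unsqueeze(-1)], dim=-1)
        return torch.sigmoid(self.head(features)).squeeze(-1)  # (B,), in [0, 1]


sketch = FusionSketch().eval()
with torch.no_grad():
    scores = sketch(
        torch.randn(2, 20, 16),    # KWS features
        torch.randn(2, 192),       # SV embeddings
        torch.tensor([0.7, 0.2]),  # cosine similarities
    )
```

The final sigmoid is what keeps the output in [0, 1]. The other design choices could plausibly be made differently in the actual module.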

Performance

Metric	Value
Parameters	56,706
Latency (CPU)	0.1 ms
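
The card does not say how these figures were measured. One common way to obtain comparable numbers is sketched below, assuming a trainable-plus-frozen parameter count and a mean wall-clock time over repeated CPU forward passes. The helper names count_parameters and mean_latency_ms are hypothetical, and a small nn.Linear stands in for the fusion module so the snippet runs on its own.

```python
import time

import torch
import torch.nn as nn


def count_parameters(model):
    """Total number of scalar parameters in the model."""
    return sum(p.numel() for p in model.parameters())


def mean_latency_ms(model, inputs, warmup=10, runs=100):
    """Mean wall-clock time of one forward pass, in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):  # warm-up passes are not timed
            model(*inputs)
        start = time.perf_counter()
        for _ in range(runs):
            model(*inputs)
        elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / runs


# Stand-in model (assumption): replace with the real fusion module.
stand_in = nn.Linear(16, 4)
n_params = count_parameters(stand_in)  # 16 * 4 weights + 4 biases
latency = mean_latency_ms(stand_in, (torch.randn(1, 20, 16),))
print(f"{n_params} parameters, {latency:.4f} ms per forward pass")
```

Latency measured this way depends on the CPU, the thread count, and the batch and sequence sizes, so it is only meaningful alongside those details.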

Usage

import torch
from speechguard.fusion.dca import DCAFusionModule

module = DCAFusionModule(d_kws=16, d_sv=192, d_attn=64)
# Load weights from checkpoint if available
module.eval()  # switch to inference mode

kws_features = torch.randn(1, 20, 16)   # (B, T, 16) KWS features
sv_embedding = torch.randn(1, 192)      # (B, 192) SV embedding
cosine_sim   = torch.tensor([0.7])      # (B,) cosine similarity

with torch.no_grad():                   # no gradients needed at inference
    result = module(kws_features, sv_embedding, cosine_sim)
print(result["fused_score"])   # tensor([0.XXXX])
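
The usage snippet mentions loading weights from a checkpoint without showing how. A minimal sketch using the standard torch.save / torch.load / load_state_dict calls is given below. The helper load_weights_if_present and the file name fusion.pt are hypothetical, the checkpoint is assumed to hold a plain state_dict, and an nn.Linear stands in for the fusion module so the round trip can run without the SpeechGuard package.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_weights_if_present(module, path):
    """Load a state_dict from `path` if the file exists; return True if loaded."""
    if not os.path.exists(path):
        return False
    state = torch.load(path, map_location="cpu")
    module.load_state_dict(state)
    return True


# Round trip with stand-in modules (assumption: same architecture on both sides).
source = nn.Linear(16, 4)
target = nn.Linear(16, 4)
with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "fusion.pt")  # hypothetical file name
    torch.save(source.state_dict(), ckpt_path)
    loaded = load_weights_if_present(target, ckpt_path)
    missing = load_weights_if_present(target, os.path.join(tmp, "absent.pt"))
```

If no checkpoint is found, the module keeps its random initialization and the fused score carries no meaning.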

Citation

Samsung EnnovateX AX Hackathon 2026, Problem #04
Team: Placecomm Prophets (IIT Kharagpur)
