SpeechGuard Fusion: DCA + FiLM Module
Part of the SpeechGuard AI system submitted to Samsung EnnovateX AX Hackathon 2026.
Model Description
A Deep Cross-Attention (DCA) + FiLM (feature-wise linear modulation) fusion module that jointly processes keyword spotting (KWS) features and speaker verification (SV) embeddings.
It is the primary architectural contribution of SpeechGuard AI.
Architecture
- Input Q: KWS features (B, T, 16) from BC-ResNet-8
- Input K,V: SV embedding (B, 192) from ECAPA-TDNN
- Cross-attention: 4 heads, 64-dim attention space
- FiLM conditioning: speaker d-vector modulates KWS features
- Output: fused score in [0, 1]
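The components listed above can be sketched roughly as follows. This is a minimal illustration, not the SpeechGuard implementation: the class name `CrossAttentionFiLMSketch`, the choice to split the 192-dim speaker embedding into 12 key/value tokens, the residual connection, the mean pooling, and the linear scoring head are all assumptions made for the example. It also omits the cosine similarity input that the real module accepts, and it is not expected to match the reported parameter count.

```python
import torch
import torch.nn as nn


class CrossAttentionFiLMSketch(nn.Module):
    """Hypothetical illustration of cross-attention + FiLM fusion."""

    def __init__(self, d_kws=16, d_sv=192, d_attn=64, n_heads=4, n_sv_tokens=12):
        super().__init__()
        assert d_sv % n_sv_tokens == 0
        self.n_sv_tokens = n_sv_tokens
        self.q_proj = nn.Linear(d_kws, d_attn)  # KWS frames -> queries
        # Assumption: the speaker embedding is cut into tokens to form keys/values.
        self.kv_proj = nn.Linear(d_sv // n_sv_tokens, d_attn)
        self.attn = nn.MultiheadAttention(d_attn, n_heads, batch_first=True)
        self.film = nn.Linear(d_sv, 2 * d_attn)  # produces gamma and beta
        self.head = nn.Linear(d_attn, 1)

    def forward(self, kws_features, sv_embedding):
        batch = sv_embedding.shape[0]
        h = self.q_proj(kws_features)                            # (B, T, d_attn)
        sv_tokens = sv_embedding.view(batch, self.n_sv_tokens, -1)
        kv = self.kv_proj(sv_tokens)                             # (B, n_tokens, d_attn)
        attended, _ = self.attn(h, kv, kv)                       # (B, T, d_attn)
        h = h + attended                                         # residual connection
        gamma, beta = self.film(sv_embedding).chunk(2, dim=-1)   # (B, d_attn) each
        h = gamma.unsqueeze(1) * h + beta.unsqueeze(1)           # FiLM: scale and shift
        pooled = h.mean(dim=1)                                   # average over time
        return torch.sigmoid(self.head(pooled)).squeeze(-1)      # (B,), in [0, 1]


sketch = CrossAttentionFiLMSketch()
scores = sketch(torch.randn(2, 20, 16), torch.randn(2, 192))
print(scores.shape)  # torch.Size([2])
```

The sigmoid on the final layer is what keeps the score inside [0, 1]; how the actual module bounds its output is not stated in this card.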
Performance
| Metric | Value |
|---|---|
| Parameters | 56,706 |
| Latency (CPU) | 0.1 ms |
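The table does not state how these figures were measured. One way such numbers could be checked is sketched below; the helper names `count_parameters` and `mean_cpu_latency_ms` are hypothetical, and a small `nn.Linear` stands in for the fusion module so the example runs without the `speechguard` package. Measured latency depends on hardware, thread settings, and input length, so results will vary.

```python
import time

import torch
import torch.nn as nn


def count_parameters(model):
    """Count trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def mean_cpu_latency_ms(model, inputs, n_warmup=10, n_runs=100):
    """Average wall-clock time of one forward pass on CPU, in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):  # warm-up runs are excluded from timing
            model(*inputs)
        start = time.perf_counter()
        for _ in range(n_runs):
            model(*inputs)
        elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / n_runs


# Stand-in model: 16 weights + 1 bias.
stand_in = nn.Linear(16, 1)
print(count_parameters(stand_in))  # 17
print(mean_cpu_latency_ms(stand_in, (torch.randn(1, 16),)))
```

To apply this to the fusion module, pass the module and a tuple of its three inputs in place of the stand-in.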
Usage
import torch
from speechguard.fusion.dca import DCAFusionModule
module = DCAFusionModule(d_kws=16, d_sv=192, d_attn=64)
# Load weights from checkpoint if available
kws_features = torch.randn(1, 20, 16)
sv_embedding = torch.randn(1, 192)
cosine_sim = torch.tensor([0.7])
result = module(kws_features, sv_embedding, cosine_sim)
print(result["fused_score"]) # tensor([0.XXXX])
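The snippet above runs with randomly initialised weights unless a checkpoint is loaded. A minimal sketch of the optional loading step follows, assuming the checkpoint is a plain `state_dict` saved with `torch.save`; the checkpoint format and file name for this model are not documented here, so `load_weights_if_available` and the paths are hypothetical, and a stand-in `nn.Linear` is used to keep the example self-contained.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_weights_if_available(module, checkpoint_path):
    """Load a state_dict into module if the file exists; return True on success."""
    if not os.path.isfile(checkpoint_path):
        return False
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    module.load_state_dict(state_dict)
    module.eval()  # switch to inference mode after loading
    return True


# Demonstration with a stand-in module and a temporary checkpoint file.
source = nn.Linear(16, 1)
target = nn.Linear(16, 1)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stand_in.pt")
    torch.save(source.state_dict(), path)
    loaded = load_weights_if_available(target, path)
    missing = load_weights_if_available(target, os.path.join(tmp_dir, "absent.pt"))

print(loaded, missing)  # True False
```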
Citation
Samsung EnnovateX AX Hackathon 2026, Problem #04. Team: Placecomm Prophets (IIT Kharagpur)