Is there any parameter to tune for more accurate diarization?
#13
by
lingxue156
- opened
Thank you guys for this excellent model, which combines semantic understanding and speaker diarization! I read your paper and noticed that you used HDBSCAN clustering during pre-training with a fixed threshold of 0.67. Right now I'm testing the model in a meeting diarization scenario, and I've found that it leans a bit too conservative: it tends to identify fewer speakers than are actually present. Even when voices are pretty distinct (like different female speakers), they often end up lumped together as one speaker.
So I was wondering: is the clustering threshold adjustable? And what other parameters could I tweak to make the diarization part a bit more aggressive? 😊
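To make the question concrete, here is a toy sketch of the effect I mean. It is not the model's actual pipeline and not HDBSCAN: it is a plain single-linkage grouping over made-up 2-D "embeddings", and it assumes the threshold acts as a maximum cosine distance for merging (I don't know whether 0.67 in the paper is a distance or a similarity). The names `count_speakers` and `toy_embeddings` are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def count_speakers(embeddings, threshold):
    """Count groups after merging every pair closer than `threshold`.

    Simple single-linkage grouping via union-find; a stand-in for the
    real clustering step, used only to illustrate the threshold's effect.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(embeddings)), 2):
        if cosine_distance(embeddings[i], embeddings[j]) < threshold:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(embeddings))})


def unit(angle_deg):
    """Unit vector at the given angle, used as a fake speaker embedding."""
    r = math.radians(angle_deg)
    return (math.cos(r), math.sin(r))


# Made-up data: three "speakers" near 0, 40, and 120 degrees, two segments each.
toy_embeddings = [unit(0), unit(5), unit(40), unit(45), unit(120), unit(125)]

print(count_speakers(toy_embeddings, 0.67))  # 2: the first two speakers are merged
print(count_speakers(toy_embeddings, 0.10))  # 3: a stricter threshold separates them
```

With the looser threshold, the two nearby speakers collapse into one, which is the behavior I'm seeing. If an equivalent knob is exposed at inference time, that is what I'd like to tighten.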