Is there any parameter to tune for more accurate diarization?

#13
by lingxue156 - opened

Thank you for this excellent model, which combines semantic understanding with speaker diarization! I read your paper and noticed that you used HDBSCAN clustering during pre-training with a fixed threshold of 0.67. I'm currently testing the model in a meeting diarization scenario, and I've found that it leans a bit too conservative: it tends to identify fewer speakers than are actually present. Even when voices are quite distinct (for example, different female speakers), they often end up lumped together.

So I was wondering: is the clustering threshold adjustable? And what other parameters could I tweak to make the diarization part a bit more aggressive? 😊
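
To make "more aggressive" concrete, here is a toy sketch of the effect I have in mind. It is not the model's pipeline and it is not HDBSCAN: it is a simple single-linkage grouping over made-up 2-D "speaker embeddings", where two segments are merged whenever their cosine similarity reaches a threshold. The function name `count_speakers`, the toy vectors, and the assumption that the threshold acts on cosine similarity are all hypothetical; I don't know how the 0.67 value is actually applied internally.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def count_speakers(embeddings, threshold):
    """Count groups after merging every pair whose similarity >= threshold.

    Illustrative single-linkage grouping via union-find (hypothetical helper,
    not part of the model's API).
    """
    parent = list(range(len(embeddings)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(embeddings))})


# Made-up embeddings: A and B are similar voices, C is clearly different.
toy = [
    [1.00, 0.00],  # speaker A, segment 1
    [0.98, 0.20],  # speaker A, segment 2
    [0.80, 0.60],  # speaker B, segment 1
    [0.82, 0.57],  # speaker B, segment 2
    [0.00, 1.00],  # speaker C
]

for t in (0.50, 0.67, 0.95):
    print(t, count_speakers(toy, t))
```

With these toy vectors, a threshold of 0.67 merges speakers A and B into one group (2 speakers found), while 0.95 keeps them apart (3 speakers found). That is the kind of knob I am hoping exists.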
