Parameter Count Mismatch: WavLM-Large vs FINALLY Speech Enhancement Model's Claim

#8
by fahim-inverseai - opened

I was examining the FINALLY speech enhancement model (https://arxiv.org/abs/2410.05920) released by SamsungLabs, which uses WavLM-Large as part of its architecture. I noticed a discrepancy in the reported number of parameters:

  • The original WavLM-Large model on Hugging Face is documented to have around 316M parameters.
  • However, the FINALLY model description attributes 358M parameters to the WavLM component, which does not match the standard WavLM-Large count.
    (Screenshot from 2025-08-13 13-40-47.png attached)

I’m curious if this is due to:

  • A modified WavLM-Large backbone in FINALLY (e.g., additional layers),
  • A different counting method (trainable vs total parameters), or
  • A documentation oversight.
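For the second possibility, the two counting methods are easy to compare in PyTorch. Below is a minimal sketch using a toy module as a stand-in for WavLM-Large (downloading the real checkpoint works the same way via `transformers.WavLMModel.from_pretrained("microsoft/wavlm-large")`); the toy model and the frozen layer are illustrative assumptions, not part of FINALLY:

```python
import torch.nn as nn

def count_params(model: nn.Module) -> tuple[int, int]:
    """Return (total, trainable) parameter counts for a module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

# Toy stand-in for a large backbone (hypothetical, for illustration only).
toy = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 5))

# Freeze one weight matrix, as a partially frozen backbone would be.
toy[0].weight.requires_grad_(False)

total, trainable = count_params(toy)
print(f"total: {total}, trainable: {trainable}")
```

If a paper reports the total count while the Hugging Face card reports only the trainable (or vice versa, or one count includes surrounding layers of the larger system), the two numbers will diverge exactly as observed here.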

Has anyone else noticed this mismatch? It would be helpful to clarify for anyone trying to reproduce results or analyze computational requirements.

Thanks in advance!
