Compatibility with LoRA adapter trained from original model

by deathknight0 - opened

Hi and thank you for porting this model to be HF compatible.

I noticed that the layer names and parameter count (as listed in model.safetensors.index.json) differ from the original repo (e.g. a parameter count of 8.33B vs 8.67B in the original). Just wondering: is a LoRA adapter trained from the original model (using the script provided by the repo authors) likely to be compatible with this release?

Thanks again

Hi @deathknight0, thanks for your message! Yes, this model is slightly smaller because I've removed the decoder of the acoustic tokenizer, as it isn't needed by the ASR model.

If you are speaking about this script, I don't think it will work, as the state dict has changed to be consistent with Transformers conventions. Someone on our team is looking into adapting that script for the Transformers-compatible checkpoint, but if you already have something, that would be great! By peeking into the modeling code, you should see how the state dict has changed (ASR and tokenizer) from the original (ASR and tokenizer).
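To illustrate what "adapting the adapter" would involve, here is a minimal sketch of remapping LoRA state-dict keys by prefix. The prefix pairs below (`model.language_model.` → `model.decoder.`) are purely hypothetical placeholders; the real mapping has to be read off from the original and Transformers modeling files linked above.

```python
def remap_lora_keys(state_dict, prefix_map):
    """Rename each key whose prefix appears in prefix_map.

    Longest prefixes are tried first so a more specific mapping
    wins over a shorter one. Tensors are passed through unchanged.
    """
    remapped = {}
    for key, tensor in state_dict.items():
        new_key = key
        for old, new in sorted(prefix_map.items(), key=lambda kv: -len(kv[0])):
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        remapped[new_key] = tensor
    return remapped


# Hypothetical example: the prefixes are made up for illustration only.
prefix_map = {"model.language_model.": "model.decoder."}
adapter = {"model.language_model.layers.0.q_proj.lora_A.weight": "tensor_A"}
print(remap_lora_keys(adapter, prefix_map))
```

With real checkpoints you would load the adapter with `safetensors.torch.load_file`, apply the mapping, and save it back with `safetensors.torch.save_file` before attaching it to the Transformers checkpoint.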

FYI: this checkpoint was a draft. The final one is available under the Microsoft account here, and VibeVoice ASR has been released in Transformers v5.3.0.

Actually, it might be easier to see the mapping from original to Transformers here πŸ˜‰
