Help exporting my fine-tuned Chatterbox Multilingual to ONNX (new language)
Hi everyone,
I fine-tuned Chatterbox Multilingual (PyTorch) on my own dataset to add support for a new language (fa / Persian), and the fine-tuned model produces correct output at inference time in PyTorch.
Now I want to export my fine-tuned model to ONNX, ideally in the same format/layout as this repo, which provides multiple ONNX graphs (e.g. speech_encoder.onnx, embed_tokens.onnx, conditional_decoder.onnx, and several language_model*.onnx variants like fp16/q4).
Goal
Export the fine-tuned model to ONNX for fast inference (onnxruntime).
Match this repo’s structure (multiple ONNX components + optional fp16/quantized variants).
If possible, keep dynamic shapes where supported.
What I tried
optimum-cli export onnx (generic export)
torch.onnx.export (manual export)
…but I couldn't figure out how to reproduce the split-graph pipeline used here (the LM split, the decoder, and the embeddings, plus the fp16/q4 variants).
Questions
What is the recommended export pipeline to reproduce this repo’s ONNX artifacts (speech_encoder, embed_tokens, conditional_decoder, language_model*)?
For a new language, do I also need to update tokenizer.json / tokenizer_config.json / generation_config.json, and is any special handling needed for language tokens?
Which parts typically change with language fine-tuning — do I need to re-export all ONNX components or only the language model pieces?
Any tips for validating correctness (comparing PyTorch vs ONNX outputs), and known gotchas (unsupported ops, dynamic axes, etc.)?