Multilingual Performance and Testing with Eagle3 Module
Hi NVIDIA team,
Thank you for releasing this Eagle3 speculative decoding module. I'm evaluating this model for production use in a Dutch development environment where we work with a mix of English and Dutch prompts.
I have several questions about multilingual usage:
First, has this Eagle3 module been tested with non-English prompts, specifically Dutch or other European languages?
Second, what is the expected acceptance rate when using mixed-language context (e.g., Dutch prompts with English content, or vice versa)? The model card shows impressive acceptance rates (1.64-1.84) for English MT-Bench categories, but multilingual performance isn't mentioned.
Third, are there known performance degradations or limitations when using the Eagle3 module with non-English content?
Some context about our use case: we need high-concurrency inference (which is why Eagle3's single-token draft is interesting), and we work with mixed Dutch/English prompts within the same conversation.
The base model (gpt-oss-120b) has some multilingual capability, but since the Eagle modules were trained on English synthetic data (ultrachat_200k and nemotron), I'm concerned about the speculative decoding acceptance rate degrading significantly with Dutch content.
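In case it helps frame the question: the acceptance-rate metric I'm asking about can be measured offline by replaying draft-model tokens against the target model's greedy choices and counting the matching prefix per step (the same "mean accepted length" that the 1.64-1.84 MT-Bench numbers report, if I understand the model card correctly). Below is a minimal, self-contained sketch of that bookkeeping; the token IDs are toy values and the greedy prefix-match rule is an assumption on my part, not the exact rejection-sampling acceptance used in the real decoder.

```python
def accepted_length(draft_tokens, target_tokens):
    """Length of the longest matching prefix: how many draft tokens
    the target model would accept under greedy verification."""
    n = 0
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        n += 1
    return n


def mean_acceptance(steps):
    """Average tokens produced per decoding step: accepted draft
    tokens plus the one token the target model always contributes."""
    return sum(accepted_length(d, t) + 1 for d, t in steps) / len(steps)


# Toy per-step (draft, target) token-ID pairs, e.g. logged separately
# for Dutch and English prompts to compare degradation.
steps = [
    ([11, 42, 7], [11, 42, 9]),  # 2 of 3 draft tokens accepted
    ([5, 5, 5], [5, 5, 5]),      # all 3 accepted
]
print(mean_acceptance(steps))  # → 3.5
```

Running something like this over logged Dutch-only vs. English-only conversations would give a direct per-language comparison, without needing any changes to the serving stack.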
Any insights or empirical data on this would be greatly appreciated!
Thanks in advance!
The nemotron dataset includes several dedicated multilingual categories that are part of this model's training data. I do not believe Dutch is among them specifically, but I would expect reasonable performance on such tasks. Note also that since GPT-OSS is a reasoning model, a speedup can still be achieved even on entirely out-of-distribution multilingual prompts, because the model often still reasons in English.