onnxruntime-node crashes with malloc error during autoregressive generation with Chatterbox Turbo ONNX model
#6 opened by tkpop777
Hi, the ONNX model works great in Python using Hugging Face Transformers and the Python version of onnxruntime. But trying to use it in Node.js fails: onnxruntime-node crashes while running the chatterbox-turbo ONNX model.
Environment
- onnxruntime-node: 1.21.0 / 1.23.2 (tested both)
- Node.js: 24.11.0
- Platform: macOS (Darwin 23.6.0, Apple Silicon)
- Model: https://huggingface.co/nicoboss/chatterbox-turbo-ONNX
Issue
When running autoregressive generation with the language_model.onnx from Chatterbox Turbo, the process crashes with a malloc error after ~12-15 inference iterations:
malloc: *** error for object 0x...: pointer being freed was not allocated
Exit code 134
Reproduction
```js
import * as ort from 'onnxruntime-node';

const session = await ort.InferenceSession.create('language_model.onnx');

for (let i = 0; i < 100; i++) {
  const outputs = await session.run({
    inputs_embeds: /* tensor */,
    attention_mask: /* tensor */,
    position_ids: /* tensor */,
    ...pastKeyValues, // empty KV cache tensors
  });
  // Crashes around iteration 12-15
}
```
What I've tried (none worked)
- Copying all tensor data between iterations
- Using fresh empty KV cache each iteration (no caching)
- Disabling enableMemPattern and enableCpuMemArena
- Using fp32 instead of quantized models
- Different onnxruntime-node versions (1.14.0, 1.21.0, 1.23.2)
Notes
- The same model works perfectly with Python onnxruntime
- The crash happens in native code, not JavaScript
- Other ONNX models (speech_encoder, embed_tokens, conditional_decoder) work fine
- Only the language_model with repeated inference calls crashes