onnxruntime-node crashes with malloc error during autoregressive generation with Chatterbox Turbo ONNX model

#6
by tkpop777 - opened

Hi, the ONNX model works great in Python with Hugging Face Transformers and the Python build of onnxruntime. But trying to use it from Node.js fails: onnxruntime-node cannot get through repeated inference calls against the Chatterbox Turbo ONNX model.

Environment

Issue

When running autoregressive generation with the language_model.onnx from Chatterbox Turbo, the process crashes with a malloc error after ~12-15 inference iterations:

  malloc: *** error for object 0x...: pointer being freed was not allocated
  Exit code 134

Reproduction

  import * as ort from 'onnxruntime-node';

  const session = await ort.InferenceSession.create('language_model.onnx');

  for (let i = 0; i < 100; i++) {
    const outputs = await session.run({
      inputs_embeds: /* tensor */,
      attention_mask: /* tensor */,
      position_ids: /* tensor */,
      ...pastKeyValues  // empty KV cache tensors
    });
    // Crashes around iteration 12-15
  }
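For iteration 0, the "empty KV cache tensors" spread into the feeds above can be built along these lines. This is a sketch only: the input names (`past_key_values.{i}.key` / `.value`) and the zero-length sequence axis follow the usual Hugging Face ONNX export convention, and `numLayers`/`numHeads`/`headDim` are placeholders, not values read from this model. Plain `{type, data, dims}` objects stand in for `new ort.Tensor(...)` so the sketch runs without onnxruntime-node installed.

```javascript
// Sketch: build empty past-KV feeds for the first autoregressive step.
// Names and shapes assume the common HF ONNX export layout
// [batch, num_heads, past_seq_len, head_dim] with past_seq_len = 0.
function emptyKvFeeds(numLayers, numHeads, headDim, batch = 1) {
  const feeds = {};
  for (let i = 0; i < numLayers; i++) {
    for (const part of ['key', 'value']) {
      feeds[`past_key_values.${i}.${part}`] = {
        type: 'float32',
        data: new Float32Array(0),           // zero elements: past length is 0
        dims: [batch, numHeads, 0, headDim]  // sequence axis is empty
      };
    }
  }
  return feeds;
}
```

With real onnxruntime-node, each placeholder object would instead be `new ort.Tensor('float32', new Float32Array(0), dims)`.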

What I've tried (none worked)

  • Copying all tensor data between iterations
  • Using fresh empty KV cache each iteration (no caching)
  • Disabling enableMemPattern and enableCpuMemArena
  • Using fp32 instead of quantized models
  • Different onnxruntime-node versions (1.14.0, 1.21.0, 1.23.2)
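For reference, "copying all tensor data between iterations" was along these lines: deep-copy the buffer behind each `present.*` output before feeding it back as `past_key_values.*`, so the next `run()` never aliases memory onnxruntime may have released. The `present.{i}` → `past_key_values.{i}` renaming is the usual HF ONNX export convention and an assumption for this model; plain `{type, data, dims}` objects stand in for `ort.Tensor` so the sketch is self-contained.

```javascript
// Sketch of the tensor-copying mitigation: clone present.* outputs into
// fresh typed arrays and rename them to past_key_values.* feed names.
function cloneOutputsToFeeds(outputs) {
  const feeds = {};
  for (const [name, t] of Object.entries(outputs)) {
    if (!name.startsWith('present')) continue; // skip logits etc.
    feeds[name.replace('present', 'past_key_values')] = {
      type: t.type,
      data: t.data.slice(), // fresh typed array, detached from the output
      dims: [...t.dims]
    };
  }
  return feeds;
}
```

Even with every buffer copied this way (and nothing from a previous `run()` reused), the crash still occurred, which points at the native layer rather than JS-side aliasing.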

Notes

  • The same model works perfectly with Python onnxruntime
  • The crash happens in native code, not JavaScript
  • Other ONNX models (speech_encoder, embed_tokens, conditional_decoder) work fine
  • Only the language_model with repeated inference calls crashes
