VoxCPM-0.5B-RKNN2

(See below for the English README.)

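VoxCPM is a tokenizer-free text-to-speech (TTS) system that models speech in a continuous space instead of as discrete tokens. This avoids the limitations of discrete tokenization and enables two core capabilities: context-aware speech generation and realistic zero-shot voice cloning. Unlike mainstream approaches that convert speech into discrete tokens, VoxCPM uses an end-to-end diffusion autoregressive architecture that generates continuous speech representations directly from text. It is built on the MiniCPM-4 backbone and achieves implicit semantic-acoustic decoupling through hierarchical language modeling and FSQ constraints, which greatly improves expressiveness and generation stability.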

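Model Architecture

  • Inference speed (RKNN2): RTF of about 8 on RK3588 (roughly 80 s of inference to generate 10 s of audio)
  • Approximate memory usage (RKNN2): about 3.3 GB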

Usage

  1. Clone the project locally

  2. Install dependencies

pip install "numpy<2" scipy soundfile tqdm transformers sentencepiece rknn-toolkit-lite2
  3. Run
python onnx_infer-rknn2.py --onnx-dir . --tokenizer-dir . --base-hf-dir . --residual-hf-dir . --text "哇, 这个模型居然在RK3588这个辣鸡SoC上也能完美运行!" --prompt-audio basic_ref_zh.wav --prompt-text "对,这就是我,万人敬仰的太乙真人。" --output rknn_output.wav --cfg-value 2.0 --inference-timesteps 10 --seed 1234

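Optional parameters:

  • --text: Text to synthesize
  • --prompt-audio: Path to the reference audio (used for voice cloning)
  • --prompt-text: Transcript of the reference audio (required when a reference audio is given)
  • --cfg-value: CFG guidance strength, default 2.0
  • --inference-timesteps: Number of diffusion steps, default 10
  • --seed: Random seed
  • --output: Output audio path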
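The flags above can also be assembled programmatically. The following is a minimal sketch, not part of the project: `build_command` is a hypothetical helper, and it assumes that all four directory flags point at the same folder, as they do in the example command. It only uses flags that appear in this README.

```python
import shlex
import sys


def build_command(text, output="rknn_output.wav", prompt_audio=None,
                  prompt_text=None, cfg_value=2.0, inference_timesteps=10,
                  seed=None, model_dir="."):
    """Build the argv list for onnx_infer-rknn2.py (hypothetical helper)."""
    # The README states that --prompt-text is required with reference audio.
    if prompt_audio is not None and not prompt_text:
        raise ValueError("--prompt-text is required when --prompt-audio is given")
    cmd = [
        sys.executable, "onnx_infer-rknn2.py",
        # Assumption: all model files live in one directory, as in the example.
        "--onnx-dir", model_dir,
        "--tokenizer-dir", model_dir,
        "--base-hf-dir", model_dir,
        "--residual-hf-dir", model_dir,
        "--text", text,
        "--output", output,
        "--cfg-value", str(cfg_value),
        "--inference-timesteps", str(inference_timesteps),
    ]
    if prompt_audio is not None:
        cmd += ["--prompt-audio", prompt_audio, "--prompt-text", prompt_text]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


cmd = build_command("Hello from the RK3588.", seed=1234)
print(shlex.join(cmd))
# To launch it on the board: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with non-ASCII or punctuation-heavy text.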

Sample Run

> python onnx_infer-rknn2.py --onnx-dir . --tokenizer-dir . --base-hf-dir . --residual-hf-dir . --text "哇, 这个模型居然在RK3588这个辣鸡SoC上也能完美运行!" --prompt-audio basic_ref_zh.wav --prompt-text "对,这就是我,万人敬仰的太乙真人。" --output rknn_output.wav --cfg-value 2.0 --inference-timesteps 10 --seed 1234

I rkllm: rkllm-runtime version: 1.2.3, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from ./base_lm.rkllm
I rkllm: rkllm-toolkit version: 1.2.3, max_context_limit: 4096, npu_core_num: 1, target_platform: RK3588, model_dtype: FP16
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
I rkllm: rkllm-runtime version: 1.2.3, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from ./residual_lm.rkllm
I rkllm: rkllm-toolkit version: 1.2.2, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: FP16
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:26.264] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:26.264] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:26.265] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (@2024-11-07T08:11:34)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:26.404] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:26.537] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:26.537] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:26.537] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (@2024-11-07T08:11:34)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:26.616] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:26.795] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:26.795] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:26.795] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (e045de294f@2025-04-07T19:48:25)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:27.020] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:27.194] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:27.194] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:27.194] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (e045de294f@2025-04-07T19:48:25)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:27.317] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:27.431] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:27.431] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:27.431] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (@2025-04-03T08:26:16)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: dynamic_shape
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:27.547] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:27.547] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:27.547] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (e045de294f@2025-04-07T19:48:25)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:27.549] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:27.728] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:27.728] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:27.728] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (e045de294f@2025-04-07T19:48:25)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:27.819] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:27.937] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:27.937] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:27.937] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (@2024-11-07T08:11:34)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:27.940] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:28.058] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:28.058] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:28.058] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (@2024-11-07T08:11:34)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:28.060] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
[time] vae_encode_0: 1601.56 ms
[time] vae_encode_38400: 1605.46 ms
[time] vae_encode_76800: 1591.07 ms
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_0: 819.49 ms
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_64: 818.33 ms
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_128: 819.09 ms
[time] base_lm initial: 579.08 ms
[time] fsq_init_0: 2.54 ms
[time] fsq_init_64: 1.86 ms
[time] fsq_init_128: 1.79 ms
[time] residual_lm initial: 139.10 ms
gen_loop:   0%|                                                                          | 0/2000 [00:00<?, ?it/s][time] lm_to_dit: 0.82 ms
[time] res_to_dit: 0.56 ms
100%|█████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 33.32it/s]
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_step: 16.32 ms
gen_loop:   0%|                                                                  | 1/2000 [00:00<14:30,  2.30it/s][time] lm_to_dit: 0.57 ms
[time] res_to_dit: 0.44 ms
100%|█████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 33.10it/s]
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_step: 15.84 ms
gen_loop:   0%|                                                                  | 2/2000 [00:00<14:27,  2.30it/s][time] lm_to_dit: 0.56 ms
[time] res_to_dit: 0.50 ms
100%|█████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 31.93it/s]

...

W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_step: 15.88 ms
gen_loop:   6%|███▉                                                            | 123/2000 [00:53<13:35,  2.30it/s][time] lm_to_dit: 0.57 ms
[time] res_to_dit: 0.49 ms
100%|█████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 32.94it/s]
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_step: 15.84 ms
gen_loop:   6%|███▉                                                            | 123/2000 [00:54<13:44,  2.28it/s]
[time] vae_decode_0: 1044.00 ms
[time] vae_decode_60: 1018.03 ms
[time] vae_decode_120: 1020.72 ms
[time] vae_decode_180: 1021.19 ms
[time] vae_decode_240: 1006.85 ms
Saved: rknn_output.wav
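To see where the time goes, the `[time]` lines in a log like the one above can be totalled per stage. This is a small sketch under stated assumptions: `summarize_timings` is a hypothetical helper, and it assumes every timing line has the form `[time] <name>: <value> ms`, with a trailing `_<number>` marking a chunk offset that can be dropped for grouping.

```python
import re
from collections import defaultdict

# Matches "[time] <name>: <value> ms", even when a progress bar precedes it
# on the same line.
TIME_LINE = re.compile(r"\[time\]\s+(?P<name>[^:\n]+):\s+(?P<ms>[\d.]+)\s+ms")


def summarize_timings(log_text):
    """Return total milliseconds per stage found in the log text."""
    totals = defaultdict(float)
    for match in TIME_LINE.finditer(log_text):
        # Drop a trailing _<number> so vae_encode_0 and vae_encode_38400
        # are counted under the same stage name.
        stage = re.sub(r"_\d+$", "", match.group("name").strip())
        totals[stage] += float(match.group("ms"))
    return dict(totals)


sample = """[time] vae_encode_0: 1601.56 ms
[time] vae_encode_38400: 1605.46 ms
[time] vae_encode_76800: 1591.07 ms
[time] base_lm initial: 579.08 ms
gen_loop:   0%| | 0/2000 [00:00<?, ?it/s][time] lm_to_dit: 0.82 ms
"""

totals = summarize_timings(sample)
for stage, ms in totals.items():
    print(f"{stage}: {ms:.2f} ms")
```

Redirecting the script's output to a file and passing its contents to `summarize_timings` gives a quick per-stage breakdown of a full run.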

Model Conversion

To be documented.

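Known Issues

  • In some cases speech generation can get stuck in an endless loop. The original project appears to have a mechanism for detecting this, but it is not implemented here.
  • Because of an internal issue in the RKNN toolchain, a single locenc model cannot be configured with two sets of shapes for the two input lengths, so two separate models have to be converted.
  • Because of an internal issue in the RKLLM toolchain/runtime, the output tensors of both LLMs are only one quarter of the correct values; multiplying them by 4 manually gives the correct result.
  • The RKNN toolchain currently does not support multi-batch data-parallel inference across multiple NPU cores for models with non-4D inputs, so the script runs CFG as two separate passes, which is slow.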
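Two of the workarounds listed above are simple enough to sketch. The code below is an illustration only, not the script's actual code: the function names are hypothetical, plain Python lists stand in for the tensors (the real script would presumably use NumPy arrays), and the CFG combination shown is the standard classifier-free guidance formula, which is assumed rather than confirmed to match the script.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def rescale_llm_output(raw: Sequence[float], scale: float = 4.0) -> Vector:
    """Undo the reported 1/4 scaling of the RKLLM output tensors."""
    return [value * scale for value in raw]


def cfg_combine(cond: Sequence[float], uncond: Sequence[float],
                cfg_value: float = 2.0) -> Vector:
    """Standard classifier-free guidance: uncond + w * (cond - uncond)."""
    return [u + cfg_value * (c - u) for c, u in zip(cond, uncond)]


def guided_step(model_fn: Callable[[Vector, float], Vector], x: Vector,
                cond_input: float, uncond_input: float,
                cfg_value: float = 2.0) -> Vector:
    """Run the model twice (two sequential passes) and blend the results."""
    cond = model_fn(x, cond_input)      # pass 1: conditional
    uncond = model_fn(x, uncond_input)  # pass 2: unconditional
    return cfg_combine(cond, uncond, cfg_value)


def toy_model(x: Vector, cond: float) -> Vector:
    """Stand-in for the real network: adds the condition to every element."""
    return [xi + cond for xi in x]


print(rescale_llm_output([0.25, 0.5]))                   # [1.0, 2.0]
print(guided_step(toy_model, [1.0, 2.0], 1.0, 0.0, 2.0))  # [3.0, 4.0]
```

Because the two passes run one after the other, each diffusion step costs roughly two model invocations, which is why this path is slower than a batched implementation would be.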

References

English README

VoxCPM is an innovative tokenizer-free Text-to-Speech (TTS) system that redefines realism in speech synthesis. By modeling speech in continuous space, it overcomes the limitations of discrete tokenization and achieves two core capabilities: context-aware speech generation and realistic zero-shot voice cloning.

Unlike mainstream approaches that convert speech into discrete tokens, VoxCPM adopts an end-to-end diffusion autoregressive architecture that directly generates continuous speech representations from text. Built on the MiniCPM-4 backbone, it achieves implicit semantic-acoustic decoupling through hierarchical language modeling and FSQ constraints, greatly enhancing expressiveness and generation stability.

  • Inference speed (RKNN2): RTF of about 8 on RK3588 (roughly 80 s of inference to generate 10 s of audio)
  • Approximate memory usage (RKNN2): about 3.3 GB
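The real-time factor (RTF) is the inference time divided by the duration of the generated audio, so an RTF of 8 means generation takes eight times as long as playback. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


def estimated_inference_seconds(audio_seconds: float, rtf: float = 8.0) -> float:
    """Rough wall-clock estimate for a clip, given the RTF quoted above."""
    return audio_seconds * rtf


print(real_time_factor(80.0, 10.0))       # 8.0
print(estimated_inference_seconds(30.0))  # 240.0
```

The figure of 8 is the approximate value reported above for RK3588; actual timings will vary with text length and settings such as `--inference-timesteps`.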

Usage

  1. Clone the project locally

  2. Install dependencies

pip install "numpy<2" scipy soundfile tqdm transformers sentencepiece rknn-toolkit-lite2
  3. Run
python onnx_infer-rknn2.py --onnx-dir . --tokenizer-dir . --base-hf-dir . --residual-hf-dir . --text "Wow, this model actually runs perfectly on the RK3588 SoC!" --prompt-audio basic_ref_zh.wav --prompt-text "对,这就是我,万人敬仰的太乙真人。" --output rknn_output.wav --cfg-value 2.0 --inference-timesteps 10 --seed 1234

Optional parameters:

  • --text: Text to generate
  • --prompt-audio: Reference audio path (for voice cloning)
  • --prompt-text: Text corresponding to the reference audio (required when using reference audio)
  • --cfg-value: CFG guidance strength, default 2.0
  • --inference-timesteps: Number of diffusion steps, default 10
  • --seed: Random seed
  • --output: Output audio path

Performance

> python onnx_infer-rknn2.py --onnx-dir . --tokenizer-dir . --base-hf-dir . --residual-hf-dir . --text "哇, 这个模型居然在RK3588这个辣鸡SoC上也能完美运行!" --prompt-audio basic_ref_zh.wav --prompt-text "对,这就是我,万人敬仰的太乙真人。" --output rknn_output.wav --cfg-value 2.0 --inference-timesteps 10 --seed 1234

I rkllm: rkllm-runtime version: 1.2.3, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from ./base_lm.rkllm
I rkllm: rkllm-toolkit version: 1.2.3, max_context_limit: 4096, npu_core_num: 1, target_platform: RK3588, model_dtype: FP16
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
I rkllm: rkllm-runtime version: 1.2.3, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from ./residual_lm.rkllm
I rkllm: rkllm-toolkit version: 1.2.2, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: FP16
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:26.264] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:26.264] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:26.265] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (@2024-11-07T08:11:34)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:26.404] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:26.537] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:26.537] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:26.537] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (@2024-11-07T08:11:34)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:26.616] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:26.795] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:26.795] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:26.795] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (e045de294f@2025-04-07T19:48:25)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:27.020] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:27.194] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:27.194] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:27.194] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (e045de294f@2025-04-07T19:48:25)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:27.317] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:27.431] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:27.431] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:27.431] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (@2025-04-03T08:26:16)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: dynamic_shape
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:27.547] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:27.547] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:27.547] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (e045de294f@2025-04-07T19:48:25)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:27.549] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:27.728] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:27.728] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:27.728] RKNN Model Information, version: 6, toolkit version: 2.3.2(compiler version: 2.3.2 (e045de294f@2025-04-07T19:48:25)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:27.819] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:27.937] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:27.937] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:27.937] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (@2024-11-07T08:11:34)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:27.940] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
W rknn-toolkit-lite2 version: 2.3.2
I RKNN: [18:58:28.058] RKNN Runtime Information, librknnrt version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27)
I RKNN: [18:58:28.058] RKNN Driver Information, version: 0.9.8
I RKNN: [18:58:28.058] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (@2024-11-07T08:11:34)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [18:58:28.060] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
[time] vae_encode_0: 1601.56 ms
[time] vae_encode_38400: 1605.46 ms
[time] vae_encode_76800: 1591.07 ms
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_0: 819.49 ms
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_64: 818.33 ms
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_128: 819.09 ms
[time] base_lm initial: 579.08 ms
[time] fsq_init_0: 2.54 ms
[time] fsq_init_64: 1.86 ms
[time] fsq_init_128: 1.79 ms
[time] residual_lm initial: 139.10 ms
gen_loop:   0%|                                                                          | 0/2000 [00:00<?, ?it/s][time] lm_to_dit: 0.82 ms
[time] res_to_dit: 0.56 ms
100%|█████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 33.32it/s]
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_step: 16.32 ms
gen_loop:   0%|                                                                  | 1/2000 [00:00<14:30,  2.30it/s][time] lm_to_dit: 0.57 ms
[time] res_to_dit: 0.44 ms
100%|█████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 33.10it/s]
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_step: 15.84 ms
gen_loop:   0%|                                                                  | 2/2000 [00:00<14:27,  2.30it/s][time] lm_to_dit: 0.56 ms
[time] res_to_dit: 0.50 ms
100%|█████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 31.93it/s]

...

W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_step: 15.88 ms
gen_loop:   6%|███▉                                                            | 123/2000 [00:53<13:35,  2.30it/s][time] lm_to_dit: 0.57 ms
[time] res_to_dit: 0.49 ms
100%|█████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 32.94it/s]
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
[time] locenc_step: 15.84 ms
gen_loop:   6%|███▉                                                            | 123/2000 [00:54<13:44,  2.28it/s]
[time] vae_decode_0: 1044.00 ms
[time] vae_decode_60: 1018.03 ms
[time] vae_decode_120: 1020.72 ms
[time] vae_decode_180: 1021.19 ms
[time] vae_decode_240: 1006.85 ms
Saved: rknn_output.wav

Model Conversion

TODO: Documentation to be added

Known Issues

  • In some cases, speech generation may fall into an infinite loop. The original project seems to have a mechanism to detect infinite loops, but it is not implemented here.
  • Due to internal issues with the RKNN toolchain, the locenc model cannot configure two sets of shapes for two different input lengths in a single model, so two separate models must be converted.
  • Due to internal issues with the RKLLM toolchain/runtime, the output tensor values of both LLMs are only one-quarter of the correct result. Multiplying by 4 manually yields the correct result.
  • Since the RKNN toolchain currently does not support data-parallel inference using multiple NPU cores for non-4D input models with multiple batches, CFG in the script is performed separately in two passes, which is relatively slow.
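For the first issue, one possible guard is to cap the number of generation steps relative to the length of the input text. The sketch below is purely hypothetical: it is not the original project's mechanism, and `should_stop`, `max_ratio`, and `min_steps` are illustrative names and values. The hard cap of 2000 mirrors the loop limit visible in the progress bar of the sample run.

```python
def should_stop(step: int, stop_predicted: bool, num_text_tokens: int,
                max_ratio: float = 6.0, min_steps: int = 2,
                hard_cap: int = 2000) -> bool:
    """Decide whether the generation loop should end after `step` steps.

    Hypothetical guard: stops on the model's own stop signal, on a hard cap,
    or when the output grows implausibly long for the given text.
    """
    if step >= hard_cap:
        return True
    if stop_predicted and step >= min_steps:
        return True
    # Assumption: more than max_ratio steps per text token means a runaway loop.
    budget = max(min_steps, int(num_text_tokens * max_ratio))
    return step >= budget


# A 20-token text gets a budget of 120 steps with the default ratio.
print(should_stop(10, False, 20))   # False
print(should_stop(120, False, 20))  # True
```

A caller that hits the length budget could discard the result and retry with a different seed; a suitable ratio would need to be tuned on real runs.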

References
