Add ONNX weights and update model card

Files changed (8)

.gitattributes +6 -0
README.md +118 -3
codec_browser_onnx_meta.json +576 -0
moss_audio_tokenizer_decode_full.onnx +3 -0
moss_audio_tokenizer_decode_shared.data +3 -0
moss_audio_tokenizer_decode_step.onnx +3 -0
moss_audio_tokenizer_encode.data +3 -0
moss_audio_tokenizer_encode.onnx +3 -0

.gitattributes CHANGED

@@ -33,3 +33,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.data filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
+*.gguf filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
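
The six patterns added in this hunk decide which newly committed files Git LFS picks up. A rough Python sketch of that matching is given here; `lfs_pattern_for` is a hypothetical helper, and `fnmatchcase` on the file's base name only approximates gitattributes matching for slash-free patterns like these.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns added to .gitattributes by this change (copied from the hunk above).
ADDED_PATTERNS = ["*.data", "*.png", "*.jpg", "*.wav", "*.gguf", "tokenizer.json"]


def lfs_pattern_for(path, patterns=ADDED_PATTERNS):
    """Return the first added pattern that matches the file's base name, else None.

    Approximation: slash-free gitattributes patterns are matched against the
    base name here; full gitattributes semantics are richer than this.
    """
    name = PurePosixPath(path).name
    for pattern in patterns:
        if fnmatchcase(name, pattern):
            return pattern
    return None


for path in [
    "moss_audio_tokenizer_encode.data",
    "moss_audio_tokenizer_decode_shared.data",
    "codec_browser_onnx_meta.json",
    "README.md",
]:
    print(path, "->", lfs_pattern_for(path))
```

Under this approximation the two `.data` files match `*.data`, while the metadata JSON and the README stay as regular Git blobs. The `.onnx` files also appear as LFS pointers in this change, so they are presumably covered by a rule outside the lines shown in this hunk.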

README.md CHANGED

@@ -1,3 +1,118 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+library_name: onnx
+tags:
+  - audio
+  - audio-tokenizer
+  - neural-codec
+  - moss-tts-family
+  - moss-audio-tokenizer-nano
+  - speech-tokenizer
+  - onnx
+  - onnxruntime
+  - browser
+---
+# MOSS-Audio-Tokenizer-Nano-ONNX
+This repository provides the **ONNX exports** of [MOSS-Audio-Tokenizer-Nano](https://huggingface.co/OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano), the lightweight audio tokenizer used by [MOSS-TTS-Nano](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Nano). It is intended for **torch-free** deployment with ONNX Runtime and ONNX Runtime Web.
+## Overview
+The Nano variant is a lightweight tokenizer with about **20M parameters**, designed to reduce deployment cost while preserving strong perceptual quality.
+MOSS-Audio-Tokenizer-Nano supports:
+- **48 kHz**, **stereo** audio
+- **12.5 Hz** token rate
+- **16 RVQ codebooks**
+- high-fidelity reconstruction across variable bitrates
+This ONNX repository is designed for lightweight inference pipelines such as:
+- local CPU deployment with `onnxruntime`
+- browser deployment with `onnxruntime-web`
+- companion audio encoding/decoding for `MOSS-TTS-Nano-100M-ONNX`
+## Supported Backends
+| Backend | Runtime | Use Case |
+|---------|---------|----------|
+| **ONNX Runtime (CPU)** | `onnxruntime` | Local CPU inference |
+| **ONNX Runtime Web** | `onnxruntime-web` | Browser-based deployment |
+## Repository Contents
+| File | Description |
+|------|-------------|
+| `moss_audio_tokenizer_encode.onnx` | Encoder graph for waveform -> discrete audio codes |
+| `moss_audio_tokenizer_encode.data` | External weights for the encoder graph |
+| `moss_audio_tokenizer_decode_full.onnx` | Full decoder graph for audio codes -> waveform |
+| `moss_audio_tokenizer_decode_step.onnx` | Streaming decoder-step graph for incremental decode |
+| `moss_audio_tokenizer_decode_shared.data` | External weights shared by the decoder graphs |
+| `codec_browser_onnx_meta.json` | Metadata for browser / ONNX Runtime integration (graph file names, tensor names, codec config, streaming-cache shapes) |
+## Quick Start
+```bash
+huggingface-cli download OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano-ONNX \
+    --local-dir weights/MOSS-Audio-Tokenizer-Nano-ONNX
+```
+This repository is typically used together with [OpenMOSS-Team/MOSS-TTS-Nano-100M-ONNX](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Nano-100M-ONNX) for fully torch-free MOSS-TTS-Nano deployment.
+## Main Repositories
+| Repository | Description |
+|------------|-------------|
+| [OpenMOSS/MOSS-TTS-Nano](https://github.com/OpenMOSS/MOSS-TTS-Nano) | MOSS-TTS-Nano source code and inference pipeline |
+| [OpenMOSS-Team/MOSS-TTS-Nano](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Nano) | PyTorch MOSS-TTS-Nano weights |
+| [OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano](https://huggingface.co/OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano) | PyTorch MOSS-Audio-Tokenizer-Nano weights |
+| [OpenMOSS-Team/MOSS-TTS-Nano-100M-ONNX](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Nano-100M-ONNX) | Companion ONNX TTS weights |
+## About MOSS-Audio-Tokenizer-Nano
+**MOSS-Audio-Tokenizer-Nano** serves as the lightweight codec backbone for MOSS-TTS-Nano. It keeps the same unified audio-token interface used across the MOSS-TTS family while reducing inference cost for CPU and browser deployment scenarios.
+For the original PyTorch implementation, setup instructions, and more background, see:
+- [MOSS-Audio-Tokenizer Repository](https://github.com/OpenMOSS/MOSS-Audio-Tokenizer)
+- [MOSS-TTS-Nano Repository](https://github.com/OpenMOSS/MOSS-TTS-Nano)
+## Citation
+If you use the MOSS-TTS work in your research or product, please cite:
+```bibtex
+@misc{openmoss2026mossttsnano,
+  title={MOSS-TTS-Nano},
+  author={OpenMOSS Team},
+  year={2026},
+  howpublished={GitHub repository},
+  url={https://github.com/OpenMOSS/MOSS-TTS-Nano}
+}
+```
+```bibtex
+@misc{gong2026mossttstechnicalreport,
+  title={MOSS-TTS Technical Report},
+  author={Yitian Gong and Botian Jiang and Yiwei Zhao and Yucheng Yuan and Kuangwei Chen and Yaozhou Jiang and Cheng Chang and Dong Hong and Mingshu Chen and Ruixiao Li and Yiyang Zhang and Yang Gao and Hanfu Chen and Ke Chen and Songlin Wang and Xiaogui Yang and Yuqian Zhang and Kexin Huang and ZhengYuan Lin and Kang Yu and Ziqi Chen and Jin Wang and Zhaoye Fei and Qinyuan Cheng and Shimin Li and Xipeng Qiu},
+  year={2026},
+  eprint={2603.18090},
+  archivePrefix={arXiv},
+  primaryClass={cs.SD},
+  url={https://arxiv.org/abs/2603.18090}
+}
+```
+```bibtex
+@misc{gong2026mossaudiotokenizerscalingaudiotokenizers,
+  title={MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models},
+  author={Yitian Gong and Kuangwei Chen and Zhaoye Fei and Xiaogui Yang and Ke Chen and Yang Wang and Kexin Huang and Mingshu Chen and Ruixiao Li and Qinyuan Cheng and Shimin Li and Xipeng Qiu},
+  year={2026},
+  eprint={2602.10934},
+  archivePrefix={arXiv},
+  primaryClass={cs.SD},
+  url={https://arxiv.org/abs/2602.10934}
+}
+```
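
The figures in the model card line up with `codec_config` in `codec_browser_onnx_meta.json`: 48,000 samples per second divided by a `downsample_rate` of 3,840 gives the 12.5 Hz token rate. A minimal Python sketch of that arithmetic follows. The helper names are hypothetical, and rounding a trailing partial frame up is an assumption, not documented behavior of the exported graphs.

```python
import math

# Values copied from codec_config in codec_browser_onnx_meta.json.
SAMPLE_RATE = 48_000     # Hz
DOWNSAMPLE_RATE = 3_840  # waveform samples per code frame
NUM_QUANTIZERS = 16      # RVQ codebooks


def token_rate_hz(sample_rate=SAMPLE_RATE, downsample_rate=DOWNSAMPLE_RATE):
    """Code frames per second of audio."""
    return sample_rate / downsample_rate


def expected_code_frames(num_samples, downsample_rate=DOWNSAMPLE_RATE):
    """Frames needed to cover num_samples per channel.

    Assumption: a trailing partial frame is rounded up (padded).
    """
    return math.ceil(num_samples / downsample_rate)


def codes_per_second(num_quantizers=NUM_QUANTIZERS):
    """Discrete codes per second when every codebook is used."""
    return token_rate_hz() * num_quantizers


print(token_rate_hz())                        # frames per second
print(expected_code_frames(2 * SAMPLE_RATE))  # frames for two seconds of audio
print(codes_per_second())                     # codes per second with 16 codebooks
```

Comparing `expected_code_frames` against the `audio_code_lengths` output of the encoder graph is one way to check whether the rounding assumption holds for a given input.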

codec_browser_onnx_meta.json ADDED

	@@ -0,0 +1,576 @@

+{
+  "format_version": 2,
+  "checkpoint_path": "MOSS-Audio-Tokenizer-Nano",
+  "files": {
+    "encode": "moss_audio_tokenizer_encode.onnx",
+    "decode_full": "moss_audio_tokenizer_decode_full.onnx",
+    "decode_step": "moss_audio_tokenizer_decode_step.onnx"
+  },
+  "external_data_files": {
+    "moss_audio_tokenizer_encode.onnx": [
+      "moss_audio_tokenizer_encode.data"
+    ],
+    "moss_audio_tokenizer_decode_full.onnx": [
+      "moss_audio_tokenizer_decode_shared.data"
+    ],
+    "moss_audio_tokenizer_decode_step.onnx": [
+      "moss_audio_tokenizer_decode_shared.data"
+    ]
+  },
+  "codec_config": {
+    "sample_rate": 48000,
+    "channels": 2,
+    "downsample_rate": 3840,
+    "num_quantizers": 16
+  },
+  "onnx": {
+    "opset": 17,
+    "encode_input_names": [
+      "waveform",
+      "input_lengths"
+    ],
+    "encode_output_names": [
+      "audio_codes",
+      "audio_code_lengths"
+    ],
+    "decode_input_names": [
+      "audio_codes",
+      "audio_code_lengths"
+    ],
+    "decode_output_names": [
+      "audio",
+      "audio_lengths"
+    ],
+    "decode_step_input_names": [
+      "audio_codes",
+      "audio_code_lengths",
+      "transformer_offset_0",
+      "transformer_offset_1",
+      "transformer_offset_2",
+      "transformer_offset_3",
+      "attn_offset_0",
+      "attn_cached_keys_0",
+      "attn_cached_values_0",
+      "attn_cached_positions_0",
+      "attn_offset_1",
+      "attn_cached_keys_1",
+      "attn_cached_values_1",
+      "attn_cached_positions_1",
+      "attn_offset_2",
+      "attn_cached_keys_2",
+      "attn_cached_values_2",
+      "attn_cached_positions_2",
+      "attn_offset_3",
+      "attn_cached_keys_3",
+      "attn_cached_values_3",
+      "attn_cached_positions_3",
+      "attn_offset_4",
+      "attn_cached_keys_4",
+      "attn_cached_values_4",
+      "attn_cached_positions_4",
+      "attn_offset_5",
+      "attn_cached_keys_5",
+      "attn_cached_values_5",
+      "attn_cached_positions_5",
+      "attn_offset_6",
+      "attn_cached_keys_6",
+      "attn_cached_values_6",
+      "attn_cached_positions_6",
+      "attn_offset_7",
+      "attn_cached_keys_7",
+      "attn_cached_values_7",
+      "attn_cached_positions_7",
+      "attn_offset_8",
+      "attn_cached_keys_8",
+      "attn_cached_values_8",
+      "attn_cached_positions_8",
+      "attn_offset_9",
+      "attn_cached_keys_9",
+      "attn_cached_values_9",
+      "attn_cached_positions_9",
+      "attn_offset_10",
+      "attn_cached_keys_10",
+      "attn_cached_values_10",
+      "attn_cached_positions_10",
+      "attn_offset_11",
+      "attn_cached_keys_11",
+      "attn_cached_values_11",
+      "attn_cached_positions_11"
+    ],
+    "decode_step_output_names": [
+      "audio",
+      "audio_lengths",
+      "transformer_offset_out_0",
+      "transformer_offset_out_1",
+      "transformer_offset_out_2",
+      "transformer_offset_out_3",
+      "attn_offset_out_0",
+      "attn_cached_keys_out_0",
+      "attn_cached_values_out_0",
+      "attn_cached_positions_out_0",
+      "attn_offset_out_1",
+      "attn_cached_keys_out_1",
+      "attn_cached_values_out_1",
+      "attn_cached_positions_out_1",
+      "attn_offset_out_2",
+      "attn_cached_keys_out_2",
+      "attn_cached_values_out_2",
+      "attn_cached_positions_out_2",
+      "attn_offset_out_3",
+      "attn_cached_keys_out_3",
+      "attn_cached_values_out_3",
+      "attn_cached_positions_out_3",
+      "attn_offset_out_4",
+      "attn_cached_keys_out_4",
+      "attn_cached_values_out_4",
+      "attn_cached_positions_out_4",
+      "attn_offset_out_5",
+      "attn_cached_keys_out_5",
+      "attn_cached_values_out_5",
+      "attn_cached_positions_out_5",
+      "attn_offset_out_6",
+      "attn_cached_keys_out_6",
+      "attn_cached_values_out_6",
+      "attn_cached_positions_out_6",
+      "attn_offset_out_7",
+      "attn_cached_keys_out_7",
+      "attn_cached_values_out_7",
+      "attn_cached_positions_out_7",
+      "attn_offset_out_8",
+      "attn_cached_keys_out_8",
+      "attn_cached_values_out_8",
+      "attn_cached_positions_out_8",
+      "attn_offset_out_9",
+      "attn_cached_keys_out_9",
+      "attn_cached_values_out_9",
+      "attn_cached_positions_out_9",
+      "attn_offset_out_10",
+      "attn_cached_keys_out_10",
+      "attn_cached_values_out_10",
+      "attn_cached_positions_out_10",
+      "attn_offset_out_11",
+      "attn_cached_keys_out_11",
+      "attn_cached_values_out_11",
+      "attn_cached_positions_out_11"
+    ]
+  },
+  "streaming_decode": {
+    "batch_size": 1,
+    "transformer_offsets": [
+      {
+        "index": 0,
+        "decoder_index": 1,
+        "input_name": "transformer_offset_0",
+        "output_name": "transformer_offset_out_0",
+        "shape": [
+          1
+        ],
+        "dtype": "int32"
+      },
+      {
+        "index": 1,
+        "decoder_index": 3,
+        "input_name": "transformer_offset_1",
+        "output_name": "transformer_offset_out_1",
+        "shape": [
+          1
+        ],
+        "dtype": "int32"
+      },
+      {
+        "index": 2,
+        "decoder_index": 5,
+        "input_name": "transformer_offset_2",
+        "output_name": "transformer_offset_out_2",
+        "shape": [
+          1
+        ],
+        "dtype": "int32"
+      },
+      {
+        "index": 3,
+        "decoder_index": 7,
+        "input_name": "transformer_offset_3",
+        "output_name": "transformer_offset_out_3",
+        "shape": [
+          1
+        ],
+        "dtype": "int32"
+      }
+    ],
+    "attention_caches": [
+      {
+        "index": 0,
+        "decoder_index": 1,
+        "layer_index": 0,
+        "context": 500,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_0",
+        "offset_output_name": "attn_offset_out_0",
+        "cached_keys_input_name": "attn_cached_keys_0",
+        "cached_keys_output_name": "attn_cached_keys_out_0",
+        "cached_values_input_name": "attn_cached_values_0",
+        "cached_values_output_name": "attn_cached_values_out_0",
+        "cached_positions_input_name": "attn_cached_positions_0",
+        "cached_positions_output_name": "attn_cached_positions_out_0",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          500,
+          64
+        ],
+        "positions_shape": [
+          1,
+          500
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 1,
+        "decoder_index": 1,
+        "layer_index": 1,
+        "context": 500,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_1",
+        "offset_output_name": "attn_offset_out_1",
+        "cached_keys_input_name": "attn_cached_keys_1",
+        "cached_keys_output_name": "attn_cached_keys_out_1",
+        "cached_values_input_name": "attn_cached_values_1",
+        "cached_values_output_name": "attn_cached_values_out_1",
+        "cached_positions_input_name": "attn_cached_positions_1",
+        "cached_positions_output_name": "attn_cached_positions_out_1",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          500,
+          64
+        ],
+        "positions_shape": [
+          1,
+          500
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 2,
+        "decoder_index": 1,
+        "layer_index": 2,
+        "context": 500,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_2",
+        "offset_output_name": "attn_offset_out_2",
+        "cached_keys_input_name": "attn_cached_keys_2",
+        "cached_keys_output_name": "attn_cached_keys_out_2",
+        "cached_values_input_name": "attn_cached_values_2",
+        "cached_values_output_name": "attn_cached_values_out_2",
+        "cached_positions_input_name": "attn_cached_positions_2",
+        "cached_positions_output_name": "attn_cached_positions_out_2",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          500,
+          64
+        ],
+        "positions_shape": [
+          1,
+          500
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 3,
+        "decoder_index": 1,
+        "layer_index": 3,
+        "context": 500,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_3",
+        "offset_output_name": "attn_offset_out_3",
+        "cached_keys_input_name": "attn_cached_keys_3",
+        "cached_keys_output_name": "attn_cached_keys_out_3",
+        "cached_values_input_name": "attn_cached_values_3",
+        "cached_values_output_name": "attn_cached_values_out_3",
+        "cached_positions_input_name": "attn_cached_positions_3",
+        "cached_positions_output_name": "attn_cached_positions_out_3",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          500,
+          64
+        ],
+        "positions_shape": [
+          1,
+          500
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 4,
+        "decoder_index": 3,
+        "layer_index": 0,
+        "context": 800,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_4",
+        "offset_output_name": "attn_offset_out_4",
+        "cached_keys_input_name": "attn_cached_keys_4",
+        "cached_keys_output_name": "attn_cached_keys_out_4",
+        "cached_values_input_name": "attn_cached_values_4",
+        "cached_values_output_name": "attn_cached_values_out_4",
+        "cached_positions_input_name": "attn_cached_positions_4",
+        "cached_positions_output_name": "attn_cached_positions_out_4",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          800,
+          64
+        ],
+        "positions_shape": [
+          1,
+          800
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 5,
+        "decoder_index": 3,
+        "layer_index": 1,
+        "context": 800,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_5",
+        "offset_output_name": "attn_offset_out_5",
+        "cached_keys_input_name": "attn_cached_keys_5",
+        "cached_keys_output_name": "attn_cached_keys_out_5",
+        "cached_values_input_name": "attn_cached_values_5",
+        "cached_values_output_name": "attn_cached_values_out_5",
+        "cached_positions_input_name": "attn_cached_positions_5",
+        "cached_positions_output_name": "attn_cached_positions_out_5",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          800,
+          64
+        ],
+        "positions_shape": [
+          1,
+          800
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 6,
+        "decoder_index": 5,
+        "layer_index": 0,
+        "context": 1200,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_6",
+        "offset_output_name": "attn_offset_out_6",
+        "cached_keys_input_name": "attn_cached_keys_6",
+        "cached_keys_output_name": "attn_cached_keys_out_6",
+        "cached_values_input_name": "attn_cached_values_6",
+        "cached_values_output_name": "attn_cached_values_out_6",
+        "cached_positions_input_name": "attn_cached_positions_6",
+        "cached_positions_output_name": "attn_cached_positions_out_6",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          1200,
+          64
+        ],
+        "positions_shape": [
+          1,
+          1200
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 7,
+        "decoder_index": 5,
+        "layer_index": 1,
+        "context": 1200,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_7",
+        "offset_output_name": "attn_offset_out_7",
+        "cached_keys_input_name": "attn_cached_keys_7",
+        "cached_keys_output_name": "attn_cached_keys_out_7",
+        "cached_values_input_name": "attn_cached_values_7",
+        "cached_values_output_name": "attn_cached_values_out_7",
+        "cached_positions_input_name": "attn_cached_positions_7",
+        "cached_positions_output_name": "attn_cached_positions_out_7",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          1200,
+          64
+        ],
+        "positions_shape": [
+          1,
+          1200
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 8,
+        "decoder_index": 7,
+        "layer_index": 0,
+        "context": 1600,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_8",
+        "offset_output_name": "attn_offset_out_8",
+        "cached_keys_input_name": "attn_cached_keys_8",
+        "cached_keys_output_name": "attn_cached_keys_out_8",
+        "cached_values_input_name": "attn_cached_values_8",
+        "cached_values_output_name": "attn_cached_values_out_8",
+        "cached_positions_input_name": "attn_cached_positions_8",
+        "cached_positions_output_name": "attn_cached_positions_out_8",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          1600,
+          64
+        ],
+        "positions_shape": [
+          1,
+          1600
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 9,
+        "decoder_index": 7,
+        "layer_index": 1,
+        "context": 1600,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_9",
+        "offset_output_name": "attn_offset_out_9",
+        "cached_keys_input_name": "attn_cached_keys_9",
+        "cached_keys_output_name": "attn_cached_keys_out_9",
+        "cached_values_input_name": "attn_cached_values_9",
+        "cached_values_output_name": "attn_cached_values_out_9",
+        "cached_positions_input_name": "attn_cached_positions_9",
+        "cached_positions_output_name": "attn_cached_positions_out_9",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          1600,
+          64
+        ],
+        "positions_shape": [
+          1,
+          1600
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 10,
+        "decoder_index": 7,
+        "layer_index": 2,
+        "context": 1600,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_10",
+        "offset_output_name": "attn_offset_out_10",
+        "cached_keys_input_name": "attn_cached_keys_10",
+        "cached_keys_output_name": "attn_cached_keys_out_10",
+        "cached_values_input_name": "attn_cached_values_10",
+        "cached_values_output_name": "attn_cached_values_out_10",
+        "cached_positions_input_name": "attn_cached_positions_10",
+        "cached_positions_output_name": "attn_cached_positions_out_10",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          1600,
+          64
+        ],
+        "positions_shape": [
+          1,
+          1600
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      },
+      {
+        "index": 11,
+        "decoder_index": 7,
+        "layer_index": 3,
+        "context": 1600,
+        "num_heads": 4,
+        "head_dim": 64,
+        "offset_input_name": "attn_offset_11",
+        "offset_output_name": "attn_offset_out_11",
+        "cached_keys_input_name": "attn_cached_keys_11",
+        "cached_keys_output_name": "attn_cached_keys_out_11",
+        "cached_values_input_name": "attn_cached_values_11",
+        "cached_values_output_name": "attn_cached_values_out_11",
+        "cached_positions_input_name": "attn_cached_positions_11",
+        "cached_positions_output_name": "attn_cached_positions_out_11",
+        "offset_shape": [
+          1
+        ],
+        "cache_shape": [
+          1,
+          4,
+          1600,
+          64
+        ],
+        "positions_shape": [
+          1,
+          1600
+        ],
+        "cache_dtype": "float32",
+        "positions_dtype": "int32"
+      }
+    ]
+  }
+}
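
The `streaming_decode` section pairs every state input of the decode-step graph with the output that replaces it on the next step. As an illustration of how a caller might consume it, the Python sketch below derives the state tensor specs, the output-to-input feedback map, and a memory estimate from the metadata. The function names are hypothetical, the sample dictionary is trimmed to the first transformer offset and the first attention cache, and the `int32` dtype for attention offsets is an assumption because the metadata lists only their shape.

```python
from math import prod

DTYPE_BYTES = {"float32": 4, "int32": 4}
ASSUMED_OFFSET_DTYPE = "int32"  # assumption: no dtype is given for attn offsets


def streaming_state_specs(meta):
    """Return ({input_name: (shape, dtype)}, {output_name: input_name})."""
    section = meta["streaming_decode"]
    specs, feedback = {}, {}
    for entry in section["transformer_offsets"]:
        specs[entry["input_name"]] = (tuple(entry["shape"]), entry["dtype"])
        feedback[entry["output_name"]] = entry["input_name"]
    for cache in section["attention_caches"]:
        fields = (
            ("offset", cache["offset_shape"], ASSUMED_OFFSET_DTYPE),
            ("cached_keys", cache["cache_shape"], cache["cache_dtype"]),
            ("cached_values", cache["cache_shape"], cache["cache_dtype"]),
            ("cached_positions", cache["positions_shape"], cache["positions_dtype"]),
        )
        for kind, shape, dtype in fields:
            in_name = cache[f"{kind}_input_name"]
            specs[in_name] = (tuple(shape), dtype)
            feedback[cache[f"{kind}_output_name"]] = in_name
    return specs, feedback


def state_bytes(specs):
    """Total size in bytes of all state tensors described by specs."""
    return sum(prod(shape) * DTYPE_BYTES[dtype] for shape, dtype in specs.values())


# Trimmed copy of the metadata above (index 0 entries only).
meta = {"streaming_decode": {
    "transformer_offsets": [{
        "input_name": "transformer_offset_0",
        "output_name": "transformer_offset_out_0",
        "shape": [1], "dtype": "int32",
    }],
    "attention_caches": [{
        "offset_input_name": "attn_offset_0",
        "offset_output_name": "attn_offset_out_0",
        "cached_keys_input_name": "attn_cached_keys_0",
        "cached_keys_output_name": "attn_cached_keys_out_0",
        "cached_values_input_name": "attn_cached_values_0",
        "cached_values_output_name": "attn_cached_values_out_0",
        "cached_positions_input_name": "attn_cached_positions_0",
        "cached_positions_output_name": "attn_cached_positions_out_0",
        "offset_shape": [1], "cache_shape": [1, 4, 500, 64],
        "positions_shape": [1, 500],
        "cache_dtype": "float32", "positions_dtype": "int32",
    }],
}}

specs, feedback = streaming_state_specs(meta)
print(len(specs), "state tensors,", state_bytes(specs), "bytes")
```

Run against the full file, the same logic yields 52 state inputs (4 transformer offsets plus 4 tensors for each of the 12 attention caches), which together with `audio_codes` and `audio_code_lengths` accounts for the 54 names in `decode_step_input_names`. Under the `int32` offset assumption the state totals about 25.4 MB, almost all of it in the float32 key and value caches. The metadata does not say how the state should be initialized before the first step.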

moss_audio_tokenizer_decode_full.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0fbbafe3fd4afa2a019af5c5ced204af6e2d1db044fa40f021525d2aee95b4ac
+size 681902

moss_audio_tokenizer_decode_shared.data ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e69d52e0f4e84ca27850557ee54face46632d3a5a16c89bd246c7c408466dcad
+size 44198912

moss_audio_tokenizer_decode_step.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9527c86a29e1837edec1f74db57d5eeaadb3a715af3382703566460afed25855
+size 351400

moss_audio_tokenizer_encode.data ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aa751265b2bab2887eac224484546b194875aa7494b607115439b3dc6b228a2c
+size 44507136

moss_audio_tokenizer_encode.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eadea4a645abdcf98714c7aead122ee2ce7da6e080f9f80b977cd1ca8e19473a
+size 815775
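
Each of the five binary files is committed as a Git LFS pointer recording a SHA-256 digest and a byte size, so a downloaded file can be checked against its pointer. The Python sketch below shows one way to do that with the standard library. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the pointer text is expected without the leading `+` diff markers.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the size and digest in the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer for moss_audio_tokenizer_encode.onnx, copied from the diff above.
ENCODE_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:eadea4a645abdcf98714c7aead122ee2ce7da6e080f9f80b977cd1ca8e19473a
size 815775
"""

pointer = parse_lfs_pointer(ENCODE_POINTER)
print(pointer["algo"], pointer["size"])
```

After downloading the repository, a call such as `matches_pointer("weights/MOSS-Audio-Tokenizer-Nano-ONNX/moss_audio_tokenizer_encode.onnx", pointer)` would confirm that the local file is the one this change committed; the path follows the `--local-dir` used in the Quick Start.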