Commit 38f484f
Parent(s): e9ada85
add
Files changed:
- .gitattributes +1 -0
- .gitignore +3 -0
- LICENSE +201 -0
- README.md +71 -17
- demo/demo_gt.wav +3 -0
- dev/huggingface_compliance_audit.md +215 -0
.gitattributes
CHANGED
@@ -43,5 +43,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore
ADDED
@@ -0,0 +1,3 @@
dev/*
!dev/huggingface_compliance_audit.md
demo/demo_rec*.wav
LICENSE
ADDED
@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
README.md
CHANGED
@@ -9,6 +9,7 @@ tags:
 - MOSS Audio Tokenizer
 - speech-tokenizer
 - trust-remote-code
+- arxiv:2602.10934
 ---
 
 # Moss-Audio-Tokenizer-V2
@@ -30,8 +31,58 @@ This is the code for the 48khz stereo version of MOSS-Audio-Tokenizer presented
 By combining a simple, scalable architecture with massive-scale data, the Cat architecture overcomes the bottlenecks of traditional audio tokenizers. It provides a robust, high-fidelity, and semantically grounded interface for the next generation of native audio foundation models.
 
 This repository contains a lightweight remote-code implementation that mirrors the current 🤗 Transformers
-`transformers.models.moss_audio_tokenizer` module. It is
-
+`transformers.models.moss_audio_tokenizer` module. It is hosted as a Hugging Face Hub model repository and should be
+loaded with `trust_remote_code=True`.
+
+## Model Details
+
+- **Architecture:** Cat (Causal Audio Tokenizer with Transformer), a CNN-free neural audio codec/tokenizer.
+- **Sampling rate:** 48 kHz.
+- **Channels:** stereo public waveform interface.
+- **Token frame rate:** 12.5 Hz.
+- **Quantization:** 32-layer residual vector quantization stack.
+- **Checkpoint size:** the safetensors index reports 2,123,701,248 total parameters.
+- **Weight format:** sharded `safetensors` weights with a `model.safetensors.index.json` index.
+
+## Intended Use
+
+MOSS-Audio-Tokenizer-V2 is intended for research and development on audio tokenization, neural codec reconstruction,
+native audio foundation models, speech/audio understanding, speech generation, and related downstream modeling. It can
+encode 48 kHz stereo waveforms into discrete audio codes and decode those codes back to waveforms.
+
+This model is not intended for use in applications that impersonate a real person, reproduce private or copyrighted
+audio without permission, or make high-stakes decisions from reconstructed audio without additional validation.
+
+## Training Data And Procedure
+
+The model was trained from scratch on 3 million hours of diverse audio data, covering speech, sound effects, and music,
+as described in the accompanying paper. The training pipeline jointly optimizes the encoder, quantizer, decoder,
+discriminator, and a decoder-only LLM used for semantic alignment.
+
+The full training data mixture is not included in this repository. For details on dataset composition, filtering, and
+training/evaluation methodology, refer to the paper.
+
+## Evaluation
+
+The model is designed to provide high-fidelity reconstruction and semantically rich discrete representations across
+speech, sound effects, and music. Please refer to the paper for the full benchmark setup and quantitative results.
+
+## Limitations
+
+- Audio outside the 48 kHz stereo setting may require resampling and channel conversion before inference.
+- Reconstruction quality depends on audio domain, signal quality, selected number of RVQ layers, and inference settings.
+- The repository uses custom Transformers remote code, so users should review the code and pin a trusted revision in
+  production deployments.
+- `flash_attention_2` is optional; if it is unavailable, use the default `sdpa` attention implementation.
+
+## Requirements
+
+- Python 3.10 or newer.
+- PyTorch.
+- Transformers. This checkpoint was prepared with `transformers_version` set to `4.56.0.dev0`; use a recent Transformers
+  build that supports custom remote-code models.
+- `torchaudio` for the examples below.
+- Optional: `flash-attn` if using `model.set_attention_implementation("flash_attention_2")`.
 
 ## Usage
 
@@ -45,7 +96,8 @@ import torchaudio
 repo_id = "OpenMOSS-Team/MOSS-Audio-Tokenizer-V2"
 model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()
 
-
+audio_path = "demo/demo_gt.wav" # replace with your own 48 kHz stereo audio path if needed
+wav, sr = torchaudio.load(audio_path)
 if sr != model.sampling_rate:
     wav = torchaudio.functional.resample(wav, sr, model.sampling_rate)
 if wav.shape[0] == 1:
@@ -66,6 +118,8 @@ wav_rvq8 = dec_rvq8.audio.squeeze(0)
 torchaudio.save("demo/demo_rec_rvq8.wav", wav_rvq8, sample_rate=model.sampling_rate)
 ```
 
+For production use with `trust_remote_code=True`, pin `revision` to a reviewed commit hash.
+
 ### Attention Backend And Compute Dtype
 
 `config.attention_implementation` controls whether transformer layers prefer `sdpa` or `flash_attention_2`.
@@ -114,25 +168,25 @@ batch_dec = model.batch_decode(codes_list, chunk_duration=0.08)
 - `modeling_moss_audio_tokenizer.py`
 - `__init__.py`
 - `config.json`
-- model
-
-
+- `model.safetensors.index.json`
+- sharded model weights: `model-00001-of-00003.safetensors`, `model-00002-of-00003.safetensors`,
+  `model-00003-of-00003.safetensors`
+- `demo/demo_gt.wav`
 
 ## Citation
 If you use this code or result in your paper, please cite our work as:
 ```tex
-@misc{gong2026mossaudiotokenizerscaling,
-title={MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models},
-author={Yitian Gong and Kuangwei Chen and Zhaoye Fei and Xiaogui Yang and Ke Chen and Yang Wang and Kexin Huang and Mingshu Chen and Ruixiao Li and Qingyuan Cheng and Shimin Li and Xipeng Qiu},
-year={2026},
-eprint={2602.10934},
-archivePrefix={arXiv},
-primaryClass={cs.SD},
-url={https://arxiv.org/abs/2602.10934}
+@misc{gong2026mossaudiotokenizerscaling,
+title={MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models},
+author={Yitian Gong and Kuangwei Chen and Zhaoye Fei and Xiaogui Yang and Ke Chen and Yang Wang and Kexin Huang and Mingshu Chen and Ruixiao Li and Qingyuan Cheng and Shimin Li and Xipeng Qiu},
+year={2026},
+eprint={2602.10934},
+archivePrefix={arXiv},
+primaryClass={cs.SD},
+url={https://arxiv.org/abs/2602.10934}
 }
 ```
 
-## License
-
-MOSS-Audio-Tokenizer-V2 is released under the Apache 2.0 license.
+## License
+MOSS-Audio-Tokenizer-V2 is released under the Apache 2.0 license. See `LICENSE` for the full license text.
 
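The Model Details entries added in the README hunk above (48 kHz sampling rate, 12.5 Hz token frame rate, 32 RVQ layers) imply some simple arithmetic that may help when sizing buffers or sequence lengths. The sketch below is only an illustration: it assumes each RVQ layer contributes one code per frame, which the README does not state, and the 8-layer call is a guess at the setting behind the `demo_rec_rvq8.wav` file name.

```python
# Figures quoted from the README's "Model Details" section.
SAMPLE_RATE_HZ = 48_000
FRAME_RATE_HZ = 12.5
NUM_RVQ_LAYERS = 32


def samples_per_frame(sample_rate_hz: float = SAMPLE_RATE_HZ,
                      frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Waveform samples (per channel) covered by one token frame."""
    return sample_rate_hz / frame_rate_hz


def codes_per_second(num_layers: int = NUM_RVQ_LAYERS,
                     frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Discrete codes per second, ASSUMING one code per RVQ layer per frame."""
    return num_layers * frame_rate_hz


if __name__ == "__main__":
    print(samples_per_frame())   # 3840.0 samples per frame
    print(codes_per_second())    # 400.0 codes per second with all 32 layers
    print(codes_per_second(8))   # 100.0 codes per second with 8 layers
```

Under that assumption, decoding with fewer RVQ layers lowers the code rate proportionally, which is the trade-off the Limitations section alludes to when it says quality depends on the selected number of RVQ layers.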
demo/demo_gt.wav
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:631608f5c8b931ece1d45adc7f40a3b3b0ae2ec056a8a08a3565b04cc5750a4b
size 243244
dev/huggingface_compliance_audit.md
ADDED
@@ -0,0 +1,215 @@
# Hugging Face Compliance Audit Summary

Audit date: 2026-06-05 UTC

Audit target: the local repository `MOSS-Audio-Tokenizer-V2`, whose remote is `https://huggingface.co/OpenMOSS-Team/MOSS-Audio-Tokenizer-V2`.

Conclusion: overall, the local repository already has the shape of a standard Hugging Face Transformers custom-model repository. The weights are sharded safetensors files stored with Git LFS, and the `auto_map` in `config.json` and the remote-code loading entry points pass local verification. No issue was found that would directly block uploading or the dynamic import of `AutoConfig` or the model class. A round of fixes was completed on 2026-06-05 and addressed the main release-quality issues; the remaining items are mainly verification of the remote page state, optional metadata, and more complete structured evaluation information.

## Remediation Status

Completed:

- Added a root-level `LICENSE` containing the full Apache-2.0 text.
- Removed the TODO comment from the license section of the README.
- Added Model Details, Intended Use, Training Data And Procedure, Evaluation, Limitations, and Requirements sections to the README.
- Kept `demo/demo_gt.wav` in the README example and noted that it can be replaced with your own audio; `demo/demo_gt.wav` currently exists locally.
- Added a README recommendation to pin a commit revision when using `trust_remote_code=True` in production.
- Added the sharded weights, the index, and the demo audio to the README repository layout.
- Added `!dev/huggingface_compliance_audit.md` to `.gitignore` so that this audit document can be tracked by Git; also ignored the `demo/demo_rec*.wav` files generated by the quickstart.

## Audit Basis

This audit follows these areas of the official Hugging Face documentation:

- Model Cards: `README.md` is the model card and should contain YAML metadata plus a body; the model card should describe the model, its uses and limitations, training information, data, evaluation results, and so on. Reference: <https://huggingface.co/docs/hub/model-cards>
- Model Card metadata: explicitly setting `library_name` is recommended; `pipeline_tag`, `license`, `datasets`, `model-index`, and similar fields can be added to improve discoverability. Reference: <https://huggingface.co/docs/hub/model-cards>
- Custom Transformers model: a custom model needs a `model_type` on the config class, a `config_class` on the model class, and an `auto_map`, and is loaded with `trust_remote_code=True`. Reference: <https://huggingface.co/docs/transformers/custom_models>
- Large files and repository structure: the recommendations are fewer than 100k files, fewer than 10k entries per directory, file shards smaller than 200GB, and preferably fewer than 100 large-file operations per commit. Reference: <https://huggingface.co/docs/hub/en/storage-limits>
- Safe weight format: safetensors is safer than pickle; pickle weights carry a risk of arbitrary code execution. References: <https://huggingface.co/docs/safetensors/en/index>, <https://huggingface.co/docs/hub/security-pickle>

## Local Evidence

### Repository Structure

`git ls-files` currently lists these tracked files:

- `.gitattributes`
- `README.md`
- `__init__.py`
- `config.json`
- `configuration_moss_audio_tokenizer.py`
- `modeling_moss_audio_tokenizer.py`
- `model-00001-of-00003.safetensors`
- `model-00002-of-00003.safetensors`
- `model-00003-of-00003.safetensors`
- `model.safetensors.index.json`

Structure assessment: this matches the basic structure of a Hugging Face model repo. The custom Transformers remote-code files, the config, the model weights, and the weight index are all in the root directory, and users can load the model with `AutoModel.from_pretrained(repo_id, trust_remote_code=True)`.

Note: `.gitignore` still contains `dev/*`, but `!dev/huggingface_compliance_audit.md` has been added, so this file can be tracked by Git. `.gitignore` also ignores the `demo/demo_rec*.wav` files generated by the quickstart.
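The interaction between `dev/*` and the `!` negation rule described in the note above can be illustrated with a small simulation. This is a deliberately simplified sketch, not Git's matcher: it uses `fnmatch` with a last-matching-pattern-wins loop, and `fnmatch` treats `*` differently from gitignore (it can cross `/`). The function name and test paths are made up for illustration; `git check-ignore -v <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase

# The three patterns from the .gitignore added in this commit.
PATTERNS = ["dev/*", "!dev/huggingface_compliance_audit.md", "demo/demo_rec*.wav"]


def is_ignored(path: str, patterns: list[str] = PATTERNS) -> bool:
    """Approximate gitignore evaluation: the last matching pattern decides."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(path, glob):
            # A plain pattern ignores the path; a "!" pattern re-includes it.
            ignored = not negated
    return ignored


if __name__ == "__main__":
    for path in ["dev/notes.md",
                 "dev/huggingface_compliance_audit.md",
                 "demo/demo_rec_rvq8.wav",
                 "demo/demo_gt.wav"]:
        print(f"{path}: {'ignored' if is_ignored(path) else 'tracked'}")
```

The negation works here because `dev/*` ignores the directory's contents rather than the `dev/` directory itself; Git cannot re-include a file whose parent directory is excluded.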

### Model Card README

Compliant:

- `README.md` already has YAML metadata at the top.
- `license: apache-2.0` is declared.
- `library_name: transformers` is declared. This matters because, for model repositories created after 2024-08, Hugging Face no longer always infers Transformers from `config.json`.
- Tags relevant to an audio tokenizer are provided: `audio`, `audio-tokenizer`, `neural-codec`, `moss-tts-family`, `speech-tokenizer`, `trust-remote-code`, and others.
- A quickstart, a streaming usage example, RVQ layer-count control, a citation, and a license statement are included.
- The README examples explicitly use `trust_remote_code=True`, consistent with the custom-model requirements.

Could still be improved or needs verification:

- The YAML metadata still has no `pipeline_tag`. An audio tokenizer does not necessarily have an exactly matching official pipeline; if the Hugging Face metadata UI accepts it, `feature-extraction` could be considered, otherwise keep the existing custom tags.
- The YAML metadata has no `datasets`. The README now describes the training data; if the training data has a public Hub dataset id, `datasets` metadata can be added as well.
- There is no `model-index` or structured eval metadata. If the paper reports reconstruction quality or downstream ASR/TTS metrics, adding a table to the body is recommended; if any metrics can be structured, add a `model-index` as well.
- Whether the remote page renders the metadata correctly still needs to be confirmed by an account with access.

### config and AutoClass

Compliant:

- `config.json` contains:
  - `model_type: "moss-audio-tokenizer"`
  - `architectures: ["MossAudioTokenizerModel"]`
  - `auto_map.AutoConfig: "configuration_moss_audio_tokenizer.MossAudioTokenizerConfig"`
  - `auto_map.AutoModel: "modeling_moss_audio_tokenizer.MossAudioTokenizerModel"`
- In `configuration_moss_audio_tokenizer.py`, `MossAudioTokenizerConfig.model_type` matches `config.json`.
- In `modeling_moss_audio_tokenizer.py`, `MossAudioTokenizerPreTrainedModel.config_class = MossAudioTokenizerConfig`.
- `modeling_moss_audio_tokenizer.py` sets `main_input_name = "input_values"`, `input_modalities = "audio"`, and `_no_split_modules`, which is a positive signal for Transformers loading and device placement.

Local verification passes:

```bash
python -c "from transformers import AutoConfig; c=AutoConfig.from_pretrained('.', trust_remote_code=True); print(type(c).__name__, c.model_type, c.architectures, c.auto_map)"
```

Key output:

```text
MossAudioTokenizerConfig moss-audio-tokenizer ['MossAudioTokenizerModel'] {'AutoConfig': 'configuration_moss_audio_tokenizer.MossAudioTokenizerConfig', 'AutoModel': 'modeling_moss_audio_tokenizer.MossAudioTokenizerModel'}
```

Dynamic import of the model class also passes:

```bash
python -c "from transformers.dynamic_module_utils import get_class_from_dynamic_module; cls=get_class_from_dynamic_module('modeling_moss_audio_tokenizer.MossAudioTokenizerModel', '.'); print(cls.__name__, cls.config_class.__name__)"
```

Key output:

```text
MossAudioTokenizerModel MossAudioTokenizerConfig
```
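A lighter static check can complement the two commands above when Transformers is not installed. The sketch below is not part of the audit; it assumes `auto_map` values are plain `module.ClassName` strings, as in this repository's `config.json`, and only confirms that each referenced module file exists next to `config.json` and defines a class with the expected name. The helper name `check_auto_map` is hypothetical.

```python
import json
import re
from pathlib import Path


def check_auto_map(repo_dir) -> list[str]:
    """Return a list of problems found in config.json's auto_map (empty if none)."""
    repo = Path(repo_dir)
    config = json.loads((repo / "config.json").read_text(encoding="utf-8"))
    problems = []
    for auto_class, target in config.get("auto_map", {}).items():
        # Assumes the "module.ClassName" form; other forms are not handled here.
        module_name, _, class_name = target.rpartition(".")
        module_path = repo / f"{module_name}.py"
        if not module_path.is_file():
            problems.append(f"{auto_class}: missing {module_path.name}")
            continue
        source = module_path.read_text(encoding="utf-8")
        # Look for a top-level "class ClassName" statement without importing the file.
        if not re.search(rf"^class {re.escape(class_name)}\b", source, re.MULTILINE):
            problems.append(f"{auto_class}: {class_name} not defined in {module_path.name}")
    return problems


if __name__ == "__main__":
    for problem in check_auto_map("."):
        print(problem)
```

Because it never imports the remote code, this check is safe to run on an unreviewed checkout, but it cannot catch import-time errors the way the `AutoConfig` command does.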
|
| 106 |
+
|
| 107 |
+
Current status:
|
| 108 |
+
|
| 109 |
+
- The README now has a "Requirements" section covering Python, PyTorch, Transformers, `torchaudio`, and the optional `flash-attn`.
|
| 110 |
+
- The README now states that `transformers_version` in `config.json` is `4.56.0.dev0` and recommends a recent Transformers build that supports custom remote-code models.
|
| 111 |
+
- The README now recommends pinning a reviewed commit hash when using `trust_remote_code=True` in production.
|
| 112 |
+
|
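The commit-pinning recommendation can be illustrated with a short sketch. `pinned_load_kwargs` is a hypothetical helper and the all-zero hash is a placeholder, not a real commit of this repository; `revision` and `trust_remote_code` are the standard `from_pretrained` arguments. The actual load call is left as a comment so the snippet runs without network access.

```python
import re


def pinned_load_kwargs(commit_hash: str) -> dict:
    """Build keyword arguments that pin remote code and weights to one reviewed commit."""
    # A full Git commit hash is 40 hexadecimal characters; reject branch names and short hashes.
    if not re.fullmatch(r"[0-9a-f]{40}", commit_hash):
        raise ValueError("expected a full 40-character commit hash")
    return {"trust_remote_code": True, "revision": commit_hash}


# Placeholder hash for illustration only; substitute the commit that was actually reviewed.
kwargs = pinned_load_kwargs("0" * 40)
# from transformers import AutoModel
# model = AutoModel.from_pretrained("OpenMOSS-Team/MOSS-Audio-Tokenizer-v2", **kwargs)
print(kwargs)
```

Rejecting branch names such as `main` is a deliberate choice in this sketch: a branch can move to unreviewed code, while a full commit hash cannot.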
| 113 |
+
### Weights, LFS, and repository size
|
| 114 |
+
|
| 115 |
+
Already compliant:
|
| 116 |
+
|
| 117 |
+
- `.gitattributes` sets `filter=lfs diff=lfs merge=lfs -text` for `*.safetensors`.
|
| 118 |
+
- `git lfs ls-files` lists three safetensors shards:
|
| 119 |
+
- `model-00001-of-00003.safetensors`
|
| 120 |
+
- `model-00002-of-00003.safetensors`
|
| 121 |
+
- `model-00003-of-00003.safetensors`
|
| 122 |
+
- `git cat-file -s HEAD:model-00001-of-00003.safetensors` reports 135 bytes, which shows that the Git object is an LFS pointer rather than the 3.9GB weights committed as a regular Git blob.
|
| 123 |
+
- The first LFS pointer records the actual size as `3978639168` bytes.
|
| 124 |
+
- The Git blob sizes of the other two shards are 135 and 134 bytes, also as expected for LFS pointers.
|
| 125 |
+
- `model.safetensors.index.json` metadata:
|
| 126 |
+
- `total_parameters: 2123701248`
|
| 127 |
+
- `total_size: 8494804992`
|
| 128 |
+
- `weight_map` entries: 2094
|
| 129 |
+
- `weight_map` distribution across shards:
|
| 130 |
+
- `model-00001-of-00003.safetensors`: 898 entries
|
| 131 |
+
- `model-00002-of-00003.safetensors`: 1010 entries
|
| 132 |
+
- `model-00003-of-00003.safetensors`: 186 entries
|
| 133 |
+
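- The three shards are about 3.98GB, 3.99GB, and 0.52GB, far below the 200GB per-file size that Hugging Face recommends for large files and below the 500GB hard limit for a single file.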
|
| 134 |
+
- The repository tracks only 10 files, far below the recommended 100k-file limit, and no directory approaches 10k entries.
|
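The 135-byte figure follows directly from the Git LFS pointer layout: a version line, an `oid sha256:` line with a 64-character digest, and a `size` line. The sketch below rebuilds such a pointer and parses it. `parse_lfs_pointer` is a hypothetical helper and the all-zero oid is a dummy value; only the size is taken from the audit.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    fields["size"] = int(fields["size"])
    return fields


# Dummy oid (64 zeros) for illustration; the size is the value reported in the audit.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:" + "0" * 64 + "\n"
    "size 3978639168\n"
)
info = parse_lfs_pointer(pointer)
print(len(pointer.encode()), info["size"])  # 135 3978639168
```

With a ten-digit size the pointer is 135 bytes; a nine-digit size gives 134 bytes, which would be consistent with the roughly 0.52GB third shard.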
| 135 |
+
|
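The index figures listed above can be cross-checked with simple arithmetic. This is a sketch under stated assumptions: `entries_per_shard` is a hypothetical helper, `demo_map` is an invented three-entry `weight_map`, and the remaining numbers are copied from the audit.

```python
from collections import Counter


def entries_per_shard(weight_map: dict) -> Counter:
    """Count how many tensor names the index assigns to each shard file."""
    return Counter(weight_map.values())


# Hypothetical three-entry weight_map, only to show the counting step.
demo_map = {
    "encoder.weight": "model-00001-of-00003.safetensors",
    "encoder.bias": "model-00001-of-00003.safetensors",
    "decoder.weight": "model-00002-of-00003.safetensors",
}
print(entries_per_shard(demo_map))

# Figures reported in the audit for model.safetensors.index.json.
reported_counts = {
    "model-00001-of-00003.safetensors": 898,
    "model-00002-of-00003.safetensors": 1010,
    "model-00003-of-00003.safetensors": 186,
}
total_entries = sum(reported_counts.values())
bytes_per_parameter = 8_494_804_992 / 2_123_701_248
print(total_entries, bytes_per_parameter)  # 2094 4.0
```

The per-shard counts sum to the reported 2094 entries, and `total_size / total_parameters` is exactly 4 bytes per parameter. That ratio is consistent with 32-bit storage, although the audit does not state the dtype, so treat this as an inference.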
| 136 |
+
Needs improvement or verification:
|
| 137 |
+
|
| 138 |
+
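- Only the LFS pointers on the current local branch could be confirmed. Whether the remote repository history contains older large LFS versions, uncleaned PR refs, or duplicate uploads has to be checked under "List LFS files" in the Hugging Face repo Settings or through an authorized API call.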
|
| 139 |
+
- If the weights are updated later, keep the shard count low and limit each commit to no more than 50-100 large-file operations.
|
| 140 |
+
|
| 141 |
+
### Security
|
| 142 |
+
|
| 143 |
+
Already compliant:
|
| 144 |
+
|
| 145 |
+
- The weights use safetensors; no pickle-risk weight files such as `.bin`, `.pt`, `.pth`, `.pkl`, or `.pickle` were found.
|
| 146 |
+
- An import scan of the custom code found no obvious high-risk patterns such as `subprocess`, `requests`, `urllib`, `socket`, `pickle`, `torch.load`, `eval(`, or `exec(`.
|
| 147 |
+
- `flash_attn` is an optional dependency guarded by try/except; when it is missing the code falls back and the basic import is not blocked.
|
| 148 |
+
|
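The audit does not say how the import scan was run. One way to reproduce a scan of this kind is to walk the syntax tree with Python's `ast` module, as in the sketch below. `scan_source` and the three pattern sets are hypothetical names chosen to mirror the patterns listed above.

```python
import ast

RISKY_MODULES = {"subprocess", "requests", "urllib", "socket", "pickle"}
RISKY_CALLS = {"eval", "exec"}
RISKY_ATTR_CALLS = {("torch", "load")}


def scan_source(source: str) -> list[str]:
    """Return descriptions of risky imports and calls found in Python source text."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in RISKY_MODULES:
                    findings.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in RISKY_MODULES:
                findings.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call):
            func = node.func
            # Bare eval(...) / exec(...) calls.
            if isinstance(func, ast.Name) and func.id in RISKY_CALLS:
                findings.append(f"call to {func.id}()")
            # Attribute calls such as torch.load(...).
            elif (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and (func.value.id, func.attr) in RISKY_ATTR_CALLS
            ):
                findings.append(f"call to {func.value.id}.{func.attr}()")
    return findings


sample = "import subprocess\nimport torch\nmodel.eval()\nstate = torch.load('weights.pt')\n"
print(sorted(scan_source(sample)))
```

Working on the syntax tree rather than on raw text avoids a common false positive: `model.eval()` is an attribute call and is not reported, whereas a plain text search for `eval(` would flag it.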
| 149 |
+
Current status:
|
| 150 |
+
|
| 151 |
+
- The README now states that the repo uses custom Transformers remote code and recommends pinning a reviewed commit hash in production.
|
| 152 |
+
- If dependencies are added in the future, the code should still avoid network access, filesystem side effects, and shell calls during model import or forward.
|
| 153 |
+
|
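For any future optional dependency, a side-effect-free availability check keeps the import path safe. The sketch below shows one such pattern using `importlib.util.find_spec`. It is not the repository's actual fallback code, which this audit describes as a try/except, and the backend names are illustrative.

```python
import importlib.util

# True only when the optional package can be found in the current environment.
HAS_FLASH_ATTN = importlib.util.find_spec("flash_attn") is not None


def pick_attention_backend() -> str:
    """Return the name of the attention path to use; the names are illustrative."""
    return "flash_attn" if HAS_FLASH_ATTN else "default"


print(pick_attention_backend())
```

The check only looks the package up; it performs no network access, writes no files, and runs no shell commands at import time.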
| 154 |
+
### Remote page status
|
| 155 |
+
|
| 156 |
+
To be verified:
|
| 157 |
+
|
| 158 |
+
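- Browser and anonymous API access to `https://huggingface.co/OpenMOSS-Team/MOSS-Audio-Tokenizer-V2` returns 401, so the remote page rendering, file list, parsed model card metadata, LFS file list, and download statistics cannot currently be confirmed anonymously.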
|
| 159 |
+
- Local `git remote -v` confirms that origin points to this Hugging Face repo.
|
| 160 |
+
|
| 161 |
+
Someone with access should manually confirm on the Hugging Face page:
|
| 162 |
+
|
| 163 |
+
- Whether the README metadata is parsed correctly by the page.
|
| 164 |
+
- Whether `Files and versions` shows the 3 safetensors shards and the index.
|
| 165 |
+
- Whether the page shows `Safetensors`, `Transformers`, the license, and the task tag.
|
| 166 |
+
- Whether the `Security`/file scan is clean, with no pickle or malware warnings.
|
| 167 |
+
- Whether the LFS storage page is free of leftover large files from history.
|
| 168 |
+
|
| 169 |
+
## Priority recommendations
|
| 170 |
+
|
| 171 |
+
### P0: Blockers
|
| 172 |
+
|
| 173 |
+
No clear P0 blockers were found locally. `AutoConfig` and the dynamic import of the model class pass, and the weights are tracked as LFS pointers.
|
| 174 |
+
|
| 175 |
+
### P1: Completed
|
| 176 |
+
|
| 177 |
+
- Removed the license TODO comment from the README.
|
| 178 |
+
- Added a root-level `LICENSE` file containing the full Apache-2.0 text.
|
| 179 |
+
- Added the standard model card sections to the README: intended use, limitations, training data, training procedure, evaluation results, ethical considerations.
|
| 180 |
+
- Added a "Requirements" section listing Python, Transformers, PyTorch, `torchaudio`, and the optional `flash-attn`.
|
| 181 |
+
- Confirmed and kept the `demo/demo_gt.wav` sample audio; the README notes that it can be replaced with your own audio.
|
| 182 |
+
|
| 183 |
+
### P2: Improve discoverability and maintainability
|
| 184 |
+
|
| 185 |
+
- If it passes validation in the Hugging Face metadata UI, add `pipeline_tag: feature-extraction`.
|
| 186 |
+
- If the paper reports metrics, add an eval table; add `model-index` once they can be expressed in structured form.
|
| 187 |
+
- If the training data has a public Hub dataset id, add `datasets` metadata; otherwise explain in the body what the data covers and why it cannot be released.
|
| 188 |
+
- Use an account with access to confirm the remote page and LFS storage status.
|
| 189 |
+
|
| 190 |
+
## Suggested direction for README metadata
|
| 191 |
+
|
| 192 |
+
The following is only a direction; `pipeline_tag` must follow the validation result from the Hugging Face metadata UI:
|
| 193 |
+
|
| 194 |
+
```yaml
|
| 195 |
+
---
|
| 196 |
+
license: apache-2.0
|
| 197 |
+
library_name: transformers
|
| 198 |
+
pipeline_tag: feature-extraction
|
| 199 |
+
tags:
|
| 200 |
+
- audio
|
| 201 |
+
- audio-tokenizer
|
| 202 |
+
- neural-codec
|
| 203 |
+
- speech-tokenizer
|
| 204 |
+
- trust-remote-code
|
| 205 |
+
- arxiv:2602.10934
|
| 206 |
+
---
|
| 207 |
+
```
|
| 208 |
+
|
| 209 |
+
If `pipeline_tag: feature-extraction` does not fit this tokenizer, do not force it; keep the custom tags and state clearly in the body that this is an audio tokenizer / neural codec.
|
| 210 |
+
|
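Before relying on the Hugging Face metadata UI, the front matter can be given a quick local sanity check. The sketch below uses a deliberately small hand-written parser that handles only flat `key: value` pairs and `- item` lists. `parse_front_matter` is a hypothetical helper and is no substitute for a real YAML parser or for the Hub's own validation.

```python
def parse_front_matter(readme: str) -> dict:
    """Parse flat 'key: value' pairs and '- item' lists from a README front matter block."""
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta, current_key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # List item belonging to the most recent key that had no inline value.
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            meta[current_key] = value.strip() if value.strip() else []
    return meta


README_SAMPLE = """---
license: apache-2.0
library_name: transformers
pipeline_tag: feature-extraction
tags:
- audio
- audio-tokenizer
- neural-codec
- speech-tokenizer
- trust-remote-code
- arxiv:2602.10934
---
# Model card body
"""

meta = parse_front_matter(README_SAMPLE)
missing = [key for key in ("license", "library_name", "tags") if key not in meta]
print(meta["license"], len(meta["tags"]), missing)
```

The list-item branch is tested first so that a tag containing a colon, such as `arxiv:2602.10934`, is stored as a tag rather than misread as a new key.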
| 211 |
+
## Final assessment
|
| 212 |
+
|
| 213 |
+
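Based on local evidence, this repository already largely meets the key conventions for Hugging Face model repositories and Transformers custom remote code: the model card exists, the metadata is basically usable, config/auto_map are correct, the weights are safetensors, LFS tracking is correct, the shards and index are consistent, and both the file count and the shard sizes are within the recommended ranges.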
|
| 214 |
+
|
| 215 |
+
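The main release-quality issues have been addressed: the LICENSE file, the README TODO comment, the model card sections, the dependency version notes, and the remote-code security advice have all been handled. The main remaining risk is that the remote page returns 401 to anonymous access, so an account with access is needed for a final check. Whether to add `pipeline_tag`, `datasets`, and `model-index` depends on the Hugging Face metadata validation result and on what training and evaluation information can be made public.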
|