Upload 9 files

- .cache/huggingface/.gitignore +1 -0
- .cache/huggingface/download/.gitattributes.metadata +3 -0
- .cache/huggingface/download/README.md.metadata +3 -0
- .cache/huggingface/download/meta.yaml.metadata +3 -0
- .cache/huggingface/download/pytorch_model.bin.metadata +3 -0
- README.md +77 -0
- meta.yaml +2 -0
- pytorch_model.bin +3 -0
.cache/huggingface/.gitignore ADDED
@@ -0,0 +1 @@
+*
.cache/huggingface/download/.gitattributes.metadata ADDED
@@ -0,0 +1,3 @@
+daee7fd9989a62594084fd8e1a99e61beb5b0e85
+a6344aac8c09253b3b630fb776ae94478aa0275b
+1769270800.241348
.cache/huggingface/download/README.md.metadata ADDED
@@ -0,0 +1,3 @@
+daee7fd9989a62594084fd8e1a99e61beb5b0e85
+5d3ff3ee58bc298375f7d85766edcd5e4cc8b176
+1769270800.4092014
.cache/huggingface/download/meta.yaml.metadata ADDED
@@ -0,0 +1,3 @@
+daee7fd9989a62594084fd8e1a99e61beb5b0e85
+3a2807f7088719f9837edd74197697c0747aa078
+1769270800.4092014
.cache/huggingface/download/pytorch_model.bin.metadata ADDED
@@ -0,0 +1,3 @@
+daee7fd9989a62594084fd8e1a99e61beb5b0e85
+adace21f679b30f071c02e0cb3502d965ab08b50be936a5e81944674a5ae101e
+1769270800.5259283
README.md ADDED
@@ -0,0 +1,77 @@
+---
+license: apache-2.0
+tags:
+- audio
+- speech
+- audio-to-audio
+- speech-language-models
+datasets:
+- amphion/Emilia-Dataset
+- facebook/multilingual_librispeech
+- CSTR-Edinburgh/vctk
+- google/fleurs
+- mozilla-foundation/common_voice_13_0
+- mythicinfinity/libritts_r
+---
+
+# Model Details
+
+Distill-NeuCodec is a version of NeuCodec with a compatible, distilled encoder.
+
+The distilled encoder is 10x smaller in parameter count and uses ~7.5x fewer MACs at inference time.
+
+The distilled model makes the following adjustments:
+* Swap the notoriously slow [BigCodec](https://arxiv.org/abs/2409.05377) acoustic encoder for the [SQCodec](https://arxiv.org/abs/2504.04949) acoustic encoder (70M → 36M)
+* Swap the [w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) semantic encoder for [DistilHuBERT](https://huggingface.co/ntu-spml/distilhubert) (600M → 21M)
+
+Our work is largely based on extending the work of [X-Codec2.0](https://huggingface.co/HKUSTAudio/xcodec2) and [SQCodec](https://arxiv.org/abs/2504.04949).
+
+- **Developed by:** Neuphonic
+- **Model type:** Neural Audio Codec
+- **License:** apache-2.0
+- **Repository:** https://github.com/neuphonic/neucodec
+- **Paper:** [arXiv](https://arxiv.org/abs/2509.09550)
+- **Pre-encoded Datasets:**
+    - [Emilia-YODAS-EN](https://huggingface.co/datasets/neuphonic/emilia-yodas-english-neucodec)
+    - *More coming soon!*
+
+## Get Started
+
+Use the code below to get started with the model.
+
+To install from PyPI in a dedicated environment, using Python 3.10 or above:
+
+```bash
+conda create -n neucodec python=3.10
+conda activate neucodec
+pip install neucodec
+```
+
+Then, to use in Python:
+
+```python
+import librosa
+import torch
+import torchaudio
+from torchaudio import transforms as T
+from neucodec import DistillNeuCodec
+
+model = DistillNeuCodec.from_pretrained("neuphonic/distill-neucodec")
+model.eval().cuda()
+
+y, sr = torchaudio.load(librosa.ex("libri1"))  # (1, T)
+if sr != 16_000:
+    y = T.Resample(sr, 16_000)(y)
+y = y[None, ...]  # (B, 1, T_16)
+
+with torch.no_grad():
+    fsq_codes = model.encode_code(y)
+    # fsq_codes = model.encode_code(librosa.ex("libri1"))  # or pass your filepath directly!
+    print(f"Codes shape: {fsq_codes.shape}")
+    recon = model.decode_code(fsq_codes).cpu()  # (B, 1, T_24)
+
+torchaudio.save("reconstructed.wav", recon[0, :, :], 24_000)
+```
+
+## Training Details
+
+The model was trained on the same data as the full model, with an additional distillation loss (MSE between the distilled and original encoder outputs).
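The distillation objective mentioned in the README's Training Details can be sketched as below. The `TinyEncoder` modules and tensor shapes are hypothetical stand-ins (the actual student and teacher are the SQCodec/DistilHuBERT and BigCodec/w2v-bert-2.0 encoders described above), since the training code itself is not part of this upload:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an audio encoder; maps (B, 1, T) waveforms
# to (B, dim, T') feature sequences.
class TinyEncoder(nn.Module):
    def __init__(self, dim: int = 8):
        super().__init__()
        self.proj = nn.Conv1d(1, dim, kernel_size=4, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

student = TinyEncoder()  # distilled encoder being trained
teacher = TinyEncoder()  # original encoder, frozen during distillation
for p in teacher.parameters():
    p.requires_grad_(False)

wav = torch.randn(2, 1, 1600)  # (B, 1, T) batch of 16 kHz audio

with torch.no_grad():
    target = teacher(wav)  # original encoder outputs
pred = student(wav)        # distilled encoder outputs

# MSE distillation term; in practice combined with the codec's other losses.
distill_loss = nn.functional.mse_loss(pred, target)
distill_loss.backward()
```

Gradients flow only into the student, so the frozen teacher serves purely as a regression target.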
meta.yaml ADDED
@@ -0,0 +1,2 @@
+author: neuphonic
+license: apache-2.0
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:adace21f679b30f071c02e0cb3502d965ab08b50be936a5e81944674a5ae101e
+size 1025488162