Commit 1db5327 by longnh2012 (verified) · 1 parent: bc199b8

Upload 9 files

.cache/huggingface/.gitignore ADDED
@@ -0,0 +1 @@
+ *
.cache/huggingface/download/.gitattributes.metadata ADDED
@@ -0,0 +1,3 @@
+ daee7fd9989a62594084fd8e1a99e61beb5b0e85
+ a6344aac8c09253b3b630fb776ae94478aa0275b
+ 1769270800.241348
.cache/huggingface/download/README.md.metadata ADDED
@@ -0,0 +1,3 @@
+ daee7fd9989a62594084fd8e1a99e61beb5b0e85
+ 5d3ff3ee58bc298375f7d85766edcd5e4cc8b176
+ 1769270800.4092014
.cache/huggingface/download/meta.yaml.metadata ADDED
@@ -0,0 +1,3 @@
+ daee7fd9989a62594084fd8e1a99e61beb5b0e85
+ 3a2807f7088719f9837edd74197697c0747aa078
+ 1769270800.4092014
.cache/huggingface/download/pytorch_model.bin.metadata ADDED
@@ -0,0 +1,3 @@
+ daee7fd9989a62594084fd8e1a99e61beb5b0e85
+ adace21f679b30f071c02e0cb3502d965ab08b50be936a5e81944674a5ae101e
+ 1769270800.5259283
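Each `.metadata` file above holds three lines. The diff does not say what they mean; the sketch below assumes they follow the layout that `huggingface_hub` writes for local-directory downloads (commit hash, then ETag, then a Unix timestamp). `DownloadMetadata` and `parse_download_metadata` are hypothetical names introduced for illustration, not part of any library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DownloadMetadata:
    commit_hash: str         # assumed: revision the file was fetched from
    etag: str                # assumed: server-side ETag of the file
    downloaded_at: datetime  # assumed: time the metadata was written (UTC)


def parse_download_metadata(text: str) -> DownloadMetadata:
    """Parse a three-line metadata file (assumed layout, see lead-in)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) != 3:
        raise ValueError(f"expected 3 lines, got {len(lines)}")
    commit_hash, etag, timestamp = lines
    return DownloadMetadata(
        commit_hash=commit_hash,
        etag=etag,
        downloaded_at=datetime.fromtimestamp(float(timestamp), tz=timezone.utc),
    )


# Values copied from the .gitattributes.metadata hunk above.
example = (
    "daee7fd9989a62594084fd8e1a99e61beb5b0e85\n"
    "a6344aac8c09253b3b630fb776ae94478aa0275b\n"
    "1769270800.241348\n"
)
meta = parse_download_metadata(example)
print(meta.commit_hash, meta.etag, meta.downloaded_at.isoformat())
```

Under this reading, the second line of `pytorch_model.bin.metadata` is the same value as the `oid sha256` in the Git LFS pointer for `pytorch_model.bin` further down in this commit.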
README.md ADDED
@@ -0,0 +1,77 @@
+ ---
+ license: apache-2.0
+ tags:
+ - audio
+ - speech
+ - audio-to-audio
+ - speech-language-models
+ datasets:
+ - amphion/Emilia-Dataset
+ - facebook/multilingual_librispeech
+ - CSTR-Edinburgh/vctk
+ - google/fleurs
+ - mozilla-foundation/common_voice_13_0
+ - mythicinfinity/libritts_r
+ ---
+
+ # Model Details
+
+ Distill-NeuCodec is a version of NeuCodec with a compatible, distilled encoder.
+
+ The distilled encoder has 10x fewer parameters and uses ~7.5x fewer MACs (multiply-accumulate operations) at inference time.
+
+ Compared with the original NeuCodec, the distilled model makes the following changes:
+ * Swap the notoriously slow [BigCodec](https://arxiv.org/abs/2409.05377) acoustic encoder for the [SQCodec](https://arxiv.org/abs/2504.04949) acoustic encoder (70M → 36M parameters)
+ * Swap the [w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) semantic encoder for [DistilHuBERT](https://huggingface.co/ntu-spml/distilhubert) (600M → 21M parameters)
+
+ Our work largely builds on [X-Codec2.0](https://huggingface.co/HKUSTAudio/xcodec2) and [SQCodec](https://arxiv.org/abs/2504.04949).
+
+ - **Developed by:** Neuphonic
+ - **Model type:** Neural Audio Codec
+ - **License:** apache-2.0
+ - **Repository:** https://github.com/neuphonic/neucodec
+ - **Paper:** [arXiv](https://arxiv.org/abs/2509.09550)
+ - **Pre-encoded Datasets:**
+   - [Emilia-YODAS-EN](https://huggingface.co/datasets/neuphonic/emilia-yodas-english-neucodec)
+   - *More coming soon!*
+
+
+ ## Get Started
+
+ Use the code below to get started with the model.
+
+ To install from PyPI in a dedicated environment, using Python 3.10 or above:
+
+ ```bash
+ conda create -n neucodec python=3.10
+ conda activate neucodec
+ pip install neucodec
+ ```
+ Then, to use it in Python:
+
+ ```python
+ import librosa
+ import torch
+ import torchaudio
+ from torchaudio import transforms as T
+ from neucodec import DistillNeuCodec
+
+ model = DistillNeuCodec.from_pretrained("neuphonic/distill-neucodec")
+ model.eval().cuda()
+
+ y, sr = torchaudio.load(librosa.ex("libri1"))  # (channels, T)
+ if sr != 16_000:
+     y = T.Resample(sr, 16_000)(y)  # the encoder takes 16 kHz input
+ y = y[None, ...]  # add the batch dimension in every case: (B, 1, T_16)
+
+ with torch.no_grad():
+     fsq_codes = model.encode_code(y)
+     # fsq_codes = model.encode_code(librosa.ex("libri1"))  # or directly pass your filepath!
+     print(f"Codes shape: {fsq_codes.shape}")
+     recon = model.decode_code(fsq_codes).cpu()  # (B, 1, T_24)
+
+ torchaudio.save("reconstructed.wav", recon[0, :, :], 24_000)
+ ```
+
+ ## Training Details
+
+ The model was trained on the same data as the full model, with an additional distillation loss (MSE between the distilled and original encoder outputs).
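The README describes the distillation term only as an MSE between the distilled and original encoder outputs. A minimal PyTorch sketch of such a term follows. The function name `distillation_loss`, the tensor shapes, and the choice to detach the teacher output are assumptions made for illustration; the README does not state how the term is weighted or combined with the other training losses.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_out: torch.Tensor, teacher_out: torch.Tensor) -> torch.Tensor:
    """MSE between student (distilled) and teacher (original) encoder outputs.

    The teacher output is detached so gradients reach only the student.
    """
    return F.mse_loss(student_out, teacher_out.detach())


# Toy stand-ins for encoder outputs; (batch, frames, dim) is an assumed shape.
torch.manual_seed(0)
teacher_out = torch.randn(2, 50, 8)
student_out = (teacher_out + 0.1 * torch.randn(2, 50, 8)).requires_grad_()

loss = distillation_loss(student_out, teacher_out)
loss.backward()  # populates student_out.grad; the teacher receives no gradient
print(f"distillation loss: {loss.item():.4f}")
```

In a real training loop the two tensors would come from the distilled and original encoders run on the same audio batch, and this term would be added to the codec's existing objectives.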
meta.yaml ADDED
@@ -0,0 +1,2 @@
+ author: neuphonic
+ license: apache-2.0
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:adace21f679b30f071c02e0cb3502d965ab08b50be936a5e81944674a5ae101e
+ size 1025488162
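The three lines above are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size of the real `pytorch_model.bin`. As a rough sketch, a downloaded copy could be checked against the pointer as follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this example, and the local file path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest the pointer records."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in chunks so a ~1 GB checkpoint is never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the hunk above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:adace21f679b30f071c02e0cb3502d965ab08b50be936a5e81944674a5ae101e
size 1025488162
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With the checkpoint saved locally, `matches_pointer("pytorch_model.bin", pointer)` would report whether the download is intact.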