notmax123 committed on
Commit 183c69d · verified · 1 parent: 0a8aff9

Improve model card: scope, links, correct filename model.safetensors

Files changed (1)
  1. README.md +49 -14
README.md CHANGED
@@ -1,31 +1,66 @@
  ---
  license: mit
+ language:
+ - multilingual
  tags:
  - text-to-speech
  - speech-synthesis
  - autoencoder
  - audio
+ - onnx
+ pipeline_tag: text-to-speech
  ---
 
- # BlueCodec — Speech Autoencoder
+ # BlueCodec — speech autoencoder (codec only)
 
- A neural speech autoencoder that compresses 44.1 kHz audio into a compact continuous latent representation, used as the first stage of the Light-BlueTTS text-to-speech system.
+ This repository publishes **only the neural audio codec** used by **[BlueTTS](https://github.com/maxmelichov/BlueTTS)**: a 44.1 kHz speech **autoencoder** that maps waveforms to a low-rate continuous latent sequence and back. It is **not** a full TTS model (no text encoder, duration model, or flow stack).
 
- The encoder turns raw audio into a 24-dim latent sequence at ~86 Hz. Downstream TTS modules (flow-matching, duration prediction) operate entirely in this latent space, making synthesis fast and lightweight. The decoder reconstructs full-quality waveforms from those latents at inference time.
+ | If you need… | Use |
+ |--------------|-----|
+ | **End-to-end ONNX TTS** | [`notmax123/blue-onnx`](https://huggingface.co/notmax123/blue-onnx) + [BlueTTS](https://github.com/maxmelichov/BlueTTS) |
+ | **Full PyTorch stack + stats** (training / voice export) | [`notmax123/blue`](https://huggingface.co/notmax123/blue) — includes `blue_codec.safetensors` alongside TTL/DP weights |
+ | **Training the codec from scratch** | [maxmelichov/blue-codec](https://github.com/maxmelichov/blue-codec) (standalone repo & [training doc](https://github.com/maxmelichov/blue-codec/blob/main/docs/training.md)) |
 
- ---
+ **Project home:** [https://github.com/maxmelichov/BlueTTS](https://github.com/maxmelichov/BlueTTS) · **Live demo:** [Hugging Face Space — notmax123/Blue](https://huggingface.co/spaces/notmax123/Blue)
 
- ## Architecture
+ ## What it does
 
- | Component | Details |
- |---|---|
- | Input | 1253-channel spectrogram (1025 log-linear + 228 log-mel, FFT 2048, hop 512) |
- | Encoder (~25.6M) | Conv1d stem (1253→512) + 10 ConvNeXt blocks + proj (512→24) |
- | Decoder (~25.3M) | CausalConv1d stem (24→512) + 10 causal dilated ConvNeXt blocks + VocoderHead |
- | Latent | 24-dim @ ~86 Hz |
+ - **Encoder:** waveform → spectrogram features → **24-dimensional latents** at **~86 Hz** (compact trajectory for downstream TTS).
+ - **Decoder:** latents → high-quality **44.1 kHz** audio (causal stack + vocoder head).
 
- ---
+ Downstream BlueTTS modules (flow matching, duration, text-to-latent) run in this latent space; keeping synthesis lightweight and fast.
+
+ ## Architecture (summary)
+
+ | Piece | Details |
+ |-------|---------|
+ | **Input** | 1253-channel spectrogram (1025 log-linear + 228 log-mel; FFT 2048, hop 512) |
+ | **Encoder** (~25.6M params) | Conv1d stem (1253→512) + 10 ConvNeXt blocks + projection (512→24) |
+ | **Decoder** (~25.3M params) | CausalConv1d stem (24→512) + 10 causal dilated ConvNeXt blocks + vocoder head |
+ | **Latent** | 24-D @ ~86 Hz |
+
+ ## Checkpoint in this repo
+
+ | File | Role |
+ |------|------|
+ | **`model.safetensors`** | Encoder + decoder weights (Safetensors). State dict keys are typically prefixed with `encoder.*` and `decoder.*`. |
+
+ *(An older naming convention in some local scripts is `ae_latest.safetensors`; the file served from **this** Hub repo is **`model.safetensors`**.)*
+
+ ## Download
+
+ ```bash
+ hf download notmax123/blue-codec --repo-type model --local-dir ./blue_codec_only
+ ```
+
+ Equivalent:
+
+ ```bash
+ huggingface-cli download notmax123/blue-codec --repo-type model --local-dir ./blue_codec_only
+ ```
+
+ Repo id is **case-sensitive**: `notmax123/blue-codec`.
 
- ## Checkpoint
+ ## License
 
- `ae_latest.safetensors` — encoder + decoder weights (~204 MB). Keys are prefixed with `encoder.*` and `decoder.*`.
+ MIT — align usage with [BlueTTS](https://github.com/maxmelichov/BlueTTS) and the [blue-codec](https://github.com/maxmelichov/blue-codec) repository for any training or redistribution terms that apply to your use case.
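The shape figures in the card's architecture table are mutually consistent, and the arithmetic is quick to check. The sketch below is illustrative only and rests on two assumptions not stated in the card: the linear part of the input is a one-sided STFT (so FFT 2048 gives 2048/2 + 1 = 1025 bins), and the encoder emits one latent frame per 512-sample hop, which is how the ~86 Hz rate appears to arise. The size estimate further assumes float32 weights. The constant and function names are hypothetical, and nothing here reads the checkpoint.

```python
# Illustrative arithmetic only: constants are copied from the model card,
# while the one-sided-STFT, one-frame-per-hop, and float32 readings are assumptions.
SAMPLE_RATE_HZ = 44_100
N_FFT = 2048
HOP_LENGTH = 512
N_MELS = 228
LATENT_DIM = 24
ENCODER_PARAMS = 25.6e6  # "~25.6M" (rounded in the card)
DECODER_PARAMS = 25.3e6  # "~25.3M" (rounded in the card)


def codec_figures() -> dict:
    linear_bins = N_FFT // 2 + 1                  # one-sided STFT bins (assumed)
    input_channels = linear_bins + N_MELS         # log-linear + log-mel channels
    frame_rate_hz = SAMPLE_RATE_HZ / HOP_LENGTH   # one latent frame per hop (assumed)
    # How many raw audio samples each latent value stands in for.
    samples_per_latent_value = SAMPLE_RATE_HZ / (LATENT_DIM * frame_rate_hz)
    # Checkpoint size if every parameter is stored as float32 (4 bytes, assumed).
    fp32_size_mb = (ENCODER_PARAMS + DECODER_PARAMS) * 4 / 1e6
    return {
        "linear_bins": linear_bins,
        "input_channels": input_channels,
        "frame_rate_hz": frame_rate_hz,
        "samples_per_latent_value": samples_per_latent_value,
        "fp32_size_mb": fp32_size_mb,
    }


if __name__ == "__main__":
    for name, value in codec_figures().items():
        print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumptions the input works out to 1253 channels, the latent rate to about 86.13 frames per second, and the latent stream to about 2,067 values per second (24 × 86.13), roughly 21× fewer than the 44,100 raw samples. The float32 estimate of about 204 MB also lines up with the size the previous card gave for `ae_latest.safetensors`.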
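The checkpoint row says state dict keys are *typically* prefixed with `encoder.*` and `decoder.*`. Below is a minimal sketch for inspecting a downloaded `model.safetensors` under that assumption. It loads the file with `safetensors.torch.load_file`, splits the flat dict by prefix, and lists any keys that match neither prefix. The helper names and the default path (taken from `--local-dir` in the download command) are hypothetical. The encoder and decoder module classes live in the BlueTTS / blue-codec code rather than in this repo, so loading the sub-dicts into modules is left out.

```python
from typing import Dict, List, Mapping, Tuple, TypeVar

V = TypeVar("V")


def split_by_prefix(state: Mapping[str, V], prefix: str) -> Dict[str, V]:
    """Keep entries whose key starts with '<prefix>.' and strip that prefix."""
    head = prefix + "."
    return {key[len(head):]: value for key, value in state.items() if key.startswith(head)}


def load_codec_parts(
    path: str = "./blue_codec_only/model.safetensors",
) -> Tuple[dict, dict, List[str]]:
    """Hypothetical helper: load the checkpoint and split it into encoder/decoder dicts."""
    # Imported lazily so split_by_prefix works without torch/safetensors installed.
    from safetensors.torch import load_file

    state = load_file(path)  # flat dict: key -> torch.Tensor
    encoder_sd = split_by_prefix(state, "encoder")
    decoder_sd = split_by_prefix(state, "decoder")
    # The card says "typically", so surface anything that matches neither prefix.
    other_keys = sorted(k for k in state if not k.startswith(("encoder.", "decoder.")))
    return encoder_sd, decoder_sd, other_keys
```

To fetch the file from Python instead of the CLI, `huggingface_hub.hf_hub_download(repo_id="notmax123/blue-codec", filename="model.safetensors")` returns a local path that can be passed to `load_codec_parts`. A non-empty `other_keys` list is a hint that the key layout differs from the card's description.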