docs: drop size column for cleaner table rendering
README.md
CHANGED
@@ -38,11 +38,11 @@ samson-ailabs/SoviaMate-Codec
 └── wavlm_ecapa.pth # WavLM + ECAPA-TDNN speaker verifier
 ```

-| Asset | Purpose |
-|---|---|
-| `neural_audio_codec/audio_codec_base.ckpt` | **Reconstruction codec.** Encoder + quantizer + decoder, trained as a standard compress / reconstruct codec without the speaker-adaptation objective. Use for low-bitrate speech coding and feature extraction. (No ASR head.) |
-| `neural_audio_codec/audio_codec_spk.ckpt` | **Voice-conversion codec.** Adds the integrated ASR head and the post-quantization speaker adapter trained for zero-shot voice swapping from a 3–5 s reference. Always pass a speaker prompt; running it without one under-conditions the decoder and degrades quality. Use `base` for plain reconstruction. |
-| `speaker_verification/*` | Pretrained speaker-embedding extractors. `campplus.bin` and `eres2netv2.ckpt` are interchangeable backbones for the speaker adapter; whichever was used at training is also required at inference time for that `spk` checkpoint (this release uses `campplus.bin`). `wavlm_ecapa.pth` is for evaluation only (e.g., SECS-style speaker-similarity scoring). |
+| Asset | Purpose |
+|---|---|
+| `neural_audio_codec/audio_codec_base.ckpt` | **Reconstruction codec.** Encoder + quantizer + decoder, trained as a standard compress / reconstruct codec without the speaker-adaptation objective. Use for low-bitrate speech coding and feature extraction. (No ASR head.) |
+| `neural_audio_codec/audio_codec_spk.ckpt` | **Voice-conversion codec.** Adds the integrated ASR head and the post-quantization speaker adapter trained for zero-shot voice swapping from a 3–5 s reference. Always pass a speaker prompt; running it without one under-conditions the decoder and degrades quality. Use `base` for plain reconstruction. |
+| `speaker_verification/*` | Pretrained speaker-embedding extractors. `campplus.bin` and `eres2netv2.ckpt` are interchangeable backbones for the speaker adapter; whichever was used at training is also required at inference time for that `spk` checkpoint (this release uses `campplus.bin`). `wavlm_ecapa.pth` is for evaluation only (e.g., SECS-style speaker-similarity scoring). |

 Each codec checkpoint is a portable export containing `model_weights` (per-module `state_dict`) and `hyper_parameters` (architecture config), produced by `AudioCodecTask.export_model()`. Optimizer state, discriminators, and other training-only components are excluded.

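Note on the export layout described in the README text above: a minimal sketch of how a consumer might sanity-check an exported checkpoint, assuming it deserializes to a plain dict with the two documented top-level keys (`model_weights`, `hyper_parameters`). The helper name `summarize_export`, the parameter names inside each module, and the `sample_rate` field are hypothetical and used only for illustration; with a real file the dict would presumably come from something like `torch.load(path, map_location="cpu")`, which is not exercised here.

```python
from typing import Any, Dict, Mapping

# Top-level keys named in the README; everything below them is assumed.
REQUIRED_KEYS = ("model_weights", "hyper_parameters")


def summarize_export(ckpt: Mapping[str, Any]) -> Dict[str, int]:
    """Return {module_name: number_of_state_dict_entries} for an export dict.

    Raises KeyError if either documented top-level key is absent.
    """
    missing = [key for key in REQUIRED_KEYS if key not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    # model_weights is described as a per-module state_dict mapping.
    return {name: len(state) for name, state in ckpt["model_weights"].items()}


# Synthetic stand-in for an exported checkpoint (hypothetical contents).
example = {
    "model_weights": {
        "encoder": {"conv.weight": [0.0], "conv.bias": [0.0]},
        "quantizer": {"codebook": [0.0]},
        "decoder": {"conv.weight": [0.0]},
    },
    "hyper_parameters": {"sample_rate": 16000},
}

print(summarize_export(example))
```

A check like this catches a full training checkpoint (with optimizer state but no `model_weights` key) being passed where a portable export is expected.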