samson-ailabs committed on
Commit
8d8f3c3
·
verified ·
1 Parent(s): 574cd84

docs: drop size column for cleaner table rendering

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -38,11 +38,11 @@ samson-ailabs/SoviaMate-Codec
38
  └── wavlm_ecapa.pth # WavLM + ECAPA-TDNN speaker verifier
39
  ```
40
 
41
- | Asset | Purpose | Size |
42
- |---|---|---|
43
- | `neural_audio_codec/audio_codec_base.ckpt` | **Reconstruction codec.** Encoder + quantizer + decoder, trained as a standard compress / reconstruct codec without the speaker-adaptation objective. Use for low-bitrate speech coding and feature extraction. (No ASR head.) | ~753 MB |
44
- | `neural_audio_codec/audio_codec_spk.ckpt` | **Voice-conversion codec.** Adds the integrated ASR head and the post-quantization speaker adapter trained for zero-shot voice swapping from a 3–5 s reference. Always pass a speaker prompt; running it without one under-conditions the decoder and degrades quality. Use `base` for plain reconstruction. | ~939 MB |
45
- | `speaker_verification/*` | Pretrained speaker-embedding extractors. `campplus.bin` and `eres2netv2.ckpt` are interchangeable backbones for the speaker adapter; whichever was used at training is also required at inference time for that `spk` checkpoint (this release uses `campplus.bin`). `wavlm_ecapa.pth` is for evaluation only (e.g., SECS-style speaker-similarity scoring). | ~1.3 GB total |
46
 
47
  Each codec checkpoint is a portable export containing `model_weights` (per-module `state_dict`) and `hyper_parameters` (architecture config), produced by `AudioCodecTask.export_model()`. Optimizer state, discriminators, and other training-only components are excluded.
48
 
 
38
  └── wavlm_ecapa.pth # WavLM + ECAPA-TDNN speaker verifier
39
  ```
40
 
41
+ | Asset | Purpose |
42
+ |---|---|
43
+ | `neural_audio_codec/audio_codec_base.ckpt` | **Reconstruction codec.** Encoder + quantizer + decoder, trained as a standard compress / reconstruct codec without the speaker-adaptation objective. Use for low-bitrate speech coding and feature extraction. (No ASR head.) |
44
+ | `neural_audio_codec/audio_codec_spk.ckpt` | **Voice-conversion codec.** Adds the integrated ASR head and the post-quantization speaker adapter trained for zero-shot voice swapping from a 3–5 s reference. Always pass a speaker prompt; running it without one under-conditions the decoder and degrades quality. Use `base` for plain reconstruction. |
45
+ | `speaker_verification/*` | Pretrained speaker-embedding extractors. `campplus.bin` and `eres2netv2.ckpt` are interchangeable backbones for the speaker adapter; whichever was used at training is also required at inference time for that `spk` checkpoint (this release uses `campplus.bin`). `wavlm_ecapa.pth` is for evaluation only (e.g., SECS-style speaker-similarity scoring). |
46
 
47
  Each codec checkpoint is a portable export containing `model_weights` (per-module `state_dict`) and `hyper_parameters` (architecture config), produced by `AudioCodecTask.export_model()`. Optimizer state, discriminators, and other training-only components are excluded.
48
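
The export layout described in the README text (top-level `model_weights` holding one `state_dict` per module, plus `hyper_parameters`) can be checked with a short script. What follows is a minimal sketch, not the project's own loader: the helper names `load_export` and `summarize_export` are hypothetical, and it assumes the `.ckpt` file is readable with `torch.load`, which the README does not state explicitly.

```python
from typing import Any, Dict, Mapping

# Top-level keys named in the README text for a portable export.
EXPECTED_KEYS = ("model_weights", "hyper_parameters")


def load_export(path: str) -> Mapping[str, Any]:
    """Load an exported checkpoint on CPU.

    Assumption: the file was written in a torch.save-compatible format.
    """
    import torch  # imported lazily so summarize_export works without PyTorch

    return torch.load(path, map_location="cpu")


def summarize_export(ckpt: Mapping[str, Any]) -> Dict[str, int]:
    """Check the top-level layout and return {module_name: tensor_count}."""
    missing = [key for key in EXPECTED_KEYS if key not in ckpt]
    if missing:
        raise KeyError(f"not a portable export, missing keys: {missing}")
    # model_weights is described as a per-module state_dict mapping.
    return {name: len(state) for name, state in ckpt["model_weights"].items()}
```

A possible use is `summarize_export(load_export("neural_audio_codec/audio_codec_base.ckpt"))`. On recent PyTorch releases `torch.load` defaults to `weights_only=True`, so loading may fail if `hyper_parameters` contains non-tensor objects that are not allow-listed; whether that applies to these files is not something the README confirms.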