samson-ailabs
/

SoviaMate-Codec

@@ -38,11 +38,11 @@ samson-ailabs/SoviaMate-Codec
     └── wavlm_ecapa.pth         # WavLM + ECAPA-TDNN speaker verifier
 ```
-| Asset | Purpose | Size |
-|---|---|---|
-| `neural_audio_codec/audio_codec_base.ckpt` | **Reconstruction codec.** Encoder + quantizer + decoder, trained as a standard compress / reconstruct codec without the speaker-adaptation objective. Use for low-bitrate speech coding and feature extraction. (No ASR head.) | ~753 MB |
-| `neural_audio_codec/audio_codec_spk.ckpt` | **Voice-conversion codec.** Adds the integrated ASR head and the post-quantization speaker adapter trained for zero-shot voice swapping from a 3–5 s reference. Always pass a speaker prompt — running it without one under-conditions the decoder and degrades quality. Use `base` for plain reconstruction. | ~939 MB |
-| `speaker_verification/*` | Pretrained speaker-embedding extractors. `campplus.bin` and `eres2netv2.ckpt` are interchangeable backbones for the speaker adapter — whichever was used at training is also required at inference time for that `spk` checkpoint (this release uses `campplus.bin`). `wavlm_ecapa.pth` is for evaluation only (e.g., SECS-style speaker-similarity scoring). | ~1.3 GB total |
 Each codec checkpoint is a portable export containing `model_weights` (per-module `state_dict`) and `hyper_parameters` (architecture config), produced by `AudioCodecTask.export_model()`. Optimizer state, discriminators, and other training-only components are excluded.

     └── wavlm_ecapa.pth         # WavLM + ECAPA-TDNN speaker verifier
 ```
+| Asset | Purpose |
+|---|---|
+| `neural_audio_codec/audio_codec_base.ckpt` | **Reconstruction codec.** Encoder + quantizer + decoder, trained as a standard compress / reconstruct codec without the speaker-adaptation objective. Use for low-bitrate speech coding and feature extraction. (No ASR head.) |
+| `neural_audio_codec/audio_codec_spk.ckpt` | **Voice-conversion codec.** Adds the integrated ASR head and the post-quantization speaker adapter trained for zero-shot voice swapping from a 3–5 s reference. Always pass a speaker prompt — running it without one under-conditions the decoder and degrades quality. Use `base` for plain reconstruction. |
+| `speaker_verification/*` | Pretrained speaker-embedding extractors. `campplus.bin` and `eres2netv2.ckpt` are interchangeable backbones for the speaker adapter — whichever was used at training is also required at inference time for that `spk` checkpoint (this release uses `campplus.bin`). `wavlm_ecapa.pth` is for evaluation only (e.g., SECS-style speaker-similarity scoring). |
 Each codec checkpoint is a portable export containing `model_weights` (per-module `state_dict`) and `hyper_parameters` (architecture config), produced by `AudioCodecTask.export_model()`. Optimizer state, discriminators, and other training-only components are excluded.