---
license: bsd-3-clause
library_name: pytorch
tags:
- voice-conversion
- knn-vc
- voice-protection
- maestro
- voiceshield
---

# Voice Shield · kNN-VC HiFi-GAN

Speaker-anonymization checkpoint used by the **Voice Shield** panel in [MAESTRO](https://github.com/AEmotionStudio/MAESTRO)'s AI Workstation (Tools → Voice Shield → Anonymize tab).

## What this is

The **prematched HiFi-GAN generator** from [bshall/knn-vc](https://github.com/bshall/knn-vc), repackaged as a single `.safetensors` file. It was trained to vocode 1024-dim WavLM-Large layer-6 features back to 16 kHz audio. It pairs with `microsoft/wavlm-large`, which runs upstream of this vocoder and is loaded directly from the Hugging Face Hub, so this repo does not mirror it.

- Architecture: HiFi-GAN with a `lin_pre` Linear(1024→512) prefix
- Parameters: ~16.5 M
- Input: WavLM-Large layer-6 hidden states `[B, T, 1024]`
- Output: 16 kHz waveform `[B, samples]` (320× upsample: each 20 ms WavLM frame becomes 320 samples)
- License: BSD-3-Clause (original kNN-VC license preserved)

## What it's used for in MAESTRO

The Voice Shield panel offers **voice-cloning protection** covering three threat models. The Anonymize tab uses this checkpoint to transform a user's voice into a different, synthetic speaker via the kNN-VC pipeline:

1. Extract WavLM-Large layer-6 features from the user's voice.
2. For each frame, find the k=4 nearest matches in a target-speaker feature pool and average them.
3. Vocode the matched features back to audio with this HiFi-GAN.

The output sounds like the target speaker, so any voice-cloning model trained on it learns the target's identity rather than the user's. Because the protection comes from replacing the speaker identity instead of adding an adversarial perturbation, perturbation strippers such as LightShed have nothing to remove. This makes anonymization the **only** voice-protection paradigm currently robust against them.

## Loading

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Download the checkpoint from the Hub (or reuse the locally cached copy).
path = hf_hub_download(
    repo_id="AEmotionStudio/voiceshield-targets",
    filename="hifigan_knnvc.safetensors",
)
state = load_file(path)  # dict[str, torch.Tensor]
# State-dict keys match the bshall prematch_g_02500000.pt schema.
```

MAESTRO loads this checkpoint through the vendored module at `backend/ai/voiceshield/knn_vc_vendor/hifigan.py`.

## Credits

- Original kNN-VC: Matthew Baas, Benjamin van Niekerk, Herman Kamper, ["Voice Conversion With Just Nearest Neighbors"](https://arxiv.org/abs/2305.18975), Interspeech 2023. Code: [bshall/knn-vc](https://github.com/bshall/knn-vc).
- WavLM-Large: Microsoft (`microsoft/wavlm-large` on the Hugging Face Hub).

## Honest framing

Anonymization is a one-way transform: the original speaker identity cannot be recovered from the output. That is the point, and it is the trade-off the user opts into.
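Step 2 of the Anonymize pipeline (replace each frame with the average of its k=4 nearest target-speaker frames) is the heart of kNN-VC. The NumPy sketch below illustrates that step only. It assumes cosine distance as the matching metric, which is our understanding of the upstream bshall/knn-vc matcher, and it averages the raw (unnormalised) pool vectors. `knn_match` is a hypothetical helper written for this card, not MAESTRO's or upstream's API.

```python
import numpy as np


def knn_match(query: np.ndarray, pool: np.ndarray, k: int = 4) -> np.ndarray:
    """Replace each query frame with the mean of its k nearest pool frames.

    query: [T, D] source-speaker features (D = 1024 for WavLM-Large layer 6).
    pool:  [N, D] target-speaker feature pool.
    Returns the [T, D] matched features to hand to the vocoder.
    """
    if not 1 <= k <= pool.shape[0]:
        raise ValueError("k must be between 1 and the number of pool frames")
    # L2-normalise rows so a dot product equals cosine similarity.
    q = query / np.maximum(np.linalg.norm(query, axis=1, keepdims=True), 1e-8)
    p = pool / np.maximum(np.linalg.norm(pool, axis=1, keepdims=True), 1e-8)
    dist = 1.0 - q @ p.T  # [T, N] cosine distances (assumed metric)
    # Column indices of the k smallest distances in each row; their order
    # does not matter because they are averaged.
    idx = np.argpartition(dist, kth=k - 1, axis=1)[:, :k]  # [T, k]
    # Average the raw (unnormalised) pool vectors: [T, k, D] -> [T, D].
    return pool[idx].mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.standard_normal((500, 1024)).astype(np.float32)  # stand-in target pool
    query = rng.standard_normal((10, 1024)).astype(np.float32)  # stand-in user frames
    print(knn_match(query, pool).shape)  # (10, 1024)
```

In a full pipeline, `query` would be the user's layer-6 features and `pool` the target speaker's feature pool; the matched `[T, 1024]` array would then get a batch dimension and be converted to a tensor (for example with `torch.from_numpy(...).unsqueeze(0)`) before vocoding with this checkpoint. Averaging several neighbours rather than taking the single nearest one is what smooths the converted speech.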