PulpCut PocketTTS Voice Embeddings
What this repository is
This repository hosts precomputed voice conditioning embeddings (.safetensors) used by the PocketTTS runtime in the Pulpcut editor.
These files are not full TTS models and are not intended for standalone speech synthesis. They are conditioning assets consumed by the PocketTTS model/runtime.
- Target runtime: PocketTTS browser runtime (JAX/WebGPU/WebGL/WASM)
- Main app usage: Pulpcut editor audio generation flow
- File format: .safetensors
- Current conversion metadata in Pulpcut manifest: conversionVersion = 0.1.0
Why these embeddings were converted and republished
The main reason for conversion is compatibility: many raw voice assets in the Kyutai tts-voices tree are not guaranteed to work directly with the PocketTTS variant used in Pulpcut.
Pulpcut runs a browser pocket-jax runtime and uses Pocket weights from:
ekzhang/jax-js-models (kyutai-pocket-tts_b6369a24-fp16.safetensors)
while source voice assets come from:
kyutai/tts-voices
In practice, this creates variant/runtime compatibility constraints. Pulpcut expects Pocket-compatible voice conditioning tensors (including accepted tensor keys/dtypes/shapes), and incompatible assets are rejected by runtime checks.
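The exact keys, dtypes, and shapes that the runtime accepts are not listed here. As a minimal sketch, the header of a .safetensors file can be inspected without loading any tensor data, because the format starts with an 8-byte little-endian header length followed by a JSON header. The `TensorExpectation` type and the idea of checking tensor rank are assumptions for illustration, not Pulpcut's actual checks.

```typescript
// Sketch only: reads a .safetensors header and compares it with a
// HYPOTHETICAL expectation. The real accepted keys/dtypes/shapes are not
// documented in this README.

interface TensorInfo {
  dtype: string;                  // e.g. "F16", "F32"
  shape: number[];
  data_offsets: [number, number];
}

interface TensorExpectation {
  key: string;      // hypothetical tensor name
  dtypes: string[]; // accepted dtype strings
  rank: number;     // expected number of dimensions
}

// safetensors layout: 8-byte little-endian header length N, then N bytes of JSON.
function readSafetensorsHeader(bytes: Uint8Array): { [key: string]: TensorInfo } {
  if (bytes.byteLength < 8) {
    throw new Error("file too short for a safetensors header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Read the u64 as two u32 halves to avoid BigInt.
  const low = view.getUint32(0, true);
  const high = view.getUint32(4, true);
  const headerLength = high * 4294967296 + low;
  if (8 + headerLength > bytes.byteLength) {
    throw new Error("header length exceeds file size");
  }
  const json = new TextDecoder().decode(bytes.subarray(8, 8 + headerLength));
  const parsed = JSON.parse(json);
  delete parsed["__metadata__"]; // optional free-form metadata, not a tensor
  return parsed;
}

// Returns a list of human-readable problems; an empty list means "looks compatible".
function checkCompatibility(
  header: { [key: string]: TensorInfo },
  expected: TensorExpectation[]
): string[] {
  const problems: string[] = [];
  for (const want of expected) {
    const info = header[want.key];
    if (!info) {
      problems.push(`missing tensor "${want.key}"`);
      continue;
    }
    if (want.dtypes.indexOf(info.dtype) === -1) {
      problems.push(`tensor "${want.key}" has unsupported dtype ${info.dtype}`);
    }
    if (info.shape.length !== want.rank) {
      problems.push(`tensor "${want.key}" has rank ${info.shape.length}, expected ${want.rank}`);
    }
  }
  return problems;
}
```

A caller would fetch the file, pass its bytes to `readSafetensorsHeader`, and reject the asset if `checkCompatibility` returns any problems.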
The upstream Kyutai source also uses filenames such as:
<voice>.wav.<hash>@240.safetensors
For Pulpcut integration we republish a curated subset to a stable URL namespace with deterministic IDs, for example:
embeddings_v2/pocket-vctk-p225-023.safetensors
This conversion + republish step exists to provide:
- A compatibility-validated subset for the Pulpcut Pocket runtime variant.
- Stable, deterministic URLs for runtime allowlist and browser caching.
- Clear per-voice provenance and license attribution in Pulpcut metadata.
- Operational consistency for browser loading and update/version control.
Compatibility differences we account for
- Model/runtime variant: Pulpcut uses browser pocket-jax with the kyutai-pocket-tts_b6369a24-fp16 weights path.
- Voice asset format constraints: Runtime enforces Pocket-compatible conditioning format and can reject assets as incompatible.
- Inventory mismatch: Upstream kyutai/tts-voices includes broader sets/variants; Pulpcut ships only compatibility-validated voices.
- Browser safety/allowlist constraints: Pulpcut enforces strict trusted URL/path rules and excludes unsupported variants (for example, selected _enhanced cases).
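The trusted URL/path rule in the last item can be illustrated with a small check built on the standard `URL` class. This is a sketch under assumptions: the path prefix is taken from the example asset URL in this README, while the function name and the `allowedFiles` parameter are hypothetical and do not describe the actual logic in pocketTtsRuntime.ts.

```typescript
// Sketch only: a strict host + path + filename check for embedding URLs.
// The function name and allowlist shape are hypothetical.

const TRUSTED_HOST = "huggingface.co";
const ALLOWED_PATH_PREFIX =
  "/PulpCut/pocket-tts-voice-embeddings/resolve/main/embeddings_v2/";

function isTrustedEmbeddingUrl(raw: string, allowedFiles: string[]): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not an absolute URL
  }
  // `host` includes a non-default port, so a custom port fails this check too.
  if (url.protocol !== "https:" || url.host !== TRUSTED_HOST) return false;
  // Reject embedded credentials such as https://user:pass@huggingface.co/...
  if (url.username !== "" || url.password !== "") return false;
  // `pathname` is already normalized, so "../" segments cannot escape the prefix.
  if (url.pathname.indexOf(ALLOWED_PATH_PREFIX) !== 0) return false;
  const fileName = url.pathname.slice(ALLOWED_PATH_PREFIX.length);
  // Exact filename match against the explicit allowlist.
  return allowedFiles.indexOf(fileName) !== -1;
}
```

Matching on the parsed host and normalized path, rather than on a string prefix of the raw URL, avoids look-alike hosts such as `huggingface.co.example.com`.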
How conversion/normalization is done (high level)
- Read upstream inventory from kyutai/tts-voices (audio + corresponding embedding files).
- Apply category, license, and compatibility filtering (mandatory allowlist gate).
- Exclude restricted or runtime-incompatible entries.
- Normalize voice IDs and filenames to deterministic Pulpcut naming.
- Publish normalized/converted embedding files under this repository (embeddings_v2/).
- Store per-voice metadata in Pulpcut manifest (source URL, preview URL, license class, attribution requirement, checksum/size metadata).
Notes:
- For some categories, _enhanced variants are intentionally excluded.
- Pulpcut runtime only accepts trusted huggingface.co embedding URLs matching an explicit allowlist.
- Sync fails if the compatibility allowlist is missing/invalid, which prevents unvalidated voices from being shipped.
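The filtering steps and the fail-closed allowlist gate described above could look roughly like the following. All type names, field names, and the policy shape are hypothetical; the real logic lives in scripts/syncPocketVoices.ts and is not reproduced here. Treating an empty allowlist as invalid is also an assumption of this sketch.

```typescript
// Sketch only: license, _enhanced, and compatibility filtering in one pass.
// Every name here is hypothetical.

interface UpstreamVoice {
  path: string;         // upstream path, e.g. "vctk/p225_023.wav"
  licenseClass: string; // e.g. "cc-by-4.0"
}

interface SyncPolicy {
  allowedLicenses: string[];        // license classes permitted to ship
  compatibilityAllowlist: string[]; // upstream paths validated against the runtime
  setsWithoutEnhanced: string[];    // voice sets whose "_enhanced" variants are dropped
}

// The voice set is taken to be the first path segment (an assumption).
function voiceSetOf(path: string): string {
  const slash = path.indexOf("/");
  return slash === -1 ? path : path.slice(0, slash);
}

function selectVoices(inventory: UpstreamVoice[], policy: SyncPolicy): UpstreamVoice[] {
  // Mandatory gate: abort instead of shipping unvalidated voices.
  if (!Array.isArray(policy.compatibilityAllowlist) || policy.compatibilityAllowlist.length === 0) {
    throw new Error("compatibility allowlist is missing or invalid; aborting sync");
  }
  return inventory.filter((voice) => {
    const licensed = policy.allowedLicenses.indexOf(voice.licenseClass) !== -1;
    const isEnhanced = voice.path.indexOf("_enhanced") !== -1;
    const enhancedBlocked =
      isEnhanced && policy.setsWithoutEnhanced.indexOf(voiceSetOf(voice.path)) !== -1;
    const validated = policy.compatibilityAllowlist.indexOf(voice.path) !== -1;
    return licensed && !enhancedBlocked && validated;
  });
}
```

Throwing before any filtering happens means a missing allowlist can never silently degrade into "ship everything".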
Source and attribution
Primary upstream source: kyutai/tts-voices
Per-voice provenance:
- In Pulpcut, each generated voice entry stores:
  - sourceUrl (upstream original audio location)
  - previewAudioUrl (upstream preview file)
  - runtimeAssetUrl (this repository asset URL)
  - licenseClass and attributionText
Example mapping (conceptual):
- Original: https://huggingface.co/kyutai/tts-voices/blob/main/vctk/p225_023.wav
- Converted runtime asset: https://huggingface.co/PulpCut/pocket-tts-voice-embeddings/resolve/main/embeddings_v2/pocket-vctk-p225-023.safetensors
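The mapping above suggests a simple deterministic naming rule. The sketch below is inferred from this single example only (lowercase, non-alphanumeric runs replaced with hyphens, a `pocket-` prefix); the actual rule in scripts/pocketVoicesManifest.ts may differ, and the function and type names are hypothetical.

```typescript
// Sketch only: a naming rule INFERRED from one example in this README.
// It is not confirmed to match Pulpcut's real normalization.

const UPSTREAM_BLOB_BASE = "https://huggingface.co/kyutai/tts-voices/blob/main/";
const RUNTIME_ASSET_BASE =
  "https://huggingface.co/PulpCut/pocket-tts-voice-embeddings/resolve/main/embeddings_v2/";

interface VoiceUrls {
  voiceId: string;
  sourceUrl: string;
  runtimeAssetUrl: string;
}

// "vctk/p225_023.wav" becomes "pocket-vctk-p225-023".
function toVoiceId(upstreamPath: string): string {
  const withoutExtension = upstreamPath.replace(/\.wav$/i, "");
  const slug = withoutExtension
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse "/", "_", and other separators
    .replace(/^-+|-+$/g, "");    // trim stray hyphens at the ends
  return "pocket-" + slug;
}

function mapVoice(upstreamPath: string): VoiceUrls {
  const voiceId = toVoiceId(upstreamPath);
  return {
    voiceId,
    sourceUrl: UPSTREAM_BLOB_BASE + upstreamPath,
    runtimeAssetUrl: RUNTIME_ASSET_BASE + voiceId + ".safetensors",
  };
}
```

Because the ID depends only on the upstream path, repeated syncs produce the same URL, which is what makes allowlisting and browser caching stable. A rule like this can map two different paths to the same ID (for example `a_b` and `a-b`), so a real pipeline would also need a collision check.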
Licensing
This repository contains a mixed-license set of embeddings based on upstream voice assets.
- Present license classes in Pulpcut manifest: cc-by-4.0, cc0-1.0
- Attribution is required for cc-by-4.0 voices.
Please follow upstream license terms per voice. Do not assume one global license applies to all files.
What these files are used for in Pulpcut editor
In Pulpcut, these embeddings are used to condition PocketTTS output during in-browser generation:
- User selects PocketTTS voice in the editor.
- Pulpcut loads the corresponding embedding from this repo.
- PocketTTS generates audio locally in-browser.
- Pulpcut stores generation provenance on the clip:
  - model ID
  - voice ID
  - voice license
  - attribution text
  - source URL
  - generation prompt
Pulpcut also surfaces attribution-required notices for CC-BY voices in the generation UI and clip properties.
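As an illustration of how the stored provenance can drive the attribution notice, here is a minimal sketch. The field names mirror the list above but are hypothetical, as is the wording of the notice; only the two license classes and the rule that cc-by-4.0 requires attribution come from this README.

```typescript
// Sketch only: clip provenance and an attribution notice for CC-BY voices.
// Field names and notice wording are hypothetical.

type LicenseClass = "cc-by-4.0" | "cc0-1.0";

interface ClipProvenance {
  modelId: string;
  voiceId: string;
  voiceLicense: LicenseClass;
  attributionText: string;
  sourceUrl: string;
  generationPrompt: string;
}

function requiresAttribution(license: LicenseClass): boolean {
  return license === "cc-by-4.0";
}

// Returns the notice to display, or null when no attribution is required.
function attributionNotice(clip: ClipProvenance): string | null {
  if (!requiresAttribution(clip.voiceLicense)) {
    return null;
  }
  return `Attribution required (${clip.voiceLicense}): ${clip.attributionText} Source: ${clip.sourceUrl}`;
}
```

Keeping the license class on the clip itself, rather than looking it up from the voice list later, means the notice still renders correctly if a voice is removed from a future manifest.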
Safety and intended use
Intended use:
- Voice conditioning for PocketTTS in Pulpcut editor.
Not intended use:
- Voice cloning workflows
- Identity impersonation
- Any use that violates source licenses or applicable laws
Current curated snapshot (from Pulpcut manifest)
At the time of writing, Pulpcut manifest includes:
- Total voices: 144
- By voice set:
  - alba-mackenna: 4
  - cml-tts-fr: 35
  - unmute-prod-website: 4
  - vctk: 101
- By license class:
  - cc-by-4.0: 141
  - cc0-1.0: 3
These counts may change as compatibility and policy filters evolve.
Reproducibility references (Pulpcut repo)
- Voice manifest normalization: scripts/pocketVoicesManifest.ts
- Sync/check pipeline: scripts/syncPocketVoices.ts
- Runtime allowlist logic: vendors/pocket-tts-vendored/src/pocketTtsRuntime.ts
- Generated manifest consumed by app: src/modules/editor/services/pocketVoices.generated.json
Contact
For issues specific to this republished embedding set, open an issue in the Pulpcut project.