PulpCut PocketTTS Voice Embeddings

What this repository is

This repository hosts precomputed voice conditioning embeddings (.safetensors) used by the PocketTTS runtime in the Pulpcut editor.

These files are not full TTS models and are not intended for standalone speech synthesis. They are conditioning assets consumed by the PocketTTS model/runtime.

  • Target runtime: PocketTTS browser runtime (JAX/WebGPU/WebGL/WASM)
  • Main app usage: Pulpcut editor audio generation flow
  • File format: .safetensors
  • Current conversion metadata in Pulpcut manifest: conversionVersion = 0.1.0

Why these embeddings were converted and republished

The main reason for conversion is compatibility: many raw voice assets in the Kyutai tts-voices tree are not guaranteed to work as-is with the PocketTTS variant used in Pulpcut.

Pulpcut runs a browser pocket-jax runtime and uses Pocket weights from:

  • ekzhang/jax-js-models (kyutai-pocket-tts_b6369a24-fp16.safetensors)

while source voice assets come from:

  • kyutai/tts-voices

Because the model weights and the voice assets come from different repositories, a given voice asset is not guaranteed to match the runtime variant. Pulpcut expects Pocket-compatible voice conditioning tensors (with accepted tensor keys, dtypes, and shapes), and its runtime checks reject incompatible assets.
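To make the "tensor keys/dtypes/shapes" check concrete, here is a minimal sketch that reads only the header of a .safetensors buffer (an 8-byte little-endian length followed by a JSON header) and lists what each tensor declares. The function names and the dtype acceptance rule are assumptions for illustration; the actual checks live in the Pulpcut runtime and may differ.

```typescript
// What one tensor declares in a safetensors header.
interface TensorInfo {
  name: string;
  dtype: string;
  shape: number[];
}

// Read the safetensors header: bytes 0..7 hold the header length N as a
// little-endian unsigned 64-bit integer, bytes 8..8+N hold UTF-8 JSON.
function readSafetensorsHeader(buffer: ArrayBuffer): TensorInfo[] {
  if (buffer.byteLength < 8) {
    throw new Error("buffer too small for a safetensors header");
  }
  const view = new DataView(buffer);
  const low = view.getUint32(0, true);
  const high = view.getUint32(4, true);
  // A header of 4 GiB or more is not plausible for a voice embedding.
  if (high !== 0 || 8 + low > buffer.byteLength) {
    throw new Error("invalid safetensors header length");
  }
  const json = new TextDecoder().decode(new Uint8Array(buffer, 8, low));
  const header = JSON.parse(json) as Record<string, { dtype: string; shape: number[] }>;

  const tensors: TensorInfo[] = [];
  for (const name of Object.keys(header)) {
    if (name === "__metadata__") continue; // optional free-form metadata, not a tensor
    tensors.push({ name, dtype: header[name].dtype, shape: header[name].shape });
  }
  return tensors;
}

// Hypothetical acceptance rule: at least one tensor, and every tensor uses
// a dtype from the caller-supplied list (for example "F16" or "F32").
function hasOnlyAllowedDtypes(tensors: TensorInfo[], allowed: string[]): boolean {
  return tensors.length > 0 && tensors.every((t) => allowed.includes(t.dtype));
}
```

Inspecting the header first means an incompatible file can be rejected without decoding any tensor data.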

The upstream Kyutai source also uses filenames such as:

  • <voice>.wav.<hash>@240.safetensors

For Pulpcut integration we republish a curated subset to a stable URL namespace with deterministic IDs, for example:

  • embeddings_v2/pocket-vctk-p225-023.safetensors

This conversion and republish step exists to provide:

  1. A compatibility-validated subset for the Pulpcut Pocket runtime variant.
  2. Stable, deterministic URLs for runtime allowlist and browser caching.
  3. Clear per-voice provenance and license attribution in Pulpcut metadata.
  4. Operational consistency for browser loading and update/version control.

Compatibility differences we account for

  1. Model/runtime variant: Pulpcut uses browser pocket-jax with the kyutai-pocket-tts_b6369a24-fp16 weights path.
  2. Voice asset format constraints: Runtime enforces Pocket-compatible conditioning format and can reject assets as incompatible.
  3. Inventory mismatch: Upstream kyutai/tts-voices includes broader sets/variants; Pulpcut ships only compatibility-validated voices.
  4. Browser safety/allowlist constraints: Pulpcut enforces strict trusted URL/path rules and excludes unsupported variants (for example, selected _enhanced cases).

How conversion/normalization is done (high level)

  1. Read upstream inventory from kyutai/tts-voices (audio + corresponding embedding files).
  2. Apply category, license, and compatibility filtering (mandatory allowlist gate).
  3. Exclude restricted or runtime-incompatible entries.
  4. Normalize voice IDs and filenames to deterministic Pulpcut naming.
  5. Publish normalized/converted embedding files under this repository (embeddings_v2/).
  6. Store per-voice metadata in Pulpcut manifest (source URL, preview URL, license class, attribution requirement, checksum/size metadata).
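Step 4 can be illustrated with a small sketch. The helper names and the exact slug rules below are assumptions, chosen only so that the upstream path vctk/p225_023.wav maps to the published name pocket-vctk-p225-023.safetensors shown earlier; the real logic in scripts/pocketVoicesManifest.ts may differ.

```typescript
// Hypothetical normalizer: turn an upstream audio path such as
// "vctk/p225_023.wav" into a deterministic voice ID.
function toPocketVoiceId(upstreamPath: string): string {
  const slug = upstreamPath
    .toLowerCase()
    .replace(/\.(wav|mp3|flac)$/, "") // drop the audio extension
    .replace(/[^a-z0-9]+/g, "-") // collapse every separator run into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  return `pocket-${slug}`;
}

// Repository-relative path of the republished embedding for that voice.
function toEmbeddingPath(upstreamPath: string): string {
  return `embeddings_v2/${toPocketVoiceId(upstreamPath)}.safetensors`;
}

console.log(toEmbeddingPath("vctk/p225_023.wav"));
// prints "embeddings_v2/pocket-vctk-p225-023.safetensors"
```

Because the ID is a pure function of the upstream path, re-running the sync produces the same filenames, which is what keeps the published URLs stable.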

Notes:

  • For some categories, _enhanced variants are intentionally excluded.
  • Pulpcut runtime only accepts trusted huggingface.co embedding URLs matching an explicit allowlist.
  • Sync fails if the compatibility allowlist is missing/invalid, which prevents unvalidated voices from being shipped.
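The URL rule in the notes above could look roughly like the following. This is a sketch, not the code in pocketTtsRuntime.ts: the constant names, the function name, and the choice to allowlist by filename are assumptions, and the path prefix is taken from the example asset URL in this README.

```typescript
// Assumed constants, derived from the example asset URL in this README.
const TRUSTED_HOST = "huggingface.co";
const TRUSTED_PREFIX = "/PulpCut/pocket-tts-voice-embeddings/resolve/main/embeddings_v2/";

// Accept a URL only if it is HTTPS, on the trusted host, under the trusted
// path prefix, and its filename is on the explicit allowlist.
function isAllowedEmbeddingUrl(rawUrl: string, allowedFiles: ReadonlySet<string>): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:" || url.hostname !== TRUSTED_HOST) return false;
  // URL parsing has already resolved "." and ".." segments in pathname.
  if (!url.pathname.startsWith(TRUSTED_PREFIX)) return false;
  const fileName = url.pathname.slice(TRUSTED_PREFIX.length);
  return allowedFiles.has(fileName);
}
```

Comparing the parsed hostname, rather than testing the raw string for a prefix, avoids accepting lookalike hosts such as huggingface.co.example.com.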

Source and attribution

Primary upstream source:

  • kyutai/tts-voices (https://huggingface.co/kyutai/tts-voices)

Per-voice provenance:

  • In Pulpcut, each generated voice entry stores:
    • sourceUrl (upstream original audio location)
    • previewAudioUrl (upstream preview file)
    • runtimeAssetUrl (this repository asset URL)
    • licenseClass and attributionText

Example mapping (conceptual):

  • Original: https://huggingface.co/kyutai/tts-voices/blob/main/vctk/p225_023.wav
  • Converted runtime asset: https://huggingface.co/PulpCut/pocket-tts-voice-embeddings/resolve/main/embeddings_v2/pocket-vctk-p225-023.safetensors

Licensing

This repository contains a mixed-license set of embeddings based on upstream voice assets.

  • License classes present in the Pulpcut manifest: cc-by-4.0, cc0-1.0
  • Attribution is required for cc-by-4.0 voices

Please follow upstream license terms per voice. Do not assume one global license applies to all files.

What these files are used for in Pulpcut editor

In Pulpcut, these embeddings are used to condition PocketTTS output during in-browser generation:

  1. User selects PocketTTS voice in the editor.
  2. Pulpcut loads the corresponding embedding from this repo.
  3. PocketTTS generates audio locally in-browser.
  4. Pulpcut stores generation provenance on the clip:
    • model ID
    • voice ID
    • voice license
    • attribution text
    • source URL
    • generation prompt

Pulpcut also surfaces attribution-required notices for CC-BY voices in the generation UI and clip properties.
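As an illustration of the provenance record and the attribution notice, the sketch below models the six stored fields and derives a notice only for CC-BY voices. The type name, field names, and notice wording are assumptions; Pulpcut's actual clip schema and UI text are not documented here.

```typescript
// Hypothetical shape of the provenance stored on a generated clip.
interface ClipProvenance {
  modelId: string;
  voiceId: string;
  voiceLicense: string; // license class, e.g. "cc-by-4.0" or "cc0-1.0"
  attributionText: string;
  sourceUrl: string;
  generationPrompt: string;
}

// Return the notice to display for a clip, or null when the voice's
// license class does not require attribution.
function attributionNotice(provenance: ClipProvenance): string | null {
  if (provenance.voiceLicense !== "cc-by-4.0") return null;
  return `Attribution required (${provenance.voiceLicense}): ${provenance.attributionText}`;
}
```

Keeping the license class and attribution text on the clip itself means the notice can still be shown after the clip is copied or exported, without looking the voice up in the manifest again.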

Safety and intended use

Intended use:

  • Voice conditioning for PocketTTS in Pulpcut editor.

Not intended for:

  • Voice cloning workflows
  • Identity impersonation
  • Any use that violates source licenses or applicable laws

Current curated snapshot (from Pulpcut manifest)

At the time of writing, the Pulpcut manifest includes:

  • Total voices: 144
  • By voice set:
    • alba-mackenna: 4
    • cml-tts-fr: 35
    • unmute-prod-website: 4
    • vctk: 101
  • By license class:
    • cc-by-4.0: 141
    • cc0-1.0: 3

These counts may change as compatibility and policy filters evolve.
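A snapshot like the one above can be recomputed from the manifest with a simple tally. The sketch below assumes each manifest entry exposes a voice set and a license class under the hypothetical field names voiceSet and licenseClass; the generated JSON may use a different structure.

```typescript
// Hypothetical minimal view of one manifest entry.
interface VoiceSummary {
  voiceSet: string;
  licenseClass: string;
}

// Count items per group, where the group is chosen by the key function.
function countBy<T>(items: T[], key: (item: T) => string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    const k = key(item);
    counts[k] = (counts[k] ?? 0) + 1;
  }
  return counts;
}

// Produce the total plus the per-set and per-license breakdowns.
function summarize(voices: VoiceSummary[]) {
  return {
    total: voices.length,
    bySet: countBy(voices, (v) => v.voiceSet),
    byLicense: countBy(voices, (v) => v.licenseClass),
  };
}
```

A quick consistency check is that each breakdown sums to the total, as the figures above do: 4 + 35 + 4 + 101 = 144 and 141 + 3 = 144.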

Reproducibility references (Pulpcut repo)

  • Voice manifest normalization: scripts/pocketVoicesManifest.ts
  • Sync/check pipeline: scripts/syncPocketVoices.ts
  • Runtime allowlist logic: vendors/pocket-tts-vendored/src/pocketTtsRuntime.ts
  • Generated manifest consumed by app: src/modules/editor/services/pocketVoices.generated.json

Contact

For issues specific to this republished embedding set, open an issue in the Pulpcut project.
