---
license: mit
library_name: pytorch
tags:
- audio-generation
- sound-effects
- voice-conditioned
- text-conditioned
- pytorch
---
# VTS

VTS (Voice To Sound) generates sound effects from:
- a short vocal sketch
- a text prompt

This repository hosts the pretrained checkpoint files for the older `voice_cond` VTS pipeline.
## Files
- `model_voice_1030_24.pth`: main diffusion checkpoint
- `vae_weight.pth`: VAE checkpoint used for decoding
## Download
```bash
pip install -U "huggingface_hub"
hf download Daniel777/VTS model_voice_1030_24.pth vae_weight.pth --local-dir ./checkpoints
```
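The same download can be scripted from Python. The sketch below uses `hf_hub_download` from `huggingface_hub`; the helper name `download_checkpoints` and the default `./checkpoints` directory are illustrative choices, not part of the VTS codebase.

```python
from pathlib import Path

REPO_ID = "Daniel777/VTS"
FILENAMES = ["model_voice_1030_24.pth", "vae_weight.pth"]


def download_checkpoints(local_dir="./checkpoints"):
    """Download both checkpoint files and return their local paths.

    Hypothetical helper for illustration; it only wraps hf_hub_download.
    """
    # Imported lazily so this module loads even before huggingface_hub is installed.
    from huggingface_hub import hf_hub_download

    Path(local_dir).mkdir(parents=True, exist_ok=True)
    return [
        Path(hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir))
        for name in FILENAMES
    ]
```

Calling `download_checkpoints()` should leave both `.pth` files under `./checkpoints`, matching the paths used in the usage command.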
## Usage
Use these checkpoints with the companion `voice_text_sfx` codebase.
```bash
python3 scripts/infer.py \
  --model-ckpt ./checkpoints/model_voice_1030_24.pth \
  --ae-ckpt ./checkpoints/vae_weight.pth \
  --prompt-audio /path/to/prompt.wav \
  --text "glassy swipe with rising pitch" \
  --output /tmp/generated.wav \
  --duration 3.0 \
  --steps 100 \
  --cfg-scale 6.0 \
  --device cuda
```
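To generate several sounds in one go, the command above can be assembled programmatically. This is a minimal sketch that assumes `scripts/infer.py` accepts exactly the flags shown in the command and is run from the root of the `voice_text_sfx` checkout; `build_infer_command` and `run_batch` are hypothetical helpers, not part of that codebase.

```python
import subprocess


def build_infer_command(
    prompt_audio,
    text,
    output,
    *,
    model_ckpt="./checkpoints/model_voice_1030_24.pth",
    ae_ckpt="./checkpoints/vae_weight.pth",
    duration=3.0,
    steps=100,
    cfg_scale=6.0,
    device="cuda",
):
    """Return the argument list for one scripts/infer.py invocation."""
    # Only flags that appear in the usage command are used here.
    return [
        "python3", "scripts/infer.py",
        "--model-ckpt", str(model_ckpt),
        "--ae-ckpt", str(ae_ckpt),
        "--prompt-audio", str(prompt_audio),
        "--text", text,
        "--output", str(output),
        "--duration", str(duration),
        "--steps", str(steps),
        "--cfg-scale", str(cfg_scale),
        "--device", device,
    ]


def run_batch(jobs):
    """Run inference once per (prompt_audio, text, output) tuple."""
    for prompt_audio, text, output in jobs:
        # check=True raises CalledProcessError if infer.py exits with an error.
        subprocess.run(build_infer_command(prompt_audio, text, output), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the text prompt contains spaces or punctuation.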
## Notes
- This checkpoint matches the older `voice_cond` path.
- It is not a drop-in checkpoint for later `script_embed` or `voice_prompt` variants.
- This is a research checkpoint, not a packaged Hugging Face Inference API model.
## SHA256
- `model_voice_1030_24.pth`: `a061bfb5e4fca61d8857c3056245304d0a421b55d4f86deca3b47442b08f5287`
- `vae_weight.pth`: `45e2d5ab17e5bbb22dc533cd70798bb4ed96dbbe3487f6f20f5528fc9915558e`
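
The digests above can be checked after download. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify_checkpoints` and the `./checkpoints` default are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the list above.
EXPECTED_SHA256 = {
    "model_voice_1030_24.pth": "a061bfb5e4fca61d8857c3056245304d0a421b55d4f86deca3b47442b08f5287",
    "vae_weight.pth": "45e2d5ab17e5bbb22dc533cd70798bb4ed96dbbe3487f6f20f5528fc9915558e",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoints(directory="./checkpoints"):
    """Return {filename: bool}; False means the file is missing or its digest differs."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of(path) == expected
    return results
```

A `False` entry in the result usually points to an interrupted or stale download, in which case fetching the file again is the simplest fix.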