Daniel777 committed on
Commit e28e999 · verified · 1 parent: 3a4ad96

Add README.md

Files changed (1)
  1. README.md +61 -3
README.md CHANGED
@@ -1,3 +1,61 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ library_name: pytorch
+ tags:
+ - audio-generation
+ - sound-effects
+ - voice-conditioned
+ - text-conditioned
+ - pytorch
+ ---
+
+ # VTS
+
+ ![VTS overview](./Thumbnail.png)
+
+ VTS (Voice To Sound) generates sound effects from:
+
+ - a short vocal sketch
+ - a text prompt
+
+ This repository hosts the pretrained checkpoint files for the older `voice_cond` VTS pipeline.
+
+ ## Files
+
+ - `model_voice_1030_24.pth`: main diffusion checkpoint
+ - `vae_weight.pth`: VAE checkpoint used for decoding
+
+ ## Download
+
+ ```bash
+ pip install -U "huggingface_hub"
+ hf download Daniel777/VTS model_voice_1030_24.pth vae_weight.pth --local-dir ./checkpoints
+ ```
+
+ ## Usage
+
+ Use these checkpoints with the companion `voice_text_sfx` codebase.
+
+ ```bash
+ python3 scripts/infer.py \
+ --model-ckpt ./checkpoints/model_voice_1030_24.pth \
+ --ae-ckpt ./checkpoints/vae_weight.pth \
+ --prompt-audio /path/to/prompt.wav \
+ --text "glassy swipe with rising pitch" \
+ --output /tmp/generated.wav \
+ --duration 3.0 \
+ --steps 100 \
+ --cfg-scale 6.0 \
+ --device cuda
+ ```
+
+ ## Notes
+
+ - These checkpoints match the older `voice_cond` pipeline.
+ - They are not drop-in replacements for the later `script_embed` or `voice_prompt` variants.
+ - They are research checkpoints, not a model packaged for the Hugging Face Inference API.
+
+ ## SHA256
+
+ - `model_voice_1030_24.pth`: `a061bfb5e4fca61d8857c3056245304d0a421b55d4f86deca3b47442b08f5287`
+ - `vae_weight.pth`: `45e2d5ab17e5bbb22dc533cd70798bb4ed96dbbe3487f6f20f5528fc9915558e`
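
The SHA256 digests listed in the README can be used to check a download before running inference. Below is a minimal sketch, assuming GNU coreutils `sha256sum` is available and the files were saved to `./checkpoints` as in the Download step. `verify_vts_checkpoints` is a hypothetical helper name, not something shipped with this repository or the `voice_text_sfx` codebase.

```bash
# Hypothetical helper (not part of this repo): checks the two VTS checkpoint
# files against the SHA256 digests published in the README.
# Assumes GNU coreutils `sha256sum` is installed.
verify_vts_checkpoints() {
  local dir="${1:-./checkpoints}"
  # Run in a subshell so the caller's working directory is left unchanged.
  (
    cd "$dir" || exit 1
    # `sha256sum -c` reads "<digest>  <filename>" lines; "-" takes them from stdin.
    sha256sum -c - <<'EOF'
a061bfb5e4fca61d8857c3056245304d0a421b55d4f86deca3b47442b08f5287  model_voice_1030_24.pth
45e2d5ab17e5bbb22dc533cd70798bb4ed96dbbe3487f6f20f5528fc9915558e  vae_weight.pth
EOF
  )
}
```

Running `verify_vts_checkpoints ./checkpoints` prints `OK` or `FAILED` for each file. The function returns a non-zero status if either digest differs or a file (or the directory) is missing, so it can gate the inference command with `&&`.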