Fix pipeline_tag to text-to-audio (valid HF tag), keep foley tag in tags list
README.md
CHANGED
@@ -1,6 +1,6 @@
 ---
 library_name: diffusers
-pipeline_tag:
+pipeline_tag: text-to-audio
 language:
 - en
 license: other
@@ -59,14 +59,12 @@ client = OpenAI(
     api_key='your-api-key',
 )
 
-# Generate foley audio from video
-
-
-
-
-
-    voice='foley',
-)
+# Generate foley audio from video description
+response = client.audio.speech.create(
+    model='zen-foley',
+    input='footsteps on gravel with ambient wind',
+    voice='foley',
+)
 response.stream_to_file('foley.wav')
 ```
 
@@ -86,7 +84,11 @@ foley_model = torch.load(
 
 # Load auxiliary models
 vae = torch.load('vae_128d_48k.pth', map_location=device, weights_only=False)
-sync_encoder = torch.load(
+sync_encoder = torch.load(
+    'synchformer_state_dict.pth',
+    map_location=device,
+    weights_only=False,
+)
 ```
 
 See [github.com/zenlm/zen-audio](https://github.com/zenlm/zen-audio) for the full inference pipeline.
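
A minimal sketch for sanity-checking the front-matter change in this commit. It reads the flat `key: value` fields of a README's leading `---` block and reports whether `pipeline_tag` is one of a few known audio tags. The names `read_front_matter`, `check_pipeline_tag`, and `KNOWN_AUDIO_PIPELINE_TAGS` are hypothetical and not part of this repo or of any Hugging Face tooling. The tag set is a partial, illustrative subset; the Hub's own task list is authoritative. The sample text holds only the six front-matter lines visible in the first hunk, plus a closing `---` that is assumed here.

```python
# Hypothetical helper names; nothing here comes from the zen-audio repo.
# Partial, illustrative subset of Hub pipeline tags (assumption: not exhaustive).
KNOWN_AUDIO_PIPELINE_TAGS = {
    "text-to-audio",
    "text-to-speech",
    "audio-to-audio",
    "audio-classification",
    "automatic-speech-recognition",
}


def read_front_matter(readme_text):
    """Return flat 'key: value' pairs from the leading '---' block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front-matter block
        # Skip list items and indented lines; only flat scalars are needed here.
        if line.startswith((" ", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def check_pipeline_tag(readme_text):
    """Return (tag, is_known) for the README's pipeline_tag field."""
    tag = read_front_matter(readme_text).get("pipeline_tag", "")
    return tag, tag in KNOWN_AUDIO_PIPELINE_TAGS


# The six lines from the first hunk; the closing '---' is assumed.
SAMPLE_README = "\n".join([
    "---",
    "library_name: diffusers",
    "pipeline_tag: text-to-audio",
    "language:",
    "- en",
    "license: other",
    "---",
])

if __name__ == "__main__":
    print(check_pipeline_tag(SAMPLE_README))
```

This parser only handles flat scalar fields, which is enough for `pipeline_tag`. A real check would more likely use a YAML parser and the Hub's published task list.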