Instructions for using espnet/DCASE23.AudioCaptioning.PreTrained with libraries, notebooks, and local apps.
- Libraries
- ESPnet
How to use espnet/DCASE23.AudioCaptioning.PreTrained with ESPnet:
```python
import soundfile

from espnet2.bin.asr_inference import Speech2Text

# Load the pretrained audio-captioning model from the Hugging Face Hub
model = Speech2Text.from_pretrained("espnet/DCASE23.AudioCaptioning.PreTrained")

# Read an audio file and generate a caption
speech, rate = soundfile.read("speech.wav")
text, *_ = model(speech)[0]
```

- Notebooks
- Google Colab
- Kaggle
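The inference snippet above reads a `speech.wav` from disk. As a minimal smoke-test sketch, the file can be generated with only the standard library; the 16 kHz sample rate and 16-bit mono format here are assumptions for illustration, not requirements stated by the model card:

```python
import math
import struct
import wave

# Write a 1-second 440 Hz mono sine tone as 16-bit PCM,
# so the captioning snippet has a "speech.wav" to consume.
# NOTE: 16 kHz is an assumed rate; check the model's expected input.
rate = 16000
frames = b"".join(
    struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * 440 * i / rate)))
    for i in range(rate)
)
with wave.open("speech.wav", "wb") as f:
    f.setnchannels(1)   # mono
    f.setsampwidth(2)   # 16-bit samples
    f.setframerate(rate)
    f.writeframes(frames)
```

A real caption will of course only be meaningful on actual audio; this is just enough to exercise the I/O path.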
Update README.md
README.md CHANGED

```diff
@@ -3,8 +3,11 @@ tags:
 - espnet
 - audio
 - automatic-speech-recognition
+- audio_captioning
 language: en
 datasets:
 - clotho_v2
+- slseanwu/clotho-chatgpt-mixup-50K
+- audiocaps
 license: cc-by-4.0
 ---
```
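Reconstructed from the hunk above, the model card's YAML metadata block after this commit presumably reads as follows (the opening `---` delimiter and the position of the `tags:` key before line 3 are assumptions; any front-matter fields outside the hunk are not shown):

```yaml
---
tags:
- espnet
- audio
- automatic-speech-recognition
- audio_captioning
language: en
datasets:
- clotho_v2
- slseanwu/clotho-chatgpt-mixup-50K
- audiocaps
license: cc-by-4.0
---
```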