Automatic Speech Recognition
ESPnet
English
audio
audio_captioning
How to use from the ESPnet library
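import soundfile
from espnet2.bin.asr_inference import Speech2Text

# Download the pretrained model from the Hugging Face Hub and build the inference wrapper.
model = Speech2Text.from_pretrained(
  "espnet/DCASE23.AudioCaptioning.PreTrained"
)

# Read the waveform and its sample rate from a local audio file.
speech, rate = soundfile.read("speech.wav")
# The model returns n-best hypotheses; take the text of the top one.
text, *_ = model(speech)[0]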
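The snippet above passes the array from soundfile.read straight to the model. For multi-channel files soundfile.read returns an array of shape (frames, channels), while the inference wrapper works on a single 1-D waveform, so a small downmix step may be needed first. The following is a minimal sketch of that step; `prepare_waveform` is a hypothetical helper, not part of ESPnet, and averaging the channels is an assumption about how the audio should be reduced to mono.

```python
import numpy as np


def prepare_waveform(speech: np.ndarray) -> np.ndarray:
    """Return a 1-D float32 waveform from soundfile-style audio.

    Hypothetical helper for illustration; not part of ESPnet.
    """
    speech = np.asarray(speech, dtype=np.float32)
    if speech.ndim == 2:
        # soundfile.read gives (frames, channels) for multi-channel files;
        # average across channels to obtain a mono signal (an assumption).
        speech = speech.mean(axis=1)
    elif speech.ndim != 1:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {speech.shape}")
    return speech


# Synthetic stereo audio standing in for a real file: 4 frames, 2 channels.
stereo = np.stack([np.ones(4), np.zeros(4)], axis=1)
mono = prepare_waveform(stereo)
print(mono.shape)  # (4,)
```

With this helper the final call would become `text, *_ = model(prepare_waveform(speech))[0]`. The model card does not state the sample rate the model expects, so it is worth comparing `rate` against the model's training configuration before trusting the output.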

Dataset used to train espnet/DCASE23.AudioCaptioning.PreTrained