Automatic Speech Recognition
ESPnet
multilingual
audio
phone-recognition
grapheme-to-phoneme
phoneme-to-grapheme
How to use espnet/powsm with ESPnet:
import soundfile
from espnet2.bin.asr_inference import Speech2Text

model = Speech2Text.from_pretrained("espnet/powsm")
speech, rate = soundfile.read("speech.wav")
text, *_ = model(speech)[0]
fix example script
--- a/README.md
+++ b/README.md
@@ -54,9 +54,12 @@ s2t = Speech2Text.from_pretrained(
     task_sym=task,  # <pr>, <asr>, <g2p>, <p2g>
 )
 
-speech, rate = sf.read("sample.wav"
+speech, rate = sf.read("sample.wav")
 prompt = "<na>"  # G2P: set to ASR transcript; P2G: set to phone transcription with slashes
 pred = s2t(speech, text_prev=prompt)[0][0]
+
+# post-processing for better format
+pred = pred.split("<notimestamps>")[1].strip()
 if task == '<pr>' or task == '<g2p>:
     pred = pred.replace("/", "")
 print(pred)
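The post-processing step added by this change can be sketched as a small standalone helper, which makes it easy to check without loading the model. This is a minimal sketch, not the README's code: the function name `postprocess` and the sample raw strings are hypothetical, and the exact shape of the decoder output (special tokens followed by `<notimestamps>` and then the text) is assumed from the `split` call in the hunk. Two deliberate deviations: the helper leaves the string unchanged when the marker is absent, where the README's `split(...)[1]` would raise `IndexError`, and it compares against the string `<g2p>` with its closing quote, which the condition as shown in the hunk is missing.

```python
def postprocess(pred: str, task: str) -> str:
    """Clean a raw prediction string (hypothetical helper, not part of ESPnet)."""
    marker = "<notimestamps>"
    # Keep only the text after the marker; assumed to follow the special tokens.
    if marker in pred:
        pred = pred.split(marker, 1)[1]
    pred = pred.strip()
    # Phone outputs (<pr>, <g2p>) are delimited by slashes, which are dropped here.
    if task in ("<pr>", "<g2p>"):
        pred = pred.replace("/", "")
    return pred


# Hypothetical raw outputs, for illustration only.
print(postprocess("<eng><asr><notimestamps> hello world", "<asr>"))
print(postprocess("<eng><pr><notimestamps> /h//ɛ//l//oʊ/", "<pr>"))
```

In the README's script this would replace the three post-processing lines with `pred = postprocess(pred, task)` before the final `print(pred)`.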