neurlang/ipa-whisper-small

Automatic Speech Recognition

neurlang committed on Aug 7, 2025

Commit 415b3ea (verified) · 1 Parent(s): acaff0a

Timestamps info

Files changed (1)

README.md +3 -4

@@ -278,13 +278,12 @@ can be run with batched inference. It can also be extended to predict sequence l
 >>> sample = ds[0]["audio"]
 >>> prediction = pipe(sample.copy(), batch_size=8)["text"]
-"mˈɪstɚ kwˈɪltɚ ˈɪz ðə ˈeɪ pˈɑsəl ˈʌv ðə ˈmɪdəl klˈæsɪz ˈænd wˈɪɹ glæd tˈu ˈælkəm ˈhɪz gˈʌsbəl"
 >>> # we can also return timestamps for the predictions
->>> prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
-Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.
 ```
 Refer to the blog post [ASR Chunking](https://huggingface.co/blog/asr-chunking) for more details on the chunking algorithm.

 >>> sample = ds[0]["audio"]
 >>> prediction = pipe(sample.copy(), batch_size=8)["text"]
+"mˈɪstɚ kwˈɪltɚ ˈɪz ðɪ əpˈɑsəl əv ðə ˈmɪdəl klˈæsɪz ˈænd wˈɪɹ glˈæd tˈɪ wˈɛlkəm ˈhɪz gˈɑspəl"
 >>> # we can also return timestamps for the predictions
+>>> prediction = pipe(sample.copy(), batch_size=8, return_timestamps="word")["chunks"]
+[{'text': 'mˈɪstɚ', 'timestamp': (0.42, 0.78)}, {'text': ' kwˈɪltɚ', 'timestamp': (0.78, 1.2)}, {'text': ' ˈɪz', 'timestamp': (1.2, 1.4)}, {'text': ' ðɪ', 'timestamp': (1.4, 1.52)}, {'text': ' əpˈɑsəl', 'timestamp': (1.52, 2.08)}, {'text': ' əv', 'timestamp': (2.08, 2.26)}, {'text': ' ðə', 'timestamp': (2.26, 2.36)}, {'text': ' ˈmɪdəl', 'timestamp': (2.36, 2.6)}, {'text': ' klˈæsɪz', 'timestamp': (2.6, 3.22)}, {'text': ' ˈænd', 'timestamp': (3.22, 3.42)}, {'text': ' wˈɪɹ', 'timestamp': (3.42, 3.66)}, {'text': ' glˈæd', 'timestamp': (3.66, 4.02)}, {'text': ' tˈɪ', 'timestamp': (4.02, 4.18)}, {'text': ' wˈɛlkəm', 'timestamp': (4.18, 4.58)}, {'text': ' ˈhɪz', 'timestamp': (4.58, 4.82)}, {'text': ' gˈɑspəl', 'timestamp': (4.82, 5.38)}]
 ```
 Refer to the blog post [ASR Chunking](https://huggingface.co/blog/asr-chunking) for more details on the chunking algorithm.
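
The word-level `chunks` list shown in the updated README can be post-processed with plain Python. The following is a minimal sketch, assuming each chunk is a dict with a `text` string and a `timestamp` pair of `(start, end)` seconds, as in the output above. The helper name `word_durations` is hypothetical and not part of the model card or of `transformers`; skipping chunks whose end is `None` is an assumption that covers the case where no ending timestamp is predicted.

```python
from typing import Dict, List, Optional, Tuple

# Assumed chunk shape, mirroring the README output:
# {"text": " kwˈɪltɚ", "timestamp": (0.78, 1.2)}
Chunk = Dict[str, object]


def word_durations(chunks: List[Chunk]) -> List[Tuple[str, float]]:
    """Return (word, duration_in_seconds) pairs for word-level chunks.

    Hypothetical helper: chunks without an end timestamp are skipped.
    """
    rows: List[Tuple[str, float]] = []
    for chunk in chunks:
        start, end = chunk["timestamp"]  # type: ignore[misc]
        if start is None or end is None:
            continue  # no usable span for this word
        word = str(chunk["text"]).strip()  # drop the leading space
        rows.append((word, round(end - start, 2)))
    return rows


# First three chunks copied from the README output above.
chunks: List[Chunk] = [
    {"text": "mˈɪstɚ", "timestamp": (0.42, 0.78)},
    {"text": " kwˈɪltɚ", "timestamp": (0.78, 1.2)},
    {"text": " ˈɪz", "timestamp": (1.2, 1.4)},
]

durations = word_durations(chunks)
for word, seconds in durations:
    print(f"{word}\t{seconds:.2f}s")
```

Durations are rounded to two decimals because the timestamps themselves are reported at that resolution.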