Re-caption sfx samples with updated sound-effect-captioning-whisper
README.md
CHANGED
@@ -275,7 +275,7 @@ _Source dataset: [`mitermix/audioset-with-grounded-captions`](https://huggingfac

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio features a distinct, high-pitched

---

@@ -339,7 +339,7 @@ _Source dataset: [`mitermix/audioset-with-grounded-captions`](https://huggingfac

| # | Label | Confidence |
|---|---|---|
- | 1 | `Speech` | 78.
| 2 | `Inside, small room` | 19.2% |
| 3 | `Squish` | 9.1% |

@@ -347,7 +347,7 @@ _Source dataset: [`mitermix/audioset-with-grounded-captions`](https://huggingfac

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio contains speech. A male voice is speaking, but the

---

@@ -417,7 +417,7 @@ _Source dataset: [`laion/captioned-ai-music-snippets`](https://huggingface.co/da

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > A male voice, perceived as adult, speaks in a clear, measured tone. The speech is articulate and articulate, with a slightly formal timbre. The pace is moderate, and the pitch is in the mid-range. The

---

@@ -487,7 +487,7 @@ _Source dataset: [`TTS-AGI/majestrino-unified-detailed-captions-temporal`](https

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > A female voice

---

@@ -559,7 +559,7 @@ _Source dataset: [`TTS-AGI/majestrino-unified-detailed-captions-temporal`](https

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio contains speech from a female speaker. The speech is clear and understandable, with a moderate pace and a neutral tone. The audio contains speech, as indicated by the hint. The speaker is

---

@@ -629,7 +629,7 @@ _Source dataset: [`TTS-AGI/majestrino-unified-detailed-captions-temporal`](https

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio contains

---

@@ -673,7 +673,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio captures the sounds of a large vehicle, likely a bus or truck, including engine noise, air brakes, and

---

@@ -695,7 +695,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio features a continuous, high-pitched

---

@@ -717,7 +717,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio features a

---

@@ -731,7 +731,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

| # | Label | Confidence |
|---|---|---|
- | 1 | `Door` | 12.
| 2 | `Silence` | 7.2% |
| 3 | `Thunk` | 5.1% |

@@ -739,7 +739,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

**`laion/sound-effect-captioning-whisper` - sound caption:**

- >

---

@@ -761,7 +761,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio

---

@@ -783,7 +783,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio features the distinct sound of a squeaky

---

@@ -805,7 +805,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

**`laion/sound-effect-captioning-whisper` - sound caption:**

- >

---

@@ -827,7 +827,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio features a variety of bird vocalizations, including chirps, calls, and possibly some

---

@@ -849,7 +849,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > The audio features a single, distinct, high-pitched electronic beep. The

---

@@ -871,7 +871,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

**`laion/sound-effect-captioning-whisper` - sound caption:**

- > A

---

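Every removed caption line above stops mid-sentence (for example "The audio contains", "> A", or a bare ">"), while every added caption below ends with a full stop. A simple heuristic that could flag captions like these is sketched here; `is_truncated` is a hypothetical helper, and treating a missing closing punctuation mark as truncation is an assumption, not the check used for this commit.

```python
import re

# Assumption: a complete caption ends with ., ! or ?, optionally followed by
# closing quotes, parentheses or brackets.
_TERMINAL = re.compile(r"[.!?][\"')\]]*$")


def is_truncated(caption: str) -> bool:
    """Return True if a README caption line looks cut off mid-sentence.

    Strips surrounding whitespace and a leading Markdown blockquote marker
    (">"), then treats empty text, or text without closing punctuation,
    as truncated.
    """
    text = caption.strip()
    if text.startswith(">"):
        text = text[1:].strip()
    return not text or _TERMINAL.search(text) is None


removed = ["> The audio features a distinct, high-pitched", ">", "> A"]
added = ["> A whoosh sound followed by a metallic clang."]
print([is_truncated(c) for c in removed])  # [True, True, True]
print([is_truncated(c) for c in added])    # [False]
```

Because the heuristic only inspects the final characters, a caption that ends on an abbreviation or a complete but unpunctuated sentence would be misjudged; it is a cheap first-pass filter rather than a guarantee.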
**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio features a distinct, high-pitched squeaking sound. The squeak is short and sharp, with a slightly metallic quality. The sound is isolated, with no other discernible background noise. The audio is a recording of a squeaky toy being manipulated. The hint confirms the presence of a squeaky toy.

---

| # | Label | Confidence |
|---|---|---|
+ | 1 | `Speech` | 78.1% |
| 2 | `Inside, small room` | 19.2% |
| 3 | `Squish` | 9.1% |

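The top three confidences in the `Speech` table add up to 106.4% (78.1 + 19.2 + 9.1), which suggests each label is scored independently, as in a multi-label tagger with per-label sigmoid outputs, rather than drawn from a single softmax over all labels. A minimal sketch of rendering such scores into the README's table layout follows; `top_k_label_table` and the example score dictionary are hypothetical and are not taken from the captioning model's code.

```python
def top_k_label_table(scores: dict[str, float], k: int = 3) -> str:
    """Render the k highest-scoring labels as a Markdown table.

    `scores` maps label names to probabilities in [0, 1]. Labels are assumed
    to be scored independently, so the values need not sum to 1.
    """
    # Sort by score (highest first) and keep the top k entries.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    lines = ["| # | Label | Confidence |", "|---|---|---|"]
    for rank, (label, score) in enumerate(ranked, start=1):
        # One decimal place, matching entries such as "78.1%".
        lines.append(f"| {rank} | `{label}` | {score * 100:.1f}% |")
    return "\n".join(lines)


# Hypothetical scores chosen to reproduce the table above; "Music" falls
# outside the top three and is dropped.
example_scores = {"Speech": 0.781, "Inside, small room": 0.192, "Squish": 0.091, "Music": 0.02}
print(top_k_label_table(example_scores))
```

Sorting before slicing keeps the table ranked regardless of the order in which the scores arrive.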
**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio contains speech. A male voice is speaking, but the words are unintelligible. The speech is somewhat muffled and difficult to understand. The audio contains speech, as indicated by the hint. The muffled quality suggests the speaker is either far away, speaking through a barrier, or the recording was made at a distance.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > A male voice, perceived as adult, speaks in a clear, measured tone, delivering a narrative or monologue. The speech is articulate and articulate, with a slightly formal timbre. The pace is moderate, and the pitch is in the mid-range. The audio quality is clean, with minimal background noise. This is a recording of a spoken word performance, likely a narration, a monologue, or a formal address. The clear articulation and measured pace suggest a professional or educational context, possibly for an audiobook, documentary, or a documentary.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > A female voice speaks in a frustrated and exasperated tone. The speaker is expressing negative feelings, using a harsh and somewhat sarcastic tone. The audio quality is clear.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio contains speech from a female speaker. The speech is clear and understandable, with a moderate pace and a neutral tone. The audio quality is good, with no noticeable background noise. The audio contains speech, as indicated by the hint. The speaker's voice is clear and understandable.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio contains a male voice speaking. The speech is clear and articulate, with a moderate pace and a neutral tone. The recording quality is good, with minimal background noise. The audio is a recording of a male speaker, likely delivering information or engaging in a conversation. The hint confirms the presence of speech.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio captures the sounds of a large vehicle, likely a bus or truck, including engine noise, air brakes, and the distinct hiss of air brakes. The soundscape suggests an urban or industrial environment, possibly a bus stop or a large commercial vehicle, with the characteristic sounds of its air brakes and the hiss of air brakes.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio features a continuous, high-pitched whirring sound, characteristic of a vacuum cleaner. The sound is consistent and sustained, indicating the operation of a motorized device. There are no other distinct sounds present. This is the sound of a vacuum cleaner in operation. The continuous nature of the sound suggests it is running steadily, likely for cleaning purposes.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio features a high-pitched, sustained electronic tone that gradually fades out. The sound is pure and consistent in its frequency and amplitude, without any discernible modulation or additional elements. This sound is characteristic of a digital alert, a test tone, or a simple electronic signal. It could be used as a simple notification, a system sound, or a component of a larger electronic device.

---

| # | Label | Confidence |
|---|---|---|
+ | 1 | `Door` | 12.7% |
| 2 | `Silence` | 7.2% |
| 3 | `Thunk` | 5.1% |

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > A whoosh sound followed by a metallic clang. This sound suggests a rapid movement of air or an object, immediately followed by a metallic impact, possibly from a projectile hitting metal or a heavy object falling onto a metal surface.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio begins with a distinct mechanical whirring sound, followed by a series of rapid, high-pitched clicks or clacks, and then a final, softer mechanical thud. This sequence repeats multiple times. The sounds suggest the operation of a mechanical device, possibly a printer or a similar office machine, where internal components are moving, engaging, and then settling into place.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio features the distinct sound of a squeaky wheel, accompanied by the rustling of fabric. The squeaky wheel sound is prominent, suggesting movement over a surface. The rustling could be from clothing or paper, and the squeaking might be from a door or a chair.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio captures the distinct sound of a large vehicle, likely a truck, in operation, characterized by its engine noise and the sound of air brakes. The sound suggests the presence of heavy machinery or a large vehicle, possibly in an industrial or transportation context, indicating movement or a busy environment.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio features a variety of bird vocalizations, including chirps, calls, and possibly some squawks. The sounds are varied in pitch and rhythm, suggesting multiple birds or a single bird. This is a recording of birds in their natural environment, likely a garden, park, or forest, where birds are actively communicating. The variety and variety of calls suggest a diverse bird population.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > The audio features a single, distinct, high-pitched electronic beep. The beep is short and sharp, with a clear, electronic timbre. This sound is characteristic of an electronic alert or notification, possibly from a digital device, a timer, or a simple electronic gadget.

---

**`laion/sound-effect-captioning-whisper` - sound caption:**

+ > A vehicle passing by, with engine noise and tire sounds, and a distinct whoosh. The audio captures the sound of a vehicle, likely a car or truck, passing by. The engine noise is prominent, indicating it is moving at a moderate speed. The sound includes the distinct whoosh of air as it passes, and the Doppler effect as it moves past the listener.

---