ChristophSchuhmann committed on
Commit edf9a66 · verified · 1 Parent(s): fe6fbcc

Re-render results block

Files changed (1)
  1. samples/results_block.md +18 -18
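The summary line above reports `+18 -18`: 18 lines removed and 18 added, which matches the 18 hunks below, each replacing exactly one caption or table row. A minimal sketch of computing such a summary from two renders of `results_block.md` follows. It assumes both renders are available as strings, and `diff_stats` is a hypothetical helper. Python's `difflib` can align complex edits differently from git's diff algorithm, but for isolated one-line replacements like these the counts should agree.

```python
import difflib


def diff_stats(old_text: str, new_text: str) -> tuple[int, int]:
    """Return (added, removed) line counts between two renders of a file.

    Hypothetical helper for illustration; not part of any Hugging Face tool.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # autojunk=False: the results file repeats blank and "---" lines often, and
    # the autojunk heuristic would treat such frequent lines as junk once the
    # file reaches 200 lines.
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


old = "| # | Label | Confidence |\n|---|---|---|\n| 1 | `Speech` | 78.0% |\n"
new = "| # | Label | Confidence |\n|---|---|---|\n| 1 | `Speech` | 78.1% |\n"
added, removed = diff_stats(old, new)
print(f"+{added} -{removed}")  # prints "+1 -1"
```

Counting from `get_opcodes()` instead of parsing `unified_diff` output avoids one specific pitfall. A removed markdown rule `---` would appear in the diff as `----`, which is easy to mistake for a file header.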
samples/results_block.md CHANGED
@@ -57,7 +57,7 @@ _Source dataset: [`mitermix/audioset-with-grounded-captions`](https://huggingfac

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio features a distinct, high-pitched, and somewhat distorted sound. It sounds like a squeaky toy or a similar mechanical sound. The sound is short and repetitive. The audio is a recording of a toy, as indicated by the hint. The squeaky sound suggests the toy is being manipulated, possibly with a squeaky toy or a similar device.
+ > The audio features a distinct, high-pitched squeaking sound. The squeak is short and sharp, with a slightly metallic quality. The sound is isolated, with no other discernible background noise. The audio is a recording of a squeaky toy being manipulated. The hint confirms the presence of a squeaky toy.

  ---

@@ -121,7 +121,7 @@ _Source dataset: [`mitermix/audioset-with-grounded-captions`](https://huggingfac

  | # | Label | Confidence |
  |---|---|---|
- | 1 | `Speech` | 78.0% |
+ | 1 | `Speech` | 78.1% |
  | 2 | `Inside, small room` | 19.2% |
  | 3 | `Squish` | 9.1% |

@@ -129,7 +129,7 @@ _Source dataset: [`mitermix/audioset-with-grounded-captions`](https://huggingfac

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio contains speech. A male voice is speaking, but the content is unintelligible. The speech is somewhat muffled and difficult to understand. The audio contains speech, as indicated by the hint. The muffled quality suggests the speaker is either far away, or the recording quality is poor.
+ > The audio contains speech. A male voice is speaking, but the words are unintelligible. The speech is somewhat muffled and difficult to understand. The audio contains speech, as indicated by the hint. The muffled quality suggests the speaker is either far away, speaking through a barrier, or the recording was made at a distance.

  ---

@@ -199,7 +199,7 @@ _Source dataset: [`laion/captioned-ai-music-snippets`](https://huggingface.co/da

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > A male voice, perceived as adult, speaks in a clear, measured tone. The speech is articulate and articulate, with a slightly formal timbre. The pace is moderate, and the pitch is in the mid-range. The recording is clean, with minimal background noise. This is a recording of a male speaker delivering a formal speech or narration, possibly in a professional or educational setting. The clear articulation and measured pace suggest a prepared statement or a prepared statement.
+ > A male voice, perceived as adult, speaks in a clear, measured tone, delivering a narrative or monologue. The speech is articulate and articulate, with a slightly formal timbre. The pace is moderate, and the pitch is in the mid-range. The audio quality is clean, with minimal background noise. This is a recording of a spoken word performance, likely a narration, a monologue, or a formal address. The clear articulation and measured pace suggest a professional or educational context, possibly for an audiobook, documentary, or a documentary.

  ---

@@ -269,7 +269,7 @@ _Source dataset: [`TTS-AGI/majestrino-unified-detailed-captions-temporal`](https

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > A female voice, perceived as adult, is speaking in a casual, conversational tone. The speech is clear and easily understandable, with a moderate pace and a neutral emotional state. The audio contains a snippet of a conversation or monologue. The speaker's neutral tone suggests a non-emotional or informative context.
+ > A female voice speaks in a frustrated and exasperated tone. The speaker is expressing negative feelings, using a harsh and somewhat sarcastic tone. The audio quality is clear.

  ---

@@ -341,7 +341,7 @@ _Source dataset: [`TTS-AGI/majestrino-unified-detailed-captions-temporal`](https

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio contains speech from a female speaker. The speech is clear and understandable, with a moderate pace and a neutral tone. The audio contains speech, as indicated by the hint. The speaker is likely delivering information or engaging in a conversation.
+ > The audio contains speech from a female speaker. The speech is clear and understandable, with a moderate pace and a neutral tone. The audio quality is good, with no noticeable background noise. The audio contains speech, as indicated by the hint. The speaker's voice is clear and understandable.

  ---

@@ -411,7 +411,7 @@ _Source dataset: [`TTS-AGI/majestrino-unified-detailed-captions-temporal`](https

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio contains speech from a male speaker. The speaker is talking at a normal pace and volume. The recording quality is clear, with no noticeable background noise. The audio is a recording of a male speaker, as indicated by the hint. The neutral tone suggests a factual or informative context.
+ > The audio contains a male voice speaking. The speech is clear and articulate, with a moderate pace and a neutral tone. The recording quality is good, with minimal background noise. The audio is a recording of a male speaker, likely delivering information or engaging in a conversation. The hint confirms the presence of speech.

  ---

@@ -455,7 +455,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio captures the sounds of a large vehicle, likely a bus or truck, including engine noise, air brakes, and a distinct whoosh sound, indicating movement. The soundscape suggests a large commercial vehicle, such as a bus or truck, in motion, possibly passing by or idling. The prominent air brake sound is a clear indicator of its operation.
+ > The audio captures the sounds of a large vehicle, likely a bus or truck, including engine noise, air brakes, and the distinct hiss of air brakes. The soundscape suggests an urban or industrial environment, possibly a bus stop or a large commercial vehicle, with the characteristic sounds of its air brakes and the hiss of air brakes.

  ---

@@ -477,7 +477,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio features a continuous, high-pitched, and somewhat distorted electronic sound, resembling a siren or a high-frequency alarm. It has a rhythmic, almost pulsating quality, with a distinct, repetitive pattern. The sound is loud and has a somewhat abrasive texture. This sound is characteristic of an alarm or warning signal, possibly from a fire alarm, a security system, or a warning siren. The repetitive and somewhat distorted nature suggests a warning or alert.
+ > The audio features a continuous, high-pitched whirring sound, characteristic of a vacuum cleaner. The sound is consistent and sustained, indicating the operation of a motorized device. There are no other distinct sounds present. This is the sound of a vacuum cleaner in operation. The continuous nature of the sound suggests it is running steadily, likely for cleaning purposes.

  ---

@@ -499,7 +499,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio features a single, clear, and resonant bell sound. The sound is bright and metallic, with a distinct attack and a long, decaying sustain. This is the sound of a bell being struck, likely a small handbell or a chime. It could be used as a notification sound, a signal, or a simple percussive element in music.
+ > The audio features a high-pitched, sustained electronic tone that gradually fades out. The sound is pure and consistent in its frequency and amplitude, without any discernible modulation or additional elements. This sound is characteristic of a digital alert, a test tone, or a simple electronic signal. It could be used as a simple notification, a system sound, or a component of a larger electronic device.

  ---

@@ -513,7 +513,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  | # | Label | Confidence |
  |---|---|---|
- | 1 | `Door` | 12.8% |
+ | 1 | `Door` | 12.7% |
  | 2 | `Silence` | 7.2% |
  | 3 | `Thunk` | 5.1% |

@@ -521,7 +521,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > , A sharp, percussive sound, possibly a gunshot or a gunshot, with a distinct metallic resonance.
+ > A whoosh sound followed by a metallic clang. This sound suggests a rapid movement of air or an object, immediately followed by a metallic impact, possibly from a projectile hitting metal or a heavy object falling onto a metal surface.

  ---

@@ -543,7 +543,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio features a distinct, high-pitched squeaking sound, characteristic of a squeaky wheel, followed by a series of softer, repetitive squeaks. These sounds are consistent and rhythmic, suggesting a mechanical source. The sounds are clear and appear to be close to the recording device. The sounds strongly suggest the operation of a mechanical device, specifically a squeaky wheel, possibly a cart, a trolley, or a similar piece of machinery. The repetitive nature of the squeaks indicates continuous motion.
+ > The audio begins with a distinct mechanical whirring sound, followed by a series of rapid, high-pitched clicks or clacks, and then a final, softer mechanical thud. This sequence repeats multiple times. The sounds suggest the operation of a mechanical device, possibly a printer or a similar office machine, where internal components are moving, engaging, and then settling into place.

  ---

@@ -565,7 +565,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio features the distinct sound of a squeaky toy, characterized by a high-pitched, repetitive squeak. The squeak is short and sharp, occurring in quick succession. This sound is indicative of a toy being squeezed, likely a rubber or plastic toy, due to the squeaking noise. It suggests a playful or exploratory action.
+ > The audio features the distinct sound of a squeaky wheel, accompanied by the rustling of fabric. The squeaky wheel sound is prominent, suggesting movement over a surface. The rustling could be from clothing or paper, and the squeaking might be from a door or a chair.

  ---

@@ -587,7 +587,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > Ambient soundscape featuring a continuous, high-pitched whirring or buzzing, characteristic of a drone. The sound is consistent and suggests the operation of a drone or a similar flying insect. The continuous whirring indicates the drone is in flight, and the subtle variations in pitch and intensity suggest the insect is moving closer to and further away from the recording device.
+ > The audio captures the distinct sound of a large vehicle, likely a truck, in operation, characterized by its engine noise and the sound of air brakes. The sound suggests the presence of heavy machinery or a large vehicle, possibly in an industrial or transportation context, indicating movement or a busy environment.

  ---

@@ -609,7 +609,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio features a variety of bird vocalizations, including chirps, calls, and possibly some high-pitched calls. The sounds are varied in their timbre and rhythm, suggesting multiple birds or a group of birds. This is a recording of birds in their natural habitat, likely a forest, garden, or park. The variety of calls and chirps suggests a diverse bird population, possibly communicating with each other or calling for each other.
+ > The audio features a variety of bird vocalizations, including chirps, calls, and possibly some squawks. The sounds are varied in pitch and rhythm, suggesting multiple birds or a single bird. This is a recording of birds in their natural environment, likely a garden, park, or forest, where birds are actively communicating. The variety and variety of calls suggest a diverse bird population.

  ---

@@ -631,7 +631,7 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > The audio features a single, distinct, high-pitched electronic beep. The sound is brief and clear, with a consistent tone and no discernible modulation. This is a simple electronic alert or notification sound, commonly used as an indicator or a simple signal.
+ > The audio features a single, distinct, high-pitched electronic beep. The beep is short and sharp, with a clear, electronic timbre. This sound is characteristic of an electronic alert or notification, possibly from a digital device, a timer, or a simple electronic gadget.

  ---

@@ -653,6 +653,6 @@ _Source dataset: [`laion/freesound-commercially-permissive-subset-with-captions`

  **`laion/sound-effect-captioning-whisper` — sound caption:**

- > A low, continuous hum with a distinct, rhythmic pulsing or throbbing sound. The sound suggests the operation of a large, powerful machine or vehicle, possibly a train or heavy industrial equipment, characterized by a rhythmic pulsing and a deep, resonant hum.
+ > A vehicle passing by, with engine noise and tire sounds, and a distinct whoosh. The audio captures the sound of a vehicle, likely a car or truck, passing by. The engine noise is prominent, indicating it is moving at a moderate speed. The sound includes the distinct whoosh of air as it passes, and the Doppler effect as it moves past the listener.

  ---
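The two label tables in this diff use a fixed layout: `| # | Label | Confidence |`, a `|---|---|---|` separator, and percentages with one decimal place. The confidences in each table are not the top three entries of a single softmax distribution. In the `Speech` table, 78.1% + 19.2% + 9.1% = 106.4%, which exceeds 100%. They look instead like independent per-label scores, as a multi-label tagger would produce. The diff does not say which model produced them. The following is a minimal sketch of rendering such a table, assuming the scores arrive as a label-to-score mapping in [0, 1]. `render_label_table` is a hypothetical helper and not the script behind this commit.

```python
def render_label_table(scores: dict[str, float], k: int = 3) -> str:
    """Render the top-k labels as a markdown table in the results_block.md layout.

    `scores` maps a label to a score in [0, 1]. Hypothetical helper; the
    scores are assumed to be independent per-label values, not a softmax.
    """
    # Highest score first; sorted() stays stable with reverse=True, so ties
    # keep their insertion order.
    top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    lines = ["| # | Label | Confidence |", "|---|---|---|"]
    for rank, (label, score) in enumerate(top, start=1):
        # One decimal place, matching values such as 78.1% in the file.
        lines.append(f"| {rank} | `{label}` | {score * 100:.1f}% |")
    return "\n".join(lines)


print(render_label_table({"Squish": 0.091, "Speech": 0.781, "Inside, small room": 0.192}))
```

With these inputs the three rows match the updated `Speech` table in the `-121,7` hunk. Because values are shown to one decimal place, a shift of well under 0.1 percentage points can change the displayed number. For example, 0.78049 renders as 78.0% and 0.78051 renders as 78.1%. Such a shift is one possible explanation for the 78.0% → 78.1% and 12.8% → 12.7% edits, but the commit itself gives no reason for them.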