echo-dual-encoder-v1 / conditioning_samples.json
Uploaded by ChristophSchuhmann with huggingface_hub (commit cc85e82)
{
"step": 42800,
"samples": [
{
"step": 42800,
"mode": "procedural_prefix",
"caption": "A medium-quality recording of a female speaker telling a story with a slightly surprised and amused tone. The speaker sounds conversational and relaxed.",
"prefix": "[fast speech, soft disoriented, hearty keen, negative valence, hesitant, excellent background quality, light giggling, excellent content enjoyment, modest surprised, expressive, submissive, definite humiliated, harsh voice, good overall quality, humorous, decent speech quality]",
"transcript": "my god, someone I know did that. Someone I know, I won't say any names, but went to a like two people I know went to an anime con together, and one of them like literally went home without the other one and was like, Oh, I thought you had a ride.",
"adaln_summary": "(all zeros)",
"adaln_nonzero": 0
},
{
"step": 42800,
"mode": "procedural_prefix",
"caption": "A medium-quality recording of a man speaking with a slightly frustrated and incredulous tone. He is recounting a story about someone staying on the table. The recording quality is decent, and there is no background noise.",
"prefix": "[substantial curious fast speech dominant easy angry excellent overall quality hesitant robust irritable apparent dismayed excellent content enjoyment palpable studious]",
"transcript": "Uh, what are we gonna do? She's she's gonna stay on the table for six hours a day. I mean, literally, I mean, they were freaking out, right? And I'm just, and I'm sitting there going, holy shit, how am I gonna do it? Because it's my job to say no, right? Because we've gotten to this point where we h",
"adaln_summary": "(all zeros)",
"adaln_nonzero": 0
},
{
"step": 42800,
"mode": "procedural_prefix",
"caption": "A medium-quality recording of a male speaker giving a speech or presentation. He sounds somewhat enthusiastic and informative. The audio quality is decent, with no noticeable background noise.",
"prefix": "[low-pitched, humorous, bold attentive, clear-ish engrossed, moderate spirited, distinct appreciative, cold, excellent content enjoyment, measured overjoyed, excellent background quality]",
"transcript": "again, those examples there, uh, a lot of stuff that we've talked about today, even off mic. I know, Mark, you were commenting on someone who wasn't involved in the AMP program or a Red One Realty agent that was wanting to come in. You were gracious enough to uh give them some of your time. But the ",
"adaln_summary": "(all zeros)",
"adaln_nonzero": 0
},
{
"step": 42800,
"mode": "static_caption",
"caption": "A medium-quality recording of a male speaker giving a speech or presentation. He sounds somewhat formal and is discussing a topic with a slightly critical tone. The recording quality is decent, and there is no noticeable background noise.",
"prefix": "[The recording quality is decent, and there is no noticeable background noise. He sounds somewhat formal and is discussing a topic with a slightly critical tone. A medium-quality recording of a male speaker giving a speech or presentation]",
"transcript": "very educated, very rational agents who know how to read up, know how to estimate, who act in some rational way. It turns out, no. It turns out that even on these financial markets, emotions, habits, and certain fears play a very big role. There was this experiment conducted by Dima Repi",
"adaln_summary": "(all zeros)",
"adaln_nonzero": 0
},
{
"step": 42800,
"mode": "static_caption",
"caption": "A medium-quality recording of a male speaker discussing music and how it makes him feel. He sounds thoughtful and slightly enthusiastic. The recording quality is decent, and there is no background noise.",
"prefix": "[He sounds thoughtful and slightly enthusiastic. A medium-quality recording of a male speaker discussing music and how it makes him feel. The recording quality is decent, and there is no background noise]",
"transcript": "Whatever impression it made on me when when we covered Rocket Summer last time hasn't stuck in my mind in a way that really like in I guess endears me to the band enough to think that the album before this the one we covered, this one that we're about to cover, is so good that a band that I'm fairly",
"adaln_summary": "(all zeros)",
"adaln_nonzero": 0
},
{
"step": 42800,
"mode": "static_caption",
"caption": "A high-quality recording of a conversation between two people, a man and a woman. The man sounds enthusiastic and slightly surprised, while the woman sounds neutral. The conversation is about a church and a family.",
"prefix": "[The man sounds enthusiastic and slightly surprised, while the woman sounds neutral. A high-quality recording of a conversation between two people, a man and a woman. The conversation is about a church and a family]",
"transcript": "Our season I want to dive into the lives of many of our congregants, people who've been part of the Bel Air Church family. The church family's been around for nearly 70 years now, but you've called Bel Air Church your your church home for the last two years. Can you believe it's already been two ye",
"adaln_summary": "(all zeros)",
"adaln_nonzero": 0
},
{
"step": 42800,
"mode": "adaln_only",
"caption": "A medium-quality recording of a male speaker discussing his perspective on a garden. He sounds thoughtful and slightly enthusiastic. The recording quality is decent, with no noticeable background noise.",
"prefix": "",
"transcript": "You know, what meanwhile, I I like by the time you have one good looking garden, I've already had two, baby. I've already had two. Third one on the way. You know? Um, and like the colors are just and and the the vibrancy and the just the diversity, and we're all about diversity here. Look at this po",
"adaln_summary": "Amusement=0.62, Elation=0.77, Pleasure_Ecstasy=0.75, Contentment=0.86, Thankfulness_Gratitude=0.04, Infatuation=-0.03, Hope_Enthusiasm_Optimism=1.34, Triumph=0.74, Pride=0.61, Interest=2.54, Concentration=1.30, Contemplation=1.02, Relief=0.04, Longing=0.14, Teasing=0.05",
"adaln_nonzero": 40
},
{
"step": 42800,
"mode": "adaln_only",
"caption": "A high-quality recording of a conversation between two people, a male and a female. The female speaker is introducing a topic, and the male speaker is enthusiastic and interested. The recording quality is very high, and there is no background noise.",
"prefix": "",
"transcript": "everyone, I'm Sarah Davis. I am a GP from Cardiff.",
"adaln_summary": "Amusement=0.15, Elation=1.36, Pleasure_Ecstasy=0.86, Contentment=0.98, Thankfulness_Gratitude=0.83, Affection=0.55, Infatuation=-0.02, Hope_Enthusiasm_Optimism=1.88, Triumph=-0.01, Pride=0.40, Interest=2.46, Concentration=1.43, Contemplation=0.03, Relief=0.05, Sexual_Lust=0.02",
"adaln_nonzero": 38
},
{
"step": 42800,
"mode": "adaln_only",
"caption": "A medium-quality recording of a male speaker discussing business growth and league procedures. The speaker sounds informative and slightly enthusiastic. The recording quality is decent, with no noticeable background noise.",
"prefix": "",
"transcript": "increase from where we ended last year to where we're gonna begin this year is great. And then we've studied other kind of leagues like USL, where we're we're kind of under the covers with them a little bit on their numbers and sharing some things there. And we're right where we should be from a lea",
"adaln_summary": "Amusement=0.08, Elation=1.54, Pleasure_Ecstasy=0.82, Contentment=1.01, Thankfulness_Gratitude=0.79, Hope_Enthusiasm_Optimism=2.23, Triumph=0.42, Pride=0.78, Interest=2.63, Concentration=1.64, Contemplation=0.83, Relief=0.84, Impatience_and_Irritability=0.07, Sexual_Lust=0.02, Fear=-0.01",
"adaln_nonzero": 36
},
{
"step": 42800,
"mode": "baseline",
"caption": "A medium-quality recording of a male speaker talking in Russian. He sounds somewhat analytical and is explaining something. The recording quality is decent, and there is no background noise.",
"prefix": "",
"transcript": "I understand that everyone wants to, yes, I understand that this is really an attempt to grab hold of something. And not only on Schwarzenegger's part. That is, it seems to me that maybe he even had to be talked into this role, actually. And the attempt was more the producers'. Like, we need to somehow, basically, we need to somehow inte",
"adaln_summary": "(all zeros)",
"adaln_nonzero": 0
},
{
"step": 42800,
"mode": "baseline",
"caption": "A medium-quality recording of a female speaker telling a story with a slightly humorous and exasperated tone. The speaker sounds like a young adult. There are no sound effects or music.",
"prefix": "",
"transcript": "my wrist were fused under my skin. And I was like, uh I would talk to my mom and I was like, Mom, what do I do about this? And she was like, Oh, well, my mom's from England. So she's like, Oh, Marco, we'll just take him out like we did in the old days. And I'm like, All right, let's give it a try. A",
"adaln_summary": "(all zeros)",
"adaln_nonzero": 0
},
{
"step": 42800,
"mode": "baseline",
"caption": "A medium-quality recording of a male speaker, who sounds slightly annoyed and frustrated. He is talking about a game and using a slightly sarcastic tone. There is some background noise.",
"prefix": "",
"transcript": "",
"adaln_summary": "(all zeros)",
"adaln_nonzero": 0
},
{
"step": 42800,
"mode": "dual",
"caption": "A high-quality recording of a conversation between two people, likely adults, discussing a hypothetical scenario involving robots. The speakers sound thoughtful and slightly concerned. The audio quality is excellent, with no background noise.",
"prefix": "[low-pitched, easy intent, dominant, soft voice, vivid riveted, soft briny, excellent overall quality, excellent content enjoyment, evident fidgety, warm, moderate pace, clear-ish irate]",
"transcript": "them both, you would pay them both the same, right? No, I do not believe so. And if so that's the danger of these things. If if no. If you treat your neon digital assistant doodle thingy the same as you treat a human, that's just ridiculous. Like I I'm not If robots Robots are created in service of ",
"adaln_summary": "Elation=0.01, Thankfulness_Gratitude=0.06, Interest=2.04, Astonishment_Surprise=0.35, Concentration=1.55, Contemplation=0.91, Relief=0.03, Teasing=0.66, Impatience_and_Irritability=1.51, Sexual_Lust=-0.05, Doubt=0.64, Fear=0.01, Distress=0.15, Confusion=0.03, Embarrassment=0.43",
"adaln_nonzero": 45
},
{
"step": 42800,
"mode": "dual",
"caption": "A medium-quality recording of a female speaker explaining a concept with a slightly frustrated tone. The speaker sounds like a young adult. There is no background noise.",
"prefix": "[low-pitched notable interested fast speech hesitant dominant shadow provocative muted meditative expressive easy intent warm easy second-guessing]",
"transcript": "Exactly. If someone wants money from me, I can make a QR code, because he's with a different bank. I can scan that, transfer it differently. But in principle, the QR code does work so that the transfer information simply gets filled into the fields. But you have to tr",
"adaln_summary": "Amusement=0.31, Elation=0.02, Contentment=0.58, Thankfulness_Gratitude=0.14, Affection=0.07, Infatuation=-0.02, Hope_Enthusiasm_Optimism=0.85, Triumph=-0.01, Interest=2.39, Concentration=1.82, Contemplation=0.85, Relief=0.09, Teasing=0.95, Impatience_and_Irritability=0.71, Sexual_Lust=-0.05",
"adaln_nonzero": 45
},
{
"step": 42800,
"mode": "dual",
"caption": "A medium-quality recording of a male speaker discussing a topic with a slightly conversational tone. The speaker sounds somewhat interested and thoughtful. There are no sound effects or music.",
"prefix": "[decisive riveted, excellent content enjoyment, expressive, present motivated, tinge bummed, modest diligent, negative valence, definite pondering, low-pitched]",
"transcript": "can't it depends on what they are because some of them are so specialized. Some of these like very specialized top level domains, they can go for like a thousand not a thousand. Yeah, some of them can four or five hundred dollars a year. Um and that's usually what's called like the green rush or you",
"adaln_summary": "Elation=0.37, Pleasure_Ecstasy=0.15, Contentment=0.01, Thankfulness_Gratitude=0.05, Infatuation=-0.01, Hope_Enthusiasm_Optimism=1.14, Triumph=-0.02, Interest=2.54, Concentration=1.61, Contemplation=1.05, Relief=0.01, Longing=-0.02, Impatience_and_Irritability=0.47, Doubt=0.41, Shame=0.01",
"adaln_nonzero": 36
}
]
}
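Every record above carries the same seven fields (`step`, `mode`, `caption`, `prefix`, `transcript`, `adaln_summary`, `adaln_nonzero`), and the five modes differ in which conditioning channel is filled: a text prefix, a nonzero AdaLN vector, both, or neither. The following is a minimal Python sketch for sanity-checking a file like this one. The per-mode pattern in `EXPECTED_PATTERN` is inferred only from the 15 samples shown here, and the helper names (`parse_adaln_summary`, `check_sample`, `report`) are hypothetical, not part of any published echo-dual-encoder tooling. The sketch also assumes that `adaln_summary` lists only a subset of the nonzero dimensions, because every nonzero record shows 15 entries while `adaln_nonzero` reports 36 to 45.

```python
from collections import Counter

# Expected (has text prefix, has nonzero AdaLN vector) per mode.
# ASSUMPTION: inferred from the samples in this file, not from model docs.
EXPECTED_PATTERN = {
    "procedural_prefix": (True, False),
    "static_caption": (True, False),
    "adaln_only": (False, True),
    "baseline": (False, False),
    "dual": (True, True),
}


def parse_adaln_summary(summary: str) -> dict[str, float]:
    """Turn 'Name=0.62, Other=-0.03' into a dict; '(all zeros)' or '' -> {}."""
    summary = summary.strip()
    if not summary or summary == "(all zeros)":
        return {}
    result = {}
    for item in summary.split(","):
        # rsplit keeps names such as 'Hope_Enthusiasm_Optimism' intact
        name, value = item.strip().rsplit("=", 1)
        result[name] = float(value)
    return result


def check_sample(sample: dict) -> list[str]:
    """Return human-readable problems for one sample (empty list means OK)."""
    mode = sample["mode"]
    if mode not in EXPECTED_PATTERN:
        return [f"unknown mode {mode!r}"]
    problems = []
    want_prefix, want_adaln = EXPECTED_PATTERN[mode]
    has_prefix = bool(sample["prefix"])
    has_adaln = sample["adaln_nonzero"] > 0
    if has_prefix != want_prefix:
        problems.append(f"{mode}: prefix present={has_prefix}, expected {want_prefix}")
    if has_adaln != want_adaln:
        problems.append(
            f"{mode}: adaln_nonzero={sample['adaln_nonzero']}, expected nonzero={want_adaln}"
        )
    # ASSUMPTION: the summary is a truncated view, so it may list fewer
    # dimensions than adaln_nonzero, but never more.
    n_listed = len(parse_adaln_summary(sample["adaln_summary"]))
    if n_listed > sample["adaln_nonzero"]:
        problems.append(
            f"{mode}: summary lists {n_listed} dims but adaln_nonzero={sample['adaln_nonzero']}"
        )
    return problems


def report(doc: dict) -> tuple[Counter, list[str]]:
    """Count samples per mode and collect problems across the whole file."""
    modes = Counter(s["mode"] for s in doc["samples"])
    problems = [p for s in doc["samples"] for p in check_sample(s)]
    return modes, problems
```

To use it, load the raw file downloaded from the repository (without the page header lines) with `json.load` and pass the result to `report`. For the 15 samples shown here it should count three samples per mode and return an empty problem list. Keeping the expected pattern in one table makes it easy to adjust if the real mode semantics turn out to differ from what these samples suggest.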