Karaoke Songs Long
A dataset of long-form karaoke instrumental tracks with auto-generated captions describing the musical characteristics of each song. This repository contains audio-to-text caption pairs for long-duration karaoke versions of popular songs.
What it is
This dataset provides instrumental karaoke tracks (long versions, i.e. extended or full-length) paired with descriptive captions. Each caption details the genre, tempo (BPM), instrumentation, mood, and potential use cases of the audio.
Dataset Contents
The captions.csv file contains the following fields:
| Field | Description |
|---|---|
| original_filename | Original audio filename (artist - song title - karaoke version) |
| cleaned_filename | Cleaned/standardized filename |
| caption | Auto-generated textual description of the audio track |
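A minimal sketch for reading captions.csv with Python's standard csv module. It assumes the file is UTF-8 encoded and has a header row naming the three fields listed above; the function name load_captions is hypothetical and not part of any official loader for this dataset.

```python
import csv
from typing import Dict, List

# Column names as documented in this card (assumed to appear verbatim in the CSV header).
REQUIRED_FIELDS = ("original_filename", "cleaned_filename", "caption")


def load_captions(path: str) -> List[Dict[str, str]]:
    """Read captions.csv into a list of row dicts, checking the expected columns exist."""
    # utf-8-sig reads plain UTF-8 and also tolerates a leading byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [name for name in REQUIRED_FIELDS if name not in header]
        if missing:
            raise ValueError(f"captions.csv is missing columns: {missing}")
        # Keep only the documented fields and strip stray whitespace around values.
        return [
            {name: (row.get(name) or "").strip() for name in REQUIRED_FIELDS}
            for row in reader
        ]
```

For example, `rows = load_captions("captions.csv")` followed by `rows[0]["caption"]` returns the caption text of the first track.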
Included Artists (sample)
- 50 Cent: "In Da Club" (Karaoke)
- 5 Seconds of Summer: "Amnesia", "Beside You", "Don't Stop", "Good Girls", "Heartbreak Girl", "She's Kinda Hot", "Youngblood"
- A Great Big World & Christina Aguilera: "Say Something"
- Adele: "Chasing Pavements", "Don't You Remember", "Easy On Me"
- (and many more)
Caption Features
Each caption includes information about:
- Genre: e.g., pop rock, hip hop, indie rock, folk, electronic, soul/R&B
- Tempo: beats per minute (e.g., 72 BPM, 84 BPM, 120 BPM, 160 BPM)
- Time Signature: typically 4/4
- Instrumentation: guitars, bass, drums, synths, piano, etc.
- Mood/Atmosphere: energetic, reflective, calming, introspective, uplifting
- Use Cases: background music, workout, documentaries, soundtracks, commercials
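Because each caption is free text, structured attributes such as tempo have to be parsed out before they can be used as filters. The sketch below assumes captions phrase tempo as a number followed by "BPM" and the time signature as a fraction such as "4/4", matching the examples listed above; captions worded differently (e.g. "moderate tempo") return None for that field. extract_tempo_info is a hypothetical helper, not something shipped with the dataset.

```python
import re
from typing import Optional, TypedDict


class TempoInfo(TypedDict):
    bpm: Optional[int]
    time_signature: Optional[str]


# Assumed phrasing: tempo as "<2-3 digits> BPM", meter as "<n>/<n>" (e.g. "4/4").
_BPM_RE = re.compile(r"\b(\d{2,3})\s*BPM\b", re.IGNORECASE)
_METER_RE = re.compile(r"\b(\d{1,2}/\d{1,2})\b")


def extract_tempo_info(caption: str) -> TempoInfo:
    """Return the first BPM value and time signature mentioned in a caption, if any."""
    bpm_match = _BPM_RE.search(caption)
    meter_match = _METER_RE.search(caption)
    return {
        "bpm": int(bpm_match.group(1)) if bpm_match else None,
        "time_signature": meter_match.group(1) if meter_match else None,
    }
```

Only the first match of each pattern is kept, so a caption that mentions several tempos yields the earliest one.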
Usage
This dataset is suitable for:
- Audio captioning and music description models
- Text-to-audio generation training
- Music information retrieval (MIR) research
- Karaoke applications and instrument separation
- Audio retrieval from natural language queries
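For the retrieval use case, a lexical baseline can rank captions against a free-text query before any learned audio-text model is involved. The sketch below swaps in plain TF-IDF weighting with cosine similarity, written with the standard library only; it matches exact word forms (so "workout" does not match "workouts"), and rank_captions is a hypothetical name rather than an API provided with this dataset.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def _tokens(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; punctuation such as "&" splits words.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_captions(query: str, captions: Sequence[str], top_k: int = 5) -> List[Tuple[int, float]]:
    """Rank captions by TF-IDF cosine similarity to a query; returns (index, score), best first."""
    docs = [Counter(_tokens(c)) for c in captions]
    n = len(docs)
    # Document frequency: number of captions containing each term.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF so terms present in every caption still get a small positive weight.
    idf = {term: math.log((1 + n) / (1 + freq)) + 1.0 for term, freq in df.items()}

    def vectorize(counts: Counter) -> Dict[str, float]:
        # Query terms that never occur in any caption are dropped.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def norm(vec: Dict[str, float]) -> float:
        return math.sqrt(sum(w * w for w in vec.values()))

    q_vec = vectorize(Counter(_tokens(query)))
    q_norm = norm(q_vec)
    scores: List[Tuple[int, float]] = []
    for i, doc in enumerate(docs):
        d_vec = vectorize(doc)
        denom = q_norm * norm(d_vec)
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        scores.append((i, dot / denom if denom else 0.0))
    # Highest score first; ties broken by original caption order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_k]
```

The returned indices point back into the caption list, so they can be mapped to the matching original_filename values.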
License
Please check the copyright and terms of each individual song. This repository contains descriptive captions only; actual audio files (if present) are instrumental karaoke tracks and may be subject to additional licensing.
Citation
@misc{edwixx-karaoke_songs_long,
author = {Anurag Kanade},
title = {karaoke_songs_long},
year = {2026},
publisher = {Hugging Face},
journal = {Hugging Face Hub},
howpublished = {\url{https://huggingface.co/edwixx/karaoke_songs_long}}
}