---
tags:
- audio
- trap
- hip-hop
- music
- text-to-audio
- generative-audio
- stable-audio
- stable-audio-open
- finetune
language:
- en
base_model: stabilityai/stable-audio-open-1.0
base_model_relation: finetune
pipeline_tag: text-to-audio
license: other
license_name: stabilityai-community-license
license_link: https://stability.ai/license
datasets:
- custom
model_name: Gluten – Trap, Hip-Hop & Pop Finetune
---

# <GLUTEN>

A fine-tuned version of **Stable Audio Open** focused on generating **short, drumless musical loops that loop seamlessly**, **drum loops**, and **one-shot sample-style clips**, with strong conditioning on **BPM**, **key**, and **mood** metadata.

It is the base version of the sample-generation model used in [Kurt](https://trykurt.com).

> **Best at:** creating sample-ready loops in **Trap / Hip-Hop / Pop** (especially melodic trap-style loops).

> **Conditioning:** BPM-aligned, key-aligned, and mood-aligned *to the extent the training labels and model generalization allow*.

## Model Details

- **Model type:** Text-conditioned generative audio model (fine-tune of Stable Audio Open)
- **Base model:** `stabilityai/stable-audio-open-1.0`
- **Domain:** Music loops (Trap / Hip-Hop / Pop)
- **Primary use:** Generating loopable musical ideas and samples
- **Languages:** English prompt metadata (structured)

### Prompt / Conditioning Format

Use the following structured format (recommended):

`Format: Solo | Genre: Trap | Sub-Genre: Melodic Trap | Instruments: Piano, Synth Pad | Moods: Melancholic, Reflective | Styles: Catchy, Smooth | Tempo: Medium | BPM: 135 | Key: Dm`

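
Assembling the prompt by hand is error-prone, so a small helper can keep the field order and separators consistent. The following is a minimal sketch, not part of the model's tooling: the function name `build_prompt` and its parameter names are hypothetical, and the field order simply mirrors the format line above.

```python
from typing import Sequence


def build_prompt(
    fmt: str,
    genre: str,
    sub_genre: str,
    instruments: Sequence[str],
    moods: Sequence[str],
    styles: Sequence[str],
    tempo: str,
    bpm: int,
    key: str,
) -> str:
    """Join the metadata fields into the pipe-separated prompt format.

    Hypothetical helper: field names and order are copied from the
    recommended format line; list fields are comma-separated.
    """
    fields = [
        ("Format", fmt),
        ("Genre", genre),
        ("Sub-Genre", sub_genre),
        ("Instruments", ", ".join(instruments)),
        ("Moods", ", ".join(moods)),
        ("Styles", ", ".join(styles)),
        ("Tempo", tempo),
        ("BPM", str(int(bpm))),
        ("Key", key),
    ]
    return " | ".join(f"{name}: {value}" for name, value in fields)


prompt = build_prompt(
    fmt="Solo",
    genre="Trap",
    sub_genre="Melodic Trap",
    instruments=["Piano", "Synth Pad"],
    moods=["Melancholic", "Reflective"],
    styles=["Catchy", "Smooth"],
    tempo="Medium",
    bpm=135,
    key="Dm",
)
print(prompt)
```
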

**For best results, calculate the clip duration in seconds from your input BPM, assuming 4/4 time: seconds = (bars * 240) / BPM.** Each 4/4 bar has four beats of 60 / BPM seconds, so one bar lasts 240 / BPM seconds. For example, 8 bars at 140 BPM last (8 * 240) / 140 ≈ 13.7 seconds.

**Field guidance**

- **Format:** e.g., `Solo`, `Full`, `Loop` (use whatever matches your dataset convention)
- **Genre/Sub-Genre:** e.g., Trap / Melodic Trap, Hip-Hop / Boom Bap, Pop / Electropop
- **Instruments:** comma-separated, e.g., `Piano, 808, Synth Lead`
- **Moods/Styles:** comma-separated descriptors
- **Tempo:** e.g., `Slow`, `Medium`, `Fast` (even when a BPM is provided, this can help reinforce the mood)
- **BPM:** integer (e.g., 135)
- **Key:** e.g., `Dm`, `F#min`, `Cmaj`
- **Length:** seconds = (bars * 240) / BPM
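
The length formula can be sketched in code as follows. The helper names `loop_seconds` and `make_conditioning` are hypothetical. The dictionary keys `prompt`, `seconds_start`, and `seconds_total` follow the conditioning format used in the base model's `stable-audio-tools` example; that this fine-tune accepts the same keys is an assumption.

```python
def loop_seconds(bars: int, bpm: float) -> float:
    """Duration in seconds of `bars` bars in 4/4 time at `bpm`.

    One bar is 4 beats and one beat lasts 60 / bpm seconds,
    so one bar lasts 240 / bpm seconds.
    """
    if bars <= 0 or bpm <= 0:
        raise ValueError("bars and bpm must be positive")
    return bars * 240.0 / bpm


def make_conditioning(prompt: str, bars: int, bpm: float) -> list:
    """Build a conditioning list with a BPM-aligned clip length.

    Assumption: the keys match the base model's stable-audio-tools usage.
    """
    return [
        {
            "prompt": prompt,
            "seconds_start": 0,
            "seconds_total": loop_seconds(bars, bpm),
        }
    ]


conditioning = make_conditioning(
    "Format: Solo | Genre: Trap | Sub-Genre: Melodic Trap | BPM: 140 | Key: Em",
    bars=8,
    bpm=140,
)
print(round(conditioning[0]["seconds_total"], 2))
```
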

### Example Prompts

All examples were generated with CFG = 9.

- `Format: Solo | Genre: Trap | Sub-Genre: Trap | Instruments: Synth Pad, Synth Lead | Moods: Epic, Dark | Styles: Atmospheric, Building | Tempo: Mid | BPM: 140 | Key: Em`

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/UTYyRKt7tSnNEyAeKDJUO.wav"></audio>

- `Format: Solo | Genre: Trap | Sub-Genre: Melodic Trap | Instruments: Piano, Synth Pad | Moods: Melancholic, Reflective | Styles: Catchy, Smooth | Tempo: Medium | BPM: 135 | Key: Dm`

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/BUse6gYfdm18jt7IE5diA.wav"></audio>

- `Format: Solo | Genre: Trap | Sub-Genre: Melodic Trap, Wavy Trap | Instruments: Strings, Ambient Pads | Moods: Smooth | Styles: Ethereal, Catchy | Tempo: Medium | BPM: 142 | Key: F#m`

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/rT2O008Zt2XKQTRqB6mWS.wav"></audio>

- `Format: Solo | Genre: Trap | Sub-Genre: Ambient | Instruments: Synth Pad, 808 Bass, Bells | Moods: Sad, Ethereal | Styles: Atmospheric, Melodic | Tempo: Slow | BPM: 140 | Key: Em`

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/RVBgJgyrOq9hwrJK2Iol_.wav"></audio>

- `Format: Solo | Sub-Genre: Dark Trap | Instruments: 808, Hi-Hats, Snare | Moods: Heavy, Driving | Styles: Punchy, Rhythmic | Tempo: Medium | BPM: 140 | Key: C# Minor`

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/WSPYZb2R8KpUD7KfRYcXJ.wav"></audio>

---

## Training

### Training Setup

- **Hardware:** 8× NVIDIA RTX 4090
- **Epochs:** 20
- **Objective:** Fine-tune Stable Audio Open to better follow structured metadata for loop generation in Trap / Hip-Hop / Pop.

### Data

- **Dataset size:** ~80,000 loop clips
- **Content:** Trap, Hip-Hop, and Pop loops (music-only or drums-only)
- **Strength:** High consistency in short musical phrases and loop structure

#### Labeling / Metadata

Labels and metadata were generated using:

- **Qwen Omni** (for multimodal/semantic labeling)
- **MuQ / MuLan-based audio-text embedding alignment** (for music semantic tags)

These labels were used to create the structured conditioning fields (genre, sub-genre, instruments, mood/style descriptors, BPM, key).

---

## Intended Use

### Primary Intended Use Cases

- Producing **loopable musical ideas** for Trap/Hip-Hop/Pop
- Generating **sample packs / sample candidates**
- Rapid prototyping of **melodic loops**, chord beds, simple motifs, and textures

### Out-of-Scope / Not Recommended

- Full-length song generation (beyond the clip duration supported by the base model/tools)
- Vocals or lyrical content (not generated reliably)

---

## Limitations

- **BPM/Key compliance is not guaranteed:** Even with conditioning, outputs can drift. Make sure the requested length matches the BPM and bar count.
- **Genre bias:** Tuned for Trap/Hip-Hop/Pop loops; other genres may sound less consistent.
- **Prompt sensitivity:** Best results come from the provided structured format. Use an LLM to convert natural-language prompts to this format.

---

## Recommendations for Best Results

- Keep prompts **structured** and **specific** (instruments + moods + BPM + key).
- If BPM/key are critical:
  - Generate multiple candidates and select the closest match.
  - Use light post-processing for perfect grid/key lock.
- Use `Format: Solo` every time.
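
The "generate multiple candidates and select the closest match" step can be sketched as below. This is an illustrative assumption, not the author's workflow: the BPM estimates are made-up values standing in for the output of whatever tempo detector you use, the file names are hypothetical, and treating half-time and double-time estimates as matches is a design choice made here because tempo detectors often report such octave errors.

```python
def tempo_error(estimated_bpm: float, target_bpm: float) -> float:
    """Relative BPM error, also accepting half- and double-time readings."""
    readings = (estimated_bpm / 2, estimated_bpm, estimated_bpm * 2)
    return min(abs(r - target_bpm) / target_bpm for r in readings)


def pick_closest(estimates: dict, target_bpm: float) -> str:
    """Return the candidate name whose estimated tempo is nearest the target."""
    return min(estimates, key=lambda name: tempo_error(estimates[name], target_bpm))


# Hypothetical tempo estimates for three generated candidates.
estimates = {
    "take_1.wav": 133.0,
    "take_2.wav": 70.2,   # likely a half-time reading of about 140 BPM
    "take_3.wav": 146.5,
}

best = pick_closest(estimates, target_bpm=140)
print(best)
```
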

---

### Stability AI Community License

This model is a fine-tuned version of the Stable Audio Open 1.0 model by Stability AI. It is licensed under the Stability AI Community License, which provides for:

- Free use for research and non-commercial purposes.
- Limited commercial use for entities with annual revenues below USD 1M.
- An enterprise license requirement for commercial use by entities with annual revenues exceeding USD 1M.

For detailed terms, refer to the original license at https://stability.ai/license.

#### Additional Terms for Fine-Tuned Model

Per Stability AI, fine-tuners are not required to make any upfront payment under this license. However, if this model is used in any commercial application in which annual revenue exceeds USD 1M, you must contact Stability AI for enterprise licensing. Further details can be found in the license terms.