---
tags:
  - audio
  - trap
  - hip-hop
  - music
  - text-to-audio
  - generative-audio
  - stable-audio
  - stable-audio-open
  - finetune
language:
  - en
base_model: stabilityai/stable-audio-open-1.0
base_model_relation: finetune
pipeline_tag: text-to-audio
license: other
license_name: stabilityai-community-license
license_link: https://stability.ai/license
datasets:
  - custom
model_name: Gluten - Trap, Hip-Hop & Pop Finetune
---

A fine-tuned version of Stable Audio Open focused on generating short, seamlessly looping musical clips: drumless melodic loops, drum loops, and one-shot sample-style clips, with strong conditioning on BPM, key, and mood metadata. It is the base version of the sample-generation model used in Kurt.

Best at: creating sample-ready loops in Trap / Hip-Hop / Pop (especially melodic trap-style loops).
Conditioning: BPM-aligned, key-aligned, and mood-aligned, to the extent the training labels and model generalization allow.

Model Details

  • Model type: Text-conditioned generative audio model (fine-tune of Stable Audio Open)
  • Base model: stabilityai/stable-audio-open-1.0
  • Domain: Music loops (Trap / Hip-Hop / Pop)
  • Primary use: Generating loopable musical ideas and samples
  • Languages: English prompt metadata (structured)

Prompt / Conditioning Format

Use the following structured format (recommended):

Format: Solo | Genre: Trap | Sub-Genre: Melodic Trap | Instruments: Piano, Synth Pad | Moods: Melancholic, Reflective | Styles: Catchy, Smooth | Tempo: Medium | BPM: 135 | Key: Dm

For best results, set the requested clip duration (in seconds) to match your input BPM, assuming 4/4 time, i.e. seconds = (bars * 240) / BPM.
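The duration rule above can be sketched as a small helper (function name is hypothetical, 4/4 time assumed):

```python
def loop_seconds(bars: int, bpm: int) -> float:
    """Clip duration for a whole number of 4/4 bars at the given BPM.

    One 4/4 bar lasts 4 beats and each beat lasts 60 / BPM seconds,
    so: seconds = bars * 4 * 60 / bpm = bars * 240 / bpm.
    """
    return bars * 240 / bpm

# 8 bars at 135 BPM and 4 bars at 140 BPM:
print(round(loop_seconds(8, 135), 2))  # 14.22
print(round(loop_seconds(4, 140), 2))  # 6.86
```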

Field guidance

  • Format: e.g., Solo, Full, Loop, etc. (use what matches your dataset convention)
  • Genre/Sub-Genre: e.g., Trap / Melodic Trap, Hip-Hop / Boom Bap, Pop / Electropop
  • Instruments: comma-separated, e.g., Piano, 808, Synth Lead
  • Moods/Styles: comma-separated descriptors
  • Tempo: e.g., Slow, Medium, Fast (even if BPM is provided, this can help reinforce the mood)
  • BPM: integer (e.g., 135)
  • Key: e.g., Dm, F#min, Cmaj
  • Length: seconds = (bars * 240) / BPM
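The fields above can also be assembled programmatically; a minimal sketch (field names are from this card, the helper name is hypothetical):

```python
def build_prompt(fields: dict) -> str:
    """Join structured conditioning fields into the pipe-separated prompt format.

    List-valued fields (Instruments, Moods, Styles) are comma-joined;
    everything else is rendered as "Key: value".
    """
    parts = []
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        parts.append(f"{key}: {value}")
    return " | ".join(parts)

prompt = build_prompt({
    "Format": "Solo",
    "Genre": "Trap",
    "Sub-Genre": "Melodic Trap",
    "Instruments": ["Piano", "Synth Pad"],
    "Moods": ["Melancholic", "Reflective"],
    "Styles": ["Catchy", "Smooth"],
    "Tempo": "Medium",
    "BPM": 135,
    "Key": "Dm",
})
print(prompt)
```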

Example Prompts

All examples below were generated with CFG = 9.

  • 'Format: Solo | Genre: Trap | Sub-Genre: Trap | Instruments: Synth Pad, Synth Lead | Moods: Epic, Dark | Styles: Atmospheric, Building | Tempo: Mid | BPM: 140 | Key: Em'
  • 'Format: Solo | Genre: Trap | Sub-Genre: Melodic Trap | Instruments: Piano, Synth Pad | Moods: Melancholic, Reflective | Styles: Catchy, Smooth | Tempo: Medium | BPM: 135 | Key: Dm'
  • 'Format: Solo | Genre: Trap | Sub-Genre: Melodic Trap, Wavy Trap | Instruments: Strings, Ambient Pads | Moods: Smooth | Styles: Ethereal, Catchy | Tempo: Medium | BPM: 142 | Key: F#m'
  • 'Format: Solo | Genre: Trap | Sub-Genre: Ambient | Instruments: Synth Pad, 808 Bass, Bells | Moods: Sad, Ethereal | Styles: Atmospheric, Melodic | Tempo: Slow | BPM: 140 | Key: Em'
  • 'Format: Solo | Sub-Genre: Dark Trap | Instruments: 808, Hi-Hats, Snare | Moods: Heavy, Driving | Styles: Punchy, Rhythmic | Tempo: Medium | BPM: 140 | Key: C# Minor'
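If you run inference through stable-audio-tools (the library used for the base Stable Audio Open model), prompts like these are passed inside a conditioning dict whose seconds_total should follow the bar formula above. A sketch of just that payload, with the model loading and generate_diffusion_cond call omitted; field names follow the base model's published example code:

```python
bars, bpm = 8, 140  # 8 bars of 4/4 at 140 BPM

prompt = ("Format: Solo | Genre: Trap | Sub-Genre: Trap | "
          "Instruments: Synth Pad, Synth Lead | Moods: Epic, Dark | "
          "Styles: Atmospheric, Building | Tempo: Mid | BPM: 140 | Key: Em")

# Conditioning payload in the shape expected by stable_audio_tools'
# generate_diffusion_cond: prompt text plus a time window in seconds.
conditioning = [{
    "prompt": prompt,
    "seconds_start": 0,
    "seconds_total": bars * 240 / bpm,  # ~13.71 s
}]
print(conditioning[0]["seconds_total"])
```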

Training

Training Setup

  • Hardware: 8× NVIDIA RTX 4090
  • Epochs: 20
  • Objective: Fine-tune Stable Audio Open to better follow structured metadata for loop generation in Trap / Hip-Hop / Pop.

Data

  • Dataset size: ~80,000 loop clips
  • Content: Trap, Hip-Hop, Pop loops (music-only or drums-only)
  • Strength: high consistency in short musical phrases / loop structure

Labeling / Metadata

Labeling and metadata generation were derived using:

  • Qwen Omni (for multi-modal/semantic labeling)
  • MuQ / MuLan-based audio-text embedding alignment (for music semantic tags)

These labels were used to create structured conditioning fields (genre, sub-genre, instruments, mood/style descriptors, BPM, key).


Intended Use

Primary Intended Use Cases

  • Producing loopable musical ideas for Trap/Hip-Hop/Pop
  • Generating sample packs / sample candidates
  • Rapid prototyping of melodic loops, chord beds, simple motifs, textures

Out-of-Scope / Not Recommended

  • Full-length song generation (beyond the clip duration supported by the base model/tools)
  • Reliable vocal or lyrical content

Limitations

  • BPM/Key compliance is not guaranteed: Even with conditioning, outputs can drift. Please ensure the prompted length aligns with the BPM and bars.
  • Genre bias: Tuned for Trap/Hip-Hop/Pop loops; other genres may sound less consistent.
  • Prompt sensitivity: Best results come from the provided structured format. Use an LLM to convert natural-language prompts into this format.

Recommendations for Best Results

  • Keep prompts structured and specific (instruments + moods + BPM + key).
  • If BPM/key are critical:
    • Generate multiple candidates and select the closest match.
    • Use light post-processing for perfect grid/key lock.
  • Use “Format: Solo” every time.
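For the grid-lock step, a common approach is to measure the generated clip's actual length against the intended bar length and time-stretch by the ratio. A sketch of the arithmetic only; the ratio would then be fed to any time-stretch tool (e.g., a phase vocoder), and the function name is hypothetical:

```python
def stretch_ratio(actual_seconds: float, bars: int, bpm: int) -> float:
    """Ratio to time-stretch a clip so it lands exactly on the bar grid.

    Target length is bars * 240 / bpm (4/4 time). A ratio > 1 means the
    clip must be lengthened, < 1 means it must be shortened.
    """
    target_seconds = bars * 240 / bpm
    return target_seconds / actual_seconds

# A 13.5 s render intended as 8 bars at 140 BPM (~13.71 s target):
print(round(stretch_ratio(13.5, 8, 140), 4))  # 1.0159
```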

Stability AI Community License

This model is a fine-tuned version of the Stable Audio Open 1.0 model by Stability AI. It is licensed under the Stability AI Community License, which allows:

  • Free use for research and non-commercial purposes.
  • Limited commercial use for entities with annual revenues below USD $1M.

An enterprise license is required for commercial use by entities with annual revenues exceeding USD $1M. For detailed terms, refer to the original LICENSE.

Additional Terms for Fine-Tuned Model

As per Stability AI, no upfront payment is required from a fine-tuner under this license. However, if this model is used in any commercial application whose annual revenue exceeds USD $1M, you must contact Stability AI for enterprise licensing. Further details can be found here.