	---
	tags:
	- audio
	- trap
	- hip-hop
	- music
	- text-to-audio
	- generative-audio
	- stable-audio
	- stable-audio-open
	- finetune
	language:
	- en
	base_model: stabilityai/stable-audio-open-1.0
	base_model_relation: finetune
	pipeline_tag: text-to-audio
	license: other
	license_name: stabilityai-community-license
	license_link: https://stability.ai/license
	datasets:
	- custom
	model_name: Gluten – Trap, Hip-Hop & Pop Finetune
	---
	# GLUTEN


	![image](https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/KuSk2epliP6y_ZgxC7Wi6.png)

	A fine-tuned version of Stable Audio Open focused on generating short, sample-style clips: drumless musical loops that loop seamlessly, drum loops, and one-shots. It is strongly conditioned on BPM, key, and mood metadata.
	It is the base version of the sample-generation model used in [Kurt](https://trykurt.com).

	> Best at: creating sample-ready loops in Trap / Hip-Hop / Pop (especially melodic trap-style loops).
	> Conditioning: BPM-aligned, key-aligned, mood-aligned to the extent the training labels and model generalization allow.

	## Model Details
	- Model type: Text-conditioned generative audio model (fine-tune of Stable Audio Open)
	- Base model: `stabilityai/stable-audio-open-1.0`
	- Domain: Music loops (Trap / Hip-Hop / Pop)
	- Primary use: Generating loopable musical ideas and samples
	- Languages: English prompt metadata (structured)

	### Prompt / Conditioning Format
	Use the following structured format (recommended):

	`Format: Solo | Genre: Trap | Sub-Genre: Melodic Trap | Instruments: Piano, Synth Pad | Moods: Melancholic, Reflective | Styles: Catchy, Smooth | Tempo: Medium | BPM: 135 | Key: Dm`

	*For best results, set the duration in seconds from your input BPM, assuming 4/4 time: `seconds = (bars * 240) / BPM`.*
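A minimal Python sketch of the length rule: in 4/4 a beat lasts 60 / BPM seconds, so one bar lasts 240 / BPM seconds. `loop_seconds` and `make_conditioning` are hypothetical helpers, not part of this repository. The dictionary keys (`prompt`, `seconds_start`, `seconds_total`) follow the conditioning format used in the base Stable Audio Open example for `stable-audio-tools`; treat that as an assumption and check it against the version you run.

```python
def loop_seconds(bars: int, bpm: float, beats_per_bar: int = 4) -> float:
    """Length in seconds of `bars` bars at `bpm` (4/4 time by default).

    One beat lasts 60 / bpm seconds, so a 4/4 bar lasts 240 / bpm seconds.
    """
    if bars <= 0 or bpm <= 0:
        raise ValueError("bars and bpm must be positive")
    return bars * beats_per_bar * 60.0 / bpm


def make_conditioning(prompt: str, bars: int, bpm: float) -> list:
    """Conditioning list in the shape used by the stable-audio-tools example
    for the base model: text prompt plus start/total timing in seconds."""
    return [{
        "prompt": prompt,
        "seconds_start": 0,
        "seconds_total": loop_seconds(bars, bpm),
    }]


print(round(loop_seconds(8, 140), 2))  # 13.71
print(loop_seconds(4, 120))            # 8.0
```

Computing the length from bars keeps the requested duration on a whole number of bars, which is what lets the result loop cleanly.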

	#### Field guidance
	- Format: e.g., `Solo`, `Full`, `Loop`, etc. (use what matches your dataset convention)
	- Genre/Sub-Genre: e.g., Trap / Melodic Trap, Hip-Hop / Boom Bap, Pop / Electropop
	- Instruments: comma-separated, e.g., `Piano, 808, Synth Lead`
	- Moods/Styles: comma-separated descriptors
	- Tempo: e.g., `Slow`, `Medium`, `Fast` (even when BPM is provided, this can help reinforce the mood)
	- BPM: integer (e.g., 135)
	- Key: e.g., `Dm`, `F#min`, `Cmaj`
	- Length (generation duration): `seconds = (bars * 240) / BPM`
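A sketch of assembling the structured prompt from individual fields. `build_prompt` and `FIELD_ORDER` are hypothetical names; the field order simply mirrors the recommended example, and the card does not say whether order affects results. Length is left out because the recommended prompt does not contain it; it is used to set the duration instead.

```python
# Field order copied from the recommended example prompt (an assumption, not a requirement).
FIELD_ORDER = ["Format", "Genre", "Sub-Genre", "Instruments", "Moods",
               "Styles", "Tempo", "BPM", "Key"]


def build_prompt(fields: dict) -> str:
    """Join fields into the 'Name: value | Name: value' layout.

    List values (instruments, moods, styles) are comma-separated.
    Fields missing from `fields` are skipped.
    """
    parts = []
    for name in FIELD_ORDER:
        if name not in fields:
            continue
        value = fields[name]
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        parts.append(f"{name}: {value}")
    return " | ".join(parts)


prompt = build_prompt({
    "Format": "Solo",
    "Genre": "Trap",
    "Sub-Genre": "Melodic Trap",
    "Instruments": ["Piano", "Synth Pad"],
    "Moods": ["Melancholic", "Reflective"],
    "Styles": ["Catchy", "Smooth"],
    "Tempo": "Medium",
    "BPM": 135,
    "Key": "Dm",
})
print(prompt)
```

With these inputs the output is the recommended example prompt, with plain `|` separators.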


	### Example Prompts
	All examples were generated with CFG = 9.
	- 'Format: Solo \| Genre: Trap \| Sub-Genre: Trap \| Instruments: Synth Pad, Synth Lead \| Moods: Epic, Dark \| Styles: Atmospheric, Building \| Tempo: Mid \| BPM: 140 \| Key: Em,'
	<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/UTYyRKt7tSnNEyAeKDJUO.wav"></audio>
	- 'Format: Solo \| Genre: Trap \| Sub-Genre: Melodic Trap \| Instruments: Piano, Synth Pad \| Moods: Melancholic, Reflective \| Styles: Catchy, Smooth \| Tempo: Medium \| BPM: 135 \| Key: Dm,'
	<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/BUse6gYfdm18jt7IE5diA.wav"></audio>
	- 'Format: Solo \| Genre: Trap \| Sub-Genre: Melodic Trap, Wavy Trap \| Instruments: Strings, Ambient Pads \| Moods: Smooth \| Styles: Ethereal, Catchy \| Tempo: Medium \| BPM: 142 \| Key: F#m'
	<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/rT2O008Zt2XKQTRqB6mWS.wav"></audio>
	- 'Format: Solo \| Genre: Trap \| Sub-Genre: Ambient \| Instruments: Synth Pad, 808 Bass, Bells \| Moods: Sad, Ethereal \| Styles: Atmospheric, Melodic \| Tempo: Slow \| BPM: 140 \| Key: Em'
	<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/RVBgJgyrOq9hwrJK2Iol_.wav"></audio>
	- 'Format: Solo \|Sub-Genre: Dark Trap \| Instruments: 808, Hi-Hats, Snare \| Moods: Heavy, Driving \| Styles: Punchy, Rhythmic \| Tempo: Medium \| BPM: 140 \| Key: C# Minor'
	<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/67e48dcf55a1ff318bab67a8/WSPYZb2R8KpUD7KfRYcXJ.wav"></audio>

	---

	## Training
	### Training Setup
	- Hardware: 8× NVIDIA RTX 4090
	- Epochs: 20
	- Objective: Fine-tune Stable Audio Open to better follow structured metadata for loop generation in Trap / Hip-Hop / Pop.


	### Data
	- Dataset size: ~80,000 loop clips
	- Content: Trap, Hip-Hop, Pop loops (music-only or drums-only)
	- Strength: high consistency in short musical phrases / loop structure

	#### Labeling / Metadata
	Labels and metadata were generated using:
	- Qwen Omni (for multi-modal/semantic labeling)
	- MuQ / MuLan-based audio-text embedding alignment (for music semantic tags)

	These labels were used to create structured conditioning fields (genre, sub-genre, instruments, mood/style descriptors, BPM, key).


	---

	## Intended Use
	### Primary Intended Use Cases
	- Producing loopable musical ideas for Trap/Hip-Hop/Pop
	- Generating sample packs / sample candidates
	- Rapid prototyping of melodic loops, chord beds, simple motifs, textures

	### Out-of-Scope / Not Recommended
	- Full-length song generation (beyond the clip duration supported by the base model/tools)
	- Reliable vocals or lyrical content


	---

	## Limitations
	- BPM/key compliance is not guaranteed: even with conditioning, outputs can drift. Make sure the requested length corresponds to a whole number of bars at the prompted BPM.
	- Genre bias: Tuned for Trap/Hip-Hop/Pop loops; other genres may sound less consistent.
	- Prompt sensitivity: Best results come from the structured format above. Consider using an LLM to convert natural-language prompts into this format.
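Whether a prompt comes from an LLM or is written by hand, a quick check can catch missing fields and lengths that do not land on a whole bar count. This is a minimal sketch; `parse_prompt`, `check_prompt`, and the choice of `REQUIRED_FIELDS` are assumptions for illustration, not something this card prescribes. It also tolerates the trailing comma seen in some example prompts and stray backslashes from markdown-escaped pipes.

```python
# Assumed minimum set of fields; adjust to your own workflow.
REQUIRED_FIELDS = ("Format", "Genre", "Instruments", "BPM", "Key")


def parse_prompt(prompt: str) -> dict:
    """Split 'Name: value | Name: value' into a {name: value} dict of strings."""
    fields = {}
    for part in prompt.split("|"):
        # Drop stray backslashes left over from markdown-escaped pipes.
        name, sep, value = part.replace("\\", "").partition(":")
        if sep:  # skip fragments without a "Name:" prefix
            fields[name.strip()] = value.strip().rstrip(",")
    return fields


def check_prompt(prompt: str, seconds: float) -> list:
    """Return a list of problems; an empty list means the prompt looks usable."""
    fields = parse_prompt(prompt)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    bpm = fields.get("BPM", "")
    if bpm and not bpm.isdigit():
        problems.append(f"BPM is not an integer: {bpm!r}")
    elif bpm:
        bars = seconds * int(bpm) / 240  # a 4/4 bar lasts 240 / BPM seconds
        if abs(bars - round(bars)) > 1e-3:
            problems.append(f"{seconds:g} s at {bpm} BPM is {bars:.2f} bars, not a whole number")
    return problems


llm_output = "Format: Solo | Genre: Trap | Instruments: Piano, Bells | BPM: 140 | Key: Em"
print(check_prompt(llm_output, 1920 / 140))  # 8 bars at 140 BPM: no problems
print(check_prompt(llm_output, 10))          # reports a non-whole bar count
```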

	---

	## Recommendations for Best Results
	- Keep prompts structured and specific (instruments + moods + BPM + key).
	- If BPM/key are critical:
	  - Generate multiple candidates and select the closest match.
	  - Use light post-processing for perfect grid/key lock.
	- Use `Format: Solo` every time.
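A minimal sketch of the candidate-selection step for tempo. It assumes you already have a measured tempo for each generated candidate from a beat tracker of your choice (the card does not name one); `effective_bpm`, `pick_closest`, and the readings below are hypothetical. Half- and double-time readings are treated as equivalent because tempo estimators commonly report them. The returned factor is the speed change a time-stretch tool would need to land exactly on the prompted BPM.

```python
def effective_bpm(measured: float, target: float) -> float:
    """Pick half, original, or double of a measured tempo, whichever is closest
    to the target, since beat trackers often report half- or double-time."""
    return min((measured / 2, measured, measured * 2), key=lambda b: abs(b - target))


def pick_closest(measured: dict, target: float) -> tuple:
    """Return (candidate name, stretch factor) for the candidate nearest the target BPM.

    The stretch factor is target / effective tempo: above 1 means speed the loop up.
    """
    best = min(measured, key=lambda name: abs(effective_bpm(measured[name], target) - target))
    return best, target / effective_bpm(measured[best], target)


# Hypothetical tempo readings for three generations prompted with BPM: 140.
readings = {"take_1": 137.2, "take_2": 70.1, "take_3": 146.0}
name, factor = pick_closest(readings, 140)
print(name, round(factor, 4))  # take_2 0.9986
```

Here `take_2` was read at half-time (70.1), so it counts as 140.2 BPM and needs only a slight slowdown.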

	---

	### Stability AI Community License
	This model is a fine-tuned version of the Stable Audio Open 1.0 model by Stability AI. It is licensed under the Stability AI Community License, which allows:

	- Free use for research and non-commercial purposes.
	- Limited commercial use for entities with annual revenues below USD $1M.

	An enterprise license is required for commercial use by entities with annual revenues exceeding USD $1M. For detailed terms, refer to the original [LICENSE](https://stability.ai/license).

	#### Additional Terms for Fine-Tuned Model
	As per Stability AI, fine-tuners are not required to make any upfront payment under this license. However, if this model is used in any commercial application by an entity whose annual revenue exceeds USD $1M, you must contact Stability AI for enterprise licensing. Further details can be found here.