Commit da1c8ad (verified) · committed by RoyalCities · 1 parent: 2048f05

Update README.md

Files changed (1):
  1. README.md (+19 −40)
README.md CHANGED
@@ -37,8 +37,7 @@ The result is a model built for actual production workflows: **tempo-synced, key
 Foundation-1 is designed for **pure sample generation**. It excels at generating coherent musical loops that stay locked to tempo and phrase length while allowing layered prompting across instrument families, timbre descriptors, FX, and notation-driven musical behavior.
 
 ---
-
-## What Foundation-1 Does
+<h2 align="center">What Foundation-1 Does</h2>
 
 - **Generates musically coherent loops** for production workflows
 - **Understands BPM and bar count** for structured loop generation
@@ -52,8 +51,7 @@ Foundation-1 is designed for **pure sample generation**. It excels at generating
 - **Understands Wet vs Dry production context** — adding terms like *Dry* encourages minimal FX processing, while *Wet* or FX tags produce more processed, spatial, or effected sounds.
 
 ---
-
-## Why It Feels Different
+<h2 align="center">Why It Feels Different</h2>
 
 Most audio models can react to broad prompt terms like “warm pad” or “bright synth,” with inconsistent results. Foundation-1 was designed to go further by treating the sound as a layered system:
 
@@ -66,8 +64,7 @@ Most audio models can react to broad prompt terms like “warm pad” or “brig
 This layered conditioning approach is a major reason Foundation-1 is able to deliver both **high musicality** and **high prompt control** at the same time.
 
 ---
-
-## Audio Showcase
+<h2 align="center">Audio Showcase</h2>
 
 <div style="text-align: center; margin: 20px 0;">
 <table style="width: 100%; border-collapse: collapse; margin: 0 auto;">
@@ -171,8 +168,7 @@ This layered conditioning approach is a major reason Foundation-1 is able to del
 </div>
 
 ---
-
-## Core Capabilities
+<h2 align="center">Core Capabilities</h2>
 
 ### 1. Musical Structure
 Foundation-1 was trained to produce structured musical material rather than full music or generic textures. Musical Notation terms can encourage notation, chord progressions, melodies, arps, phrase direction, rhythmic density, and other musically relevant behaviors.
@@ -193,8 +189,7 @@ The model supports a dedicated FX layer covering multiple forms of reverb, delay
 Foundation-1 is built for **production-ready loop generation**, including BPM-aware and bar-aware structure within supported denominations.
 
 ---
-
-## Conditioning Architecture
+<h2 align="center">Conditioning Architecture</h2>
 
 Foundation-1 was trained with a layered tagging hierarchy designed to improve control, composability, and prompt clarity.
 
@@ -209,8 +204,7 @@ Foundation-1 was trained with a layered tagging hierarchy designed to improve co
 This makes it possible to prompt at different levels of abstraction. A user can stay broad with a family-level prompt like **Synth** or **Keys**, or get more specific with terms like **Synth Lead**, **Wavetable Bass**, **Grand Piano**, **Violin**, or **Trumpet**, then further shape the output using timbral and FX descriptors.
 
 ---
-
-## Instrument Coverage
+<h2 align="center">Instrument Coverage</h2>
 
 ### Major Families
 
@@ -272,8 +266,7 @@ Foundation-1 includes a wide sub-family layer covering a broad range of producti
 <center><img src="./Charts/subfamilites_pie.PNG" alt="Sub-Family Chart" width="80%"></center>
 
 ---
-
-## Timbre System
+<h2 align="center">Timbre System</h2>
 
 One of Foundation-1’s main strengths is that it was not trained to treat timbre as an afterthought. Timbral character is directly represented in the prompt system, giving users control over not only *what* is being generated, but also *how it sounds*.
 
@@ -323,7 +316,7 @@ Representative timbre descriptors include:
 
 <center><img src="./Charts/timbre_tags_pie.PNG" alt="Timbre Chart" width="80%"></center>
 
-### Why This Matters
+<h2 align="center">Why This Matters</h2>
 
 This tagging design makes prompts much more flexible. Instead of only asking for an instrument, users can shape:
 - tonal balance
@@ -340,8 +333,7 @@ This is especially useful for producers who want to guide the output toward a sp
 For a list of supported tags, please see the **[Tag Reference Sheet](./Master_Tag_Reference.md)**.
 
 ---
-
-## FX Layer
+<h2 align="center">FX Layer</h2>
 
 Foundation-1 includes a dedicated FX descriptor layer spanning multiple common production effects.
 
@@ -371,8 +363,7 @@ Representative FX tags include:
 <center><img src="./Charts/fx_pie.PNG" alt="FX Chart" width="80%"></center>
 
 ---
-
-## Musical Notation and Structure
+<h2 align="center">Musical Notation and Structure</h2>
 
 Foundation-1 was trained with structured musical descriptors designed to improve phrase coherence, rhythmic intent, melodic motion, and prompt control.
 
@@ -411,8 +402,7 @@ Examples of supported structural ideas may include terms such as:
 This notation layer is one of the main reasons Foundation-1 produces unusually coherent musical material instead of static or loosely related phrases. These can be mixed and matched as desired.
 
 ---
-
-## Tonal and Timing Support
+<h2 align="center">Tonal and Timing Support</h2>
 
 Foundation-1 is designed for structured music production workflows and supports:
 
@@ -427,9 +417,7 @@ Foundation-1 is designed for structured music production workflows and supports:
 - Supported BPM denominations: **100 BPM, 110 BPM, 120 BPM, 128 BPM, 130 BPM, 140 BPM, 150 BPM**
 
 ---
-
-
-## Prompt Structure
+<h2 align="center">Prompt Structure</h2>
 
 For best results, use **rich prompts built around the model’s tags**. These tags can be mixed and matched as needed. The model was trained on a structured hierarchy designed to encourage musically coherent sample generation.
 
@@ -449,8 +437,7 @@ For best results, use **rich prompts built around the model’s tags**. These ta
 Use **FX and timbre tags sparingly at first**, then layer more once you understand the model’s behavior.
 
 ---
-
-## One Prompt → Multiple Outputs
+<h2 align="center">One Prompt → Multiple Outputs</h2>
 
 Each row below uses the **exact same prompt**, but a different random seed.
 The **timbre tags remain unchanged**, so the overall sound character stays consistent while the **melodic and musical content varies** between generations.
@@ -548,9 +535,7 @@ The **timbre tags remain unchanged**, so the overall sound character stays consi
 </div>
 
 ---
-
-
-## Recommended Workflow
+<h2 align="center">Recommended Workflow</h2>
 
 Foundation-1 is best used with the **RC Stable Audio Fork**, which is tuned around this model’s metadata and prompting structure.
 
@@ -600,8 +585,7 @@ Generation speed will vary depending on GPU model and system configuration.
 On an **RTX 3090**, generation time is approximately **7–8 seconds per sample**.
 
 ---
-
-## Dataset and Training Philosophy
+<h2 align="center">Dataset and Training Philosophy</h2>
 
 Foundation-1 was built around a **structured sample-generation philosophy**, rather than generic or genre-based audio captioning. The dataset consists entirely of **hand-crafted and labeled audio**, produced through a controlled augmentation pipeline.
 
@@ -620,8 +604,7 @@ This design is central to the model’s **musical coherence and high degree of s
 For more details on the dataset and training methodology, see the **[Training & Dataset Notes](./training_dataset_info.md)**.
 
 ---
-
-## Limitations
+<h2 align="center">Limitations</h2>
 
 Foundation-1 is a specialized model for **music sample generation**, not a general-purpose music generator.
 
@@ -646,23 +629,19 @@ If the generation duration is shorter than the musical structure implied by the
 The **RC Stable Audio Fork automatically handles this timing alignment**, making this workflow much easier.
 
 ---
-
-
-## License
+<h2 align="center">License</h2>
 
 This model is licensed under the Stability AI Community License. It is available for non-commercial use or limited commercial use by entities with annual revenues below USD $1M. For revenues exceeding USD $1M, please refer to the repository license file for full terms.
 
 ---
-
-### Companion Video
+<h3 align="center">Companion Video</h3>
 
 Further information on the model and design philosophy can be found in the companion video:
 
 🎥 **[Watch the Foundation-1 overview and design philosophy video](https://www.youtube.com/watch?v=O2iBBWeWaL8)**
 
 ---
-
-## Final Notes
+<h2 align="center">Final Notes</h2>
 
 Foundation-1 is intended as a **producer-facing foundation model for structured sample generation**, designed to augment music production rather than replace it.
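
The README's BPM/bar-count prompting and its note on matching generation duration to musical structure both reduce to one piece of arithmetic: in 4/4 time, one bar is 4 beats and one beat lasts 60 / BPM seconds. A minimal sketch of that math, assuming 4/4 time throughout (`loop_seconds` is a hypothetical helper for illustration, not part of this repo or the RC Stable Audio Fork):

```python
# Loop-length arithmetic implied by the README's BPM and bar-count tags.
# Assumes 4/4 time: one bar = 4 beats, one beat = 60 / BPM seconds.

SUPPORTED_BPM = [100, 110, 120, 128, 130, 140, 150]  # BPM denominations from the README


def loop_seconds(bpm: float, bars: int, beats_per_bar: int = 4) -> float:
    """Duration in seconds of a `bars`-bar loop at `bpm` (4/4 by default)."""
    return bars * beats_per_bar * 60.0 / bpm


# A 4-bar loop at 120 BPM is 8.0 seconds, so the generation window must be
# at least that long for the full phrase to fit (the RC fork handles this).
for bpm in SUPPORTED_BPM:
    print(f"{bpm} BPM, 4 bars -> {loop_seconds(bpm, 4):.2f} s")
```

This is why a too-short generation duration truncates the implied musical structure: the model is asked for more bars than the render window can physically hold.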