---
language:
- en
license: other
license_name: earthlyframes-collaborative-intelligence-license
license_link: https://github.com/brotherclone/white/blob/main/COLLABORATIVE_INTELLIGENCE_LICENSE.md
pipeline_tag: audio-classification
tags:
- audio
- music
- onnx
- chromatic
- rainbow-table
base_model:
- laion/larger_clap_music
- microsoft/deberta-v3-base
---
# Refractor CDM
**Refractor CDM** (Compact Disc Module) is a lightweight MLP calibration head that classifies full-mix audio recordings into one of nine "rainbow colors" — a chromatic taxonomy used in *The Rainbow Table*, an AI-assisted album series.
The CDM is a companion to the base Refractor ONNX model (a multimodal fusion network trained on short catalog segments). The base model works well for MIDI and short audio clips but predicts poorly on full-mix audio because CLAP embeddings are optimized for short segments. The CDM corrects this by training directly on chunked full-mix audio.
## Model Details
| Property | Value |
|---|---|
| Architecture | MLP with two hidden layers (1280 → 256 → 128 → 9) |
| Parameters | 361,993 |
| Input | CLAP audio (512-dim) + DeBERTa concept (768-dim) = 1280-dim |
| Output | Softmax probabilities over 9 colors (`color_probs`, shape `[batch, 9]`) |
| Format | ONNX (`refractor_cdm.onnx`, 1.4 MB) |
| Training data | 3,450 chunks from 78 full-mix songs across all 9 colors |
| Loss | CrossEntropyLoss with label smoothing (0.1) + inverse-frequency class weights |
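For reference, a minimal PyTorch sketch of a head with these dimensions (the class name, activation choice, and layer layout are assumptions; only the 1280 → 256 → 128 → 9 shape and the parameter count come from the table above):
```python
import torch
import torch.nn as nn

class RefractorCDMHead(nn.Module):
    """Calibration head: fused 1280-dim embedding -> 9 color logits.

    Parameters: (1280*256 + 256) + (256*128 + 128) + (128*9 + 9) = 361,993,
    matching the table above. ReLU activations are an assumption.
    """

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1280, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 9),
        )

    def forward(self, audio_emb: torch.Tensor, concept_emb: torch.Tensor) -> torch.Tensor:
        # CLAP audio (512-dim) + DeBERTa concept (768-dim) -> 1280-dim fused input.
        x = torch.cat([audio_emb, concept_emb], dim=-1)
        return self.net(x)  # logits; softmax yields `color_probs`, shape [batch, 9]
```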
## Color Classes
| Index | Color | CHROMATIC_TARGETS (temporal / spatial / ontological) |
|---|---|---|
| 0 | Red | Past / Thing / Known |
| 1 | Orange | Past / Thing / Imagined |
| 2 | Yellow | Future / Place / Imagined |
| 3 | Green | Future / Place / Forgotten |
| 4 | Blue | Present / Person / Forgotten |
| 5 | Indigo | Uniform / Uniform / Known+Forgotten `[0.1, 0.4, 0.4]` |
| 6 | Violet | Present / Person / Known |
| 7 | White | Uniform across all axes |
| 8 | Black | Uniform across all axes |

Targets are derived at runtime from `app/structures/concepts/chromatic_targets.py`, which reads directly from the canonical `the_rainbow_table_colors` Pydantic model. Previous versions had hand-rolled copies that diverged for 7 of 9 colors; this was corrected in April 2026 (fix-chromatic-targets-canonical-source).
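When reading `color_probs` from the model output, the indices above give the label order; a small illustrative helper (the function name is ours, not part of the model API):
```python
import numpy as np

# Label order matches the class indices in the table above.
COLOR_CLASSES = ["Red", "Orange", "Yellow", "Green", "Blue", "Indigo", "Violet", "White", "Black"]

def top_color(color_probs: np.ndarray) -> tuple[str, float]:
    """Return the most probable color label and its probability for one song or chunk."""
    idx = int(np.argmax(color_probs))
    return COLOR_CLASSES[idx], float(color_probs[idx])
```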
## Validation Results
Evaluated on 78 labeled songs from `staged_raw_material`, scored in 30 s chunks with a 5 s stride and pooled with confidence-weighted aggregation. The validation set contains no Black songs; the eight rows below account for all 78.
| Color | Correct | Total | Accuracy |
|---|---|---|---|
| Red | 11 | 12 | 91.7% |
| Orange | 4 | 4 | 100.0% |
| Yellow | 10 | 10 | 100.0% |
| Green | 6 | 8 | 75.0% |
| Blue | 11 | 11 | 100.0% |
| Indigo | 10 | 11 | 90.9% |
| Violet | 11 | 12 | 91.7% |
| White | 9 | 10 | 90.0% |
| **Overall** | **72** | **78** | **92.3%** |
## Usage
The CDM is used via the `Refractor` wrapper. It auto-loads when `refractor_cdm.onnx` is present alongside `refractor.onnx`.
```python
from training.refractor import Refractor

scorer = Refractor()  # CDM auto-detected when refractor_cdm.onnx sits beside refractor.onnx
result = scorer.score(
    audio_emb=scorer.prepare_audio(waveform, sr=48000),  # CLAP audio embedding (512-dim)
    concept_emb=scorer.prepare_concept("A song about forgetting the future"),  # DeBERTa concept (768-dim)
)
# result: {"temporal": {...}, "spatial": {...}, "ontological": {...}, "confidence": 0.93}
```
For full-mix WAV files, use `chunk_audio` + `aggregate_chunk_scores` from `score_mix.py` to score in overlapping windows and pool results.
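A minimal sketch of that flow, assuming soundfile for I/O; the keyword arguments to `chunk_audio` are illustrative (only the function names, the 30 s / 5 s windowing, and confidence-weighted pooling come from this card):
```python
import soundfile as sf

from training.refractor import Refractor
from training.score_mix import chunk_audio, aggregate_chunk_scores  # import path assumed

scorer = Refractor()
waveform, sr = sf.read("full_mix.wav")

# Score each overlapping window (30 s windows with a 5 s stride, matching the
# validation setup). The window/stride keyword names are illustrative.
chunk_results = [
    scorer.score(
        audio_emb=scorer.prepare_audio(chunk, sr=sr),
        concept_emb=scorer.prepare_concept("A song about forgetting the future"),
    )
    for chunk in chunk_audio(waveform, sr=sr, window_s=30, stride_s=5)
]

# Pool per-chunk scores, weighting each chunk by its reported confidence.
result = aggregate_chunk_scores(chunk_results)
```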
## Training
```bash
# Phase 1 — extract CLAP + concept embeddings from staged_raw_material/
python training/extract_cdm_embeddings.py
# Phase 2 — train on Modal (A10G GPU)
modal run training/modal_train_refractor_cdm.py
# Validate
python training/validate_mix_scoring.py
```
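The loss from the Model Details table (label-smoothed cross-entropy with inverse-frequency class weights) could be set up as follows; the per-class chunk counts here are illustrative placeholders that sum to the card's 3,450 chunks:
```python
import torch
import torch.nn as nn

# Per-class chunk counts over the 9 colors (illustrative values summing to 3,450).
class_counts = torch.tensor([430, 150, 390, 310, 420, 400, 450, 460, 440], dtype=torch.float)

# Inverse-frequency weights, normalized so they average to 1 across classes.
weights = class_counts.sum() / (len(class_counts) * class_counts)

# CrossEntropyLoss with label smoothing 0.1, as in the Model Details table.
criterion = nn.CrossEntropyLoss(weight=weights, label_smoothing=0.1)
```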
## Limitations
- CLAP embeddings have a maximum internal window of ~10s; chunked scoring is essential for full-length tracks
- Green classification is the weakest at 75% — two songs are near the Yellow/Violet boundary
- Training data is drawn from a single artist's catalog — generalization to other music is untested
- The concept embedding path requires a DeBERTa-v3-base inference pass (~600 MB model)
## Citation
Part of *The Rainbow Table* generative music pipeline.
See [brotherclone/white](https://github.com/brotherclone/white) and [earthlyframes/white-training-data](https://huggingface.co/datasets/earthlyframes/white-training-data).