---
license: bsd-3-clause
tags:
- audio
- audio-classification
- sample-tagging
- clap
- htsat
- onnx
library_name: onnxruntime
---

# MAGDA Sample Tagger

ONNX exports of [LAION's CLAP HTSAT-unfused model](https://huggingface.co/laion/clap-htsat-unfused) plus the RoBERTa tokenizer, packaged for the [MAGDA DAW](https://github.com/Conceptual-Machines/magda-core)'s sample library (issue #768).

## What's in this repo

| File | Size | SHA-256 |
|------|------|---------|
| `clap_audio.onnx` | 111.8 MB | `3f42f71e555b62709910b6efa66fa5879f00d9571874b12b0fa674f82dbfe332` |
| `clap_text.onnx` | 478.2 MB | `c07b27204836877d5b615c103685b66ea8f21bc6b5b70a572be356125423a8bf` |
| `tokenizer.json` | 3.4 MB | `4fd1d86b4f5b53f40a609fcd11c1f34024b735f870a07439d70202b98493661a` |

- `clap_audio.onnx`: audio encoder. Takes a mono 48 kHz waveform and produces a 512-d normalised embedding suitable for cosine-similarity search.
- `clap_text.onnx`: text encoder. Takes RoBERTa token ids and an attention mask and produces a 512-d normalised embedding in the same space as the audio encoder's, so a text query can rank audio files by similarity.
- `tokenizer.json`: the RoBERTa BPE tokenizer that pairs with the text encoder. MAGDA's C++ tokenizer reads this file directly.

## How MAGDA uses these

MAGDA's media database (a SQLite catalogue of audio samples) uses these encoders to:

- Compute an embedding per indexed sample at index time, stored in the `media_embedding` table.
- Encode the user's free-text search query at query time and rank samples by cosine similarity to the query embedding.

Without these models MAGDA falls back to filename / tag full-text search, which is still useful but offers no semantic similarity.

## Export procedure

The ONNX exports are generated from `laion/clap-htsat-unfused` by the export script in MAGDA's prototype:

```
prototypes/media_db/src/media_db/embeddings/onnx_export.py
```

Notes:

- Run on CPU (MPS does not support the float64 used by the audio encoder's mel filterbank).
- Requires `transformers >= 5.x`. The audio-input keyword argument was renamed from `audios=` to `audio=` between 4.x and 5.x; passing the old name silently returns wrong shapes.
- `tokenizer.json` is the unmodified file from the upstream HF repo, fetched via `AutoTokenizer.from_pretrained(...).save_pretrained(...)`.

## License

BSD-3-Clause, the same as the upstream LAION CLAP weights. See the upstream repo for the original notice and attribution.
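
## Example: verifying downloaded files

The SHA-256 values listed under "What's in this repo" can be checked after download. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify_models` are illustrative and are not part of MAGDA.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the table in "What's in this repo".
EXPECTED_SHA256 = {
    "clap_audio.onnx": "3f42f71e555b62709910b6efa66fa5879f00d9571874b12b0fa674f82dbfe332",
    "clap_text.onnx": "c07b27204836877d5b615c103685b66ea8f21bc6b5b70a572be356125423a8bf",
    "tokenizer.json": "4fd1d86b4f5b53f40a609fcd11c1f34024b735f870a07439d70202b98493661a",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large models are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_models(model_dir, expected=EXPECTED_SHA256):
    """Return {filename: status}, where status is "ok", "missing" or "mismatch"."""
    results = {}
    for name, want in expected.items():
        path = Path(model_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_of(path) == want:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling `verify_models("path/to/models")` returns one status per file, so a caller can decide whether to re-download or fall back to full-text search.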
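
## Example: storing and ranking embeddings

The index-time and query-time steps described under "How MAGDA uses these" can be sketched as follows. This is an illustration under stated assumptions, not MAGDA's implementation: only the table name `media_embedding` comes from this README, while the column names, the float32 BLOB encoding, and every function name are hypothetical. Because both encoders emit normalised vectors, a plain dot product equals cosine similarity.

```python
import sqlite3
import struct


def pack_embedding(vector):
    """Serialise floats as little-endian float32 for a BLOB column (assumed encoding)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob):
    """Inverse of pack_embedding: 4 bytes per float32 value."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def create_schema(conn):
    # Hypothetical layout: only the table name comes from the README.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media_embedding ("
        "sample_path TEXT PRIMARY KEY, embedding BLOB NOT NULL)"
    )


def store_embedding(conn, sample_path, vector):
    """Index time: persist one audio embedding per sample."""
    conn.execute(
        "INSERT OR REPLACE INTO media_embedding (sample_path, embedding) VALUES (?, ?)",
        (sample_path, pack_embedding(vector)),
    )


def rank_samples(conn, query_vector, limit=10):
    """Query time: score every stored embedding against the text embedding.

    Assumes unit-length vectors, so the dot product is the cosine similarity.
    """
    scored = []
    rows = conn.execute("SELECT sample_path, embedding FROM media_embedding")
    for sample_path, blob in rows:
        vector = unpack_embedding(blob)
        score = sum(q * v for q, v in zip(query_vector, vector))
        scored.append((score, sample_path))
    scored.sort(reverse=True)  # highest similarity first
    return [(path, score) for score, path in scored[:limit]]
```

In practice `query_vector` would be the 512-d output of `clap_text.onnx` and the stored vectors the outputs of `clap_audio.onnx`. A brute-force scan like this is the simplest approach; a large library might need a vectorised or indexed search instead.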