MAGDA Sample Tagger

ONNX exports of LAION's CLAP HTSAT-unfused model plus the RoBERTa tokenizer, packaged for the MAGDA DAW's sample library (issue #768).

What's in this repo

File             Size      SHA-256
clap_audio.onnx  111.8 MB  3f42f71e555b62709910b6efa66fa5879f00d9571874b12b0fa674f82dbfe332
clap_text.onnx   478.2 MB  c07b27204836877d5b615c103685b66ea8f21bc6b5b70a572be356125423a8bf
tokenizer.json   3.4 MB    4fd1d86b4f5b53f40a609fcd11c1f34024b735f870a07439d70202b98493661a

  • clap_audio.onnx: audio encoder. Takes a mono 48 kHz waveform and produces a 512-d normalised embedding suitable for cosine similarity search.
  • clap_text.onnx: text encoder. Takes RoBERTa token ids plus an attention mask and produces a 512-d normalised embedding in the same space as the audio encoder, so a text query can rank audio files by similarity.
  • tokenizer.json: the RoBERTa BPE tokenizer that pairs with the text encoder. MAGDA's C++ tokenizer reads this file directly.
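
Since the table lists a SHA-256 digest for each file, a download can be checked before it is loaded. The following is a minimal sketch, not part of MAGDA: the helper names sha256_of and verify_downloads are hypothetical, and the digests are copied from the table above.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the file table in this README.
EXPECTED_SHA256 = {
    "clap_audio.onnx": "3f42f71e555b62709910b6efa66fa5879f00d9571874b12b0fa674f82dbfe332",
    "clap_text.onnx": "c07b27204836877d5b615c103685b66ea8f21bc6b5b70a572be356125423a8bf",
    "tokenizer.json": "4fd1d86b4f5b53f40a609fcd11c1f34024b735f870a07439d70202b98493661a",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Chunked reads avoid holding the ~478 MB text encoder in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(model_dir, expected=EXPECTED_SHA256):
    """Map each file name to True (match), False (mismatch), or None (missing)."""
    results = {}
    for name, want in expected.items():
        path = Path(model_dir) / name
        results[name] = (sha256_of(path) == want) if path.is_file() else None
    return results
```

Calling verify_downloads("path/to/models") returns one entry per file, so a caller can tell a corrupted download (False) apart from a file that was never fetched (None).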

How MAGDA uses these

MAGDA's media database (a SQLite catalogue of audio samples) uses these encoders to:

  • Compute an embedding per indexed sample at index time, stored in the media_embedding table.
  • Encode the user's free-text search query at query time and rank samples by cosine similarity to the query embedding.
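
The two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MAGDA's implementation: only the table name media_embedding comes from this README, while the column names, the float32 BLOB encoding, and every function name are hypothetical. Because both encoders are described as producing normalised embeddings, the sketch assumes unit-length vectors, for which cosine similarity reduces to a dot product.

```python
import sqlite3
from array import array


def pack(vec):
    """Serialise a sequence of floats as a float32 BLOB."""
    return array("f", vec).tobytes()


def unpack(blob):
    """Inverse of pack(): decode a float32 BLOB into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return list(out)


def open_db():
    # Hypothetical schema: only the table name appears in this README.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE media_embedding ("
        "sample_path TEXT PRIMARY KEY, embedding BLOB NOT NULL)"
    )
    return db


def index_sample(db, sample_path, embedding):
    """Index time: store one embedding per sample."""
    db.execute(
        "INSERT OR REPLACE INTO media_embedding VALUES (?, ?)",
        (sample_path, pack(embedding)),
    )


def search(db, query_embedding, limit=10):
    """Query time: rank samples by similarity to the query embedding.

    Assumes unit-length vectors, so the dot product equals cosine similarity.
    """
    scored = []
    rows = db.execute("SELECT sample_path, embedding FROM media_embedding")
    for path, blob in rows:
        score = sum(q * s for q, s in zip(query_embedding, unpack(blob)))
        scored.append((score, path))
    scored.sort(reverse=True)  # highest similarity first
    return [path for _, path in scored[:limit]]
```

In real use the vectors passed to index_sample would come from clap_audio.onnx and the one passed to search from clap_text.onnx. A linear scan like this is the simplest option; whether MAGDA scans or uses an index is not stated here.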

Without these models, MAGDA falls back to filename / tag full-text search: still useful, just no semantic similarity.

Export procedure

ONNX exports are generated from laion/clap-htsat-unfused via the export script in MAGDA's prototype:

prototypes/media_db/src/media_db/embeddings/onnx_export.py

Notes:

  • Run the export on CPU: MPS does not support float64, which the audio encoder's mel filterbank uses.
  • Requires transformers >= 5.x. The keyword argument for audio input was renamed from audios= to audio= between 4.x and 5.x; passing the old kwarg silently returns wrong shapes.
  • tokenizer.json is the unmodified file from the upstream HF repo, fetched via AutoTokenizer.from_pretrained(...).save_pretrained(...).
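
Because a stale keyword argument can fail silently, it may be worth sanity-checking an embedding after export. The sketch below is a hypothetical helper, not part of onnx_export.py. It assumes "normalised" means unit L2 norm, and the tolerance is an arbitrary choice.

```python
import math

EMBEDDING_DIM = 512  # both encoders are described as producing 512-d output


def check_embedding(vec, dim=EMBEDDING_DIM, tol=1e-3):
    """Raise ValueError unless vec is a finite, unit-length vector of size dim."""
    if len(vec) != dim:
        raise ValueError(f"expected {dim} values, got {len(vec)}")
    if not all(math.isfinite(x) for x in vec):
        raise ValueError("embedding contains NaN or infinity")
    # Assumption: "normalised" means an L2 norm of 1, within tol.
    norm = math.sqrt(sum(x * x for x in vec))
    if abs(norm - 1.0) > tol:
        raise ValueError(f"expected unit L2 norm, got {norm:.6f}")
    return True
```

Running this on one audio embedding and one text embedding after each export catches wrong output sizes early, though it cannot confirm that the values themselves are correct.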

License

BSD-3-Clause, the same as the upstream LAION CLAP weights. See the upstream repo for the original notice and attribution.
