---
license: bsd-3-clause
tags:
- audio
- audio-classification
- sample-tagging
- clap
- htsat
- onnx
library_name: onnxruntime
---
# MAGDA Sample Tagger
ONNX exports of [LAION's CLAP HTSAT-unfused model](https://huggingface.co/laion/clap-htsat-unfused)
plus the RoBERTa tokenizer, packaged for the
[MAGDA DAW](https://github.com/Conceptual-Machines/magda-core)'s sample
library (issue #768).
## What's in this repo
| File | Size | SHA-256 |
|------|------|---------|
| `clap_audio.onnx` | 111.8 MB | `3f42f71e555b62709910b6efa66fa5879f00d9571874b12b0fa674f82dbfe332` |
| `clap_text.onnx` | 478.2 MB | `c07b27204836877d5b615c103685b66ea8f21bc6b5b70a572be356125423a8bf` |
| `tokenizer.json` | 3.4 MB | `4fd1d86b4f5b53f40a609fcd11c1f34024b735f870a07439d70202b98493661a` |
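To check a download against the digests in the table, here is a minimal sketch that uses only Python's standard library. The helper names `sha256_file` and `verify_all` are illustrative and are not part of MAGDA.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the table in this README.
EXPECTED_SHA256 = {
    "clap_audio.onnx": "3f42f71e555b62709910b6efa66fa5879f00d9571874b12b0fa674f82dbfe332",
    "clap_text.onnx": "c07b27204836877d5b615c103685b66ea8f21bc6b5b70a572be356125423a8bf",
    "tokenizer.json": "4fd1d86b4f5b53f40a609fcd11c1f34024b735f870a07439d70202b98493661a",
}


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so the 478 MB text encoder is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_all(directory, expected=EXPECTED_SHA256):
    """Return the names of files that are missing or whose SHA-256 does not match."""
    bad = []
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file() or sha256_file(path) != want:
            bad.append(name)
    return bad
```

Call `verify_all("<download dir>")` and treat a non-empty result as a corrupted or incomplete download.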
- `clap_audio.onnx`: audio encoder. Takes a mono 48 kHz waveform and
produces a 512-d normalised embedding suitable for cosine-similarity
search.
- `clap_text.onnx`: text encoder. Takes RoBERTa token ids plus an
attention mask and produces a 512-d normalised embedding in the same
space as the audio encoder, so a text query can rank audio files by
similarity.
- `tokenizer.json`: the RoBERTa BPE tokenizer that pairs with the
text encoder. MAGDA's C++ tokenizer reads this file directly.
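The two input contracts above can be sketched in plain Python. This sketch rests on several assumptions. It handles only 16-bit PCM WAV that is already at 48 kHz and does no resampling. It assumes the standard RoBERTa `<pad>` id of 1, which you should confirm against `tokenizer.json`. It also does not reproduce the ONNX graphs' actual input names or tensor shapes. Read those from the models themselves, for example with onnxruntime's `InferenceSession(...).get_inputs()`.

```python
import struct
import wave

EXPECTED_RATE = 48_000  # clap_audio.onnx takes mono 48 kHz audio
ROBERTA_PAD_ID = 1      # assumption: standard RoBERTa vocab (<s>=0, <pad>=1, </s>=2)


def pcm16_to_mono(raw, channels):
    """Convert interleaved little-endian 16-bit PCM to mono floats in [-1, 1)."""
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    # Average each frame's channels, then scale by 2**15.
    return [
        sum(samples[i : i + channels]) / channels / 32768.0
        for i in range(0, count - count % channels, channels)
    ]


def load_wav_mono_48k(path):
    """Read a 16-bit PCM WAV that is already 48 kHz; resampling is out of scope here."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if wf.getframerate() != EXPECTED_RATE:
            raise ValueError(f"expected {EXPECTED_RATE} Hz, got {wf.getframerate()} Hz")
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    return pcm16_to_mono(raw, channels)


def pad_batch(batch_ids, pad_id=ROBERTA_PAD_ID):
    """Right-pad token-id lists to a common length and build the matching attention mask."""
    width = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        pad = width - len(ids)
        input_ids.append(list(ids) + [pad_id] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
    return input_ids, attention_mask
```

The token ids passed to `pad_batch` are assumed to come from the BPE tokenizer in `tokenizer.json`. Tokenization itself is not shown here. The mask marks real tokens with 1 and padding with 0, so the encoder can ignore the padded positions.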
## How MAGDA uses these
MAGDA's media database (a SQLite catalogue of audio samples) uses
these encoders to:
- Compute an embedding per indexed sample at index time, stored in the
`media_embedding` table.
- Encode the user's free-text search query at query time and rank
samples by cosine similarity to the query embedding.
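The index-time and query-time steps can be sketched with Python's built-in `sqlite3` module. The table name `media_embedding` comes from the text above. The columns used here (`sample_id`, and `vector` stored as a native-endian float32 BLOB) are hypothetical, so MAGDA's real schema may differ. The encoders already emit normalised vectors, so a plain dot product would give the same ranking. The sketch computes the full cosine anyway so that it also works for unnormalised input.

```python
import math
import sqlite3
from array import array


def create_schema(conn):
    # Hypothetical schema: MAGDA's real media_embedding columns may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media_embedding ("
        "sample_id TEXT PRIMARY KEY, vector BLOB NOT NULL)"
    )


def store_embedding(conn, sample_id, vector):
    """Index time: store an embedding as a native-endian float32 BLOB."""
    blob = array("f", vector).tobytes()
    conn.execute(
        "INSERT OR REPLACE INTO media_embedding (sample_id, vector) VALUES (?, ?)",
        (sample_id, blob),
    )


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_samples(conn, query_vector, limit=10):
    """Query time: brute-force scan, highest cosine similarity first."""
    scored = []
    for sample_id, blob in conn.execute("SELECT sample_id, vector FROM media_embedding"):
        vec = array("f")
        vec.frombytes(blob)
        scored.append((sample_id, cosine(query_vector, vec)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```

A linear scan keeps the sketch simple. Its cost grows with the number of indexed samples times the 512 dimensions of each embedding.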
Without these models, MAGDA falls back to full-text search over
filenames and tags. That search is still useful, but it has no semantic similarity.
## Export procedure
ONNX exports are generated from `laion/clap-htsat-unfused` via the
export script in MAGDA's prototype:
```
prototypes/media_db/src/media_db/embeddings/onnx_export.py
```
Notes:
- Run the export on CPU. PyTorch's MPS backend does not support the
float64 tensors used by the audio encoder's mel filterbank.
- Requires `transformers` 5.x or later. The keyword argument for audio
input was renamed from `audios=` to `audio=` between 4.x and 5.x.
Passing the old keyword silently returns wrong shapes.
- `tokenizer.json` is the unmodified file from the upstream HF repo,
fetched via `AutoTokenizer.from_pretrained(...).save_pretrained(...)`.
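A small guard placed at the top of an export script can make the version requirement fail loudly instead of silently producing wrong shapes. This is a hypothetical helper, not part of MAGDA's `onnx_export.py`. It checks only the major version, which matches the granularity given in the notes.

```python
import re

MIN_TRANSFORMERS_MAJOR = 5  # from the notes: 4.x expects audios=, 5.x expects audio=


def major_version(version):
    """Leading integer of a version string such as '5.0.0' or '5.1.0.dev0'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1))


def require_transformers(version, minimum=MIN_TRANSFORMERS_MAJOR):
    """Raise before export instead of letting the old keyword produce wrong shapes."""
    if major_version(version) < minimum:
        raise RuntimeError(
            f"transformers {version} is too old; the CLAP export needs "
            f"{minimum}.x or later (audio= rather than audios=)"
        )


# In an export script (not run here), something like:
#   import transformers
#   require_transformers(transformers.__version__)
```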
## License
BSD-3-Clause — same as the upstream LAION CLAP weights. See the
upstream repo for the original notice and attribution.