ConceptualMachines/magda-sample-tagger

Audio Classification


ConceptualMachines committed 18 days ago

Commit 6f03a79 · verified · 1 Parent(s): af6bfe1

model card

Files changed (1)

README.md +72 -0

README.md ADDED

	@@ -0,0 +1,72 @@

+---
+license: bsd-3-clause
+tags:
+- audio
+- audio-classification
+- sample-tagging
+- clap
+- htsat
+- onnx
+library_name: onnxruntime
+---
+# MAGDA Sample Tagger
+ONNX exports of [LAION's CLAP HTSAT-unfused model](https://huggingface.co/laion/clap-htsat-unfused)
+plus the RoBERTa tokenizer, packaged for the
+[MAGDA DAW](https://github.com/Conceptual-Machines/magda-core)'s sample
+library (issue #768).
+## What's in this repo
+| File | Size | SHA-256 |
+|------|------|---------|
+| `clap_audio.onnx` | 111.8 MB | `3f42f71e555b62709910b6efa66fa5879f00d9571874b12b0fa674f82dbfe332` |
+| `clap_text.onnx` | 478.2 MB | `c07b27204836877d5b615c103685b66ea8f21bc6b5b70a572be356125423a8bf` |
+| `tokenizer.json` | 3.4 MB | `4fd1d86b4f5b53f40a609fcd11c1f34024b735f870a07439d70202b98493661a` |
+- `clap_audio.onnx` — audio encoder. Takes a mono 48 kHz waveform,
+  produces a 512-d normalised embedding suitable for cosine similarity
+  search.
+- `clap_text.onnx` — text encoder. Takes RoBERTa token ids + attention
+  mask, produces a 512-d normalised embedding in the same space as the
+  audio encoder so a text query can rank audio files by similarity.
+- `tokenizer.json` — the RoBERTa BPE tokenizer that pairs with the
+  text encoder. MAGDA's C++ tokenizer reads this file directly.
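
The SHA-256 values in the table above can be checked after download. The following is a minimal sketch, not part of MAGDA itself: the helper names `sha256_of` and `verify_models` and the idea of a single local model directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the table in this README.
EXPECTED_SHA256 = {
    "clap_audio.onnx": "3f42f71e555b62709910b6efa66fa5879f00d9571874b12b0fa674f82dbfe332",
    "clap_text.onnx": "c07b27204836877d5b615c103685b66ea8f21bc6b5b70a572be356125423a8bf",
    "tokenizer.json": "4fd1d86b4f5b53f40a609fcd11c1f34024b735f870a07439d70202b98493661a",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large models are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_models(model_dir):
    """Return {filename: status}, where status is 'ok', 'missing' or 'mismatch'.

    `model_dir` is a hypothetical local directory holding the downloaded files.
    """
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(model_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling `verify_models("path/to/models")` returns one status per file, so a caller can decide whether to re-download a file or disable semantic search.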
+## How MAGDA uses these
+MAGDA's media database (a SQLite catalogue of audio samples) uses
+these encoders to:
+- Compute an embedding per indexed sample at index time, stored in the
+  `media_embedding` table.
+- Encode the user's free-text search query at query time and rank
+  samples by cosine similarity to the query embedding.
+
+Without these models, MAGDA falls back to filename / tag full-text
+search, which is still useful but offers no semantic similarity.
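
The query-time ranking described above can be sketched in a few lines. This is an illustration only: MAGDA's actual implementation and the layout of the `media_embedding` table are not described here, so the in-memory dictionary, the function names and the toy 3-d vectors (the real embeddings are 512-d) are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_samples(query_embedding, sample_embeddings, top_k=10):
    """Return up to `top_k` (score, sample_path) pairs, best match first.

    `sample_embeddings` is a hypothetical {sample_path: embedding} mapping
    standing in for rows read from the embedding table.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), path)
        for path, embedding in sample_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy stand-ins for audio embeddings and an encoded text query.
samples = {
    "kick.wav": [1.0, 0.0, 0.0],
    "snare.wav": [0.0, 1.0, 0.0],
    "tom.wav": [0.6, 0.8, 0.0],
}
for score, path in rank_samples([1.0, 0.0, 0.0], samples, top_k=2):
    print(f"{path}: {score:.2f}")
```

Because both encoders emit normalised embeddings, the division by the norms is redundant in principle and a plain dot product would give the same ordering; it is kept here so the sketch also behaves sensibly on unnormalised input.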
+## Export procedure
+ONNX exports are generated from `laion/clap-htsat-unfused` via the
+export script in MAGDA's prototype:
+```
+prototypes/media_db/src/media_db/embeddings/onnx_export.py
+```
+Notes:
+- Run the export on CPU: MPS does not support float64, which the audio
+  encoder's mel filterbank uses.
+- Requires `transformers` 5.x or later. The audio input keyword argument
+  was renamed from `audios=` to `audio=` between 4.x and 5.x; passing the
+  old kwarg silently returns wrong shapes.
+- `tokenizer.json` is the unmodified file from the upstream HF repo,
+  fetched via `AutoTokenizer.from_pretrained(...).save_pretrained(...)`.
+## License
+BSD-3-Clause — same as the upstream LAION CLAP weights. See the
+upstream repo for the original notice and attribution.