ConceptualMachines committed on
Commit 6f03a79 · verified · 1 Parent(s): af6bfe1

model card

Files changed (1)
  1. README.md +72 -0
README.md ADDED
@@ -0,0 +1,72 @@
+ ---
+ license: bsd-3-clause
+ tags:
+ - audio
+ - audio-classification
+ - sample-tagging
+ - clap
+ - htsat
+ - onnx
+ library_name: onnxruntime
+ ---
+
+ # MAGDA Sample Tagger
+
+ ONNX exports of [LAION's CLAP HTSAT-unfused model](https://huggingface.co/laion/clap-htsat-unfused)
+ plus the RoBERTa tokenizer, packaged for the
+ [MAGDA DAW](https://github.com/Conceptual-Machines/magda-core)'s sample
+ library (issue #768).
+
+ ## What's in this repo
+
+ | File | Size | SHA-256 |
+ |------|------|---------|
+ | `clap_audio.onnx` | 111.8 MB | `3f42f71e555b62709910b6efa66fa5879f00d9571874b12b0fa674f82dbfe332` |
+ | `clap_text.onnx` | 478.2 MB | `c07b27204836877d5b615c103685b66ea8f21bc6b5b70a572be356125423a8bf` |
+ | `tokenizer.json` | 3.4 MB | `4fd1d86b4f5b53f40a609fcd11c1f34024b735f870a07439d70202b98493661a` |
+
+ - `clap_audio.onnx`: audio encoder. Takes a mono 48 kHz waveform,
+ produces a 512-d normalised embedding suitable for cosine similarity
+ search.
+ - `clap_text.onnx`: text encoder. Takes RoBERTa token ids + attention
+ mask, produces a 512-d normalised embedding in the same space as the
+ audio encoder so a text query can rank audio files by similarity.
+ - `tokenizer.json`: the RoBERTa BPE tokenizer that pairs with the
+ text encoder. MAGDA's C++ tokenizer reads this file directly.
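Because the table publishes a SHA-256 digest for each file, a download can be checked before it is loaded. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify_download` and the idea of a local download directory are illustrative assumptions, not part of MAGDA.

```python
import hashlib
from pathlib import Path

# Digests copied from the table in this model card.
EXPECTED_SHA256 = {
    "clap_audio.onnx": "3f42f71e555b62709910b6efa66fa5879f00d9571874b12b0fa674f82dbfe332",
    "clap_text.onnx": "c07b27204836877d5b615c103685b66ea8f21bc6b5b70a572be356125423a8bf",
    "tokenizer.json": "4fd1d86b4f5b53f40a609fcd11c1f34024b735f870a07439d70202b98493661a",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hypothetical helper: stream a file through SHA-256, return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so the 478 MB text encoder is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory):
    """Return the names of files that are missing or whose digest does not match."""
    failures = []
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(name)
    return failures
```

An empty list from `verify_download("path/to/models")` would mean all three files are present and match the published digests.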
+
+ ## How MAGDA uses these
+
+ MAGDA's media database (a SQLite catalogue of audio samples) uses
+ these encoders to:
+
+ - Compute an embedding per indexed sample at index time, stored in the
+ `media_embedding` table.
+ - Encode the user's free-text search query at query time and rank
+ samples by cosine similarity to the query embedding.
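The query-time ranking step can be sketched in a few lines. This is an illustration in plain Python, not MAGDA's implementation: the `rank_by_cosine` helper, the sample names, and the toy 3-d vectors (standing in for the 512-d embeddings) are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_embedding, sample_embeddings, top_k=10):
    """Return up to top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in sample_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data: in practice these would be embeddings read back from the database
# and the output of the text encoder for the user's query.
samples = {
    "kick_01.wav": [1.0, 0.0, 0.0],
    "snare_03.wav": [0.0, 1.0, 0.0],
    "pad_warm.wav": [0.6, 0.0, 0.8],
}
query = [0.8, 0.0, 0.6]
ranking = rank_by_cosine(query, samples, top_k=2)
```

Since both encoders are described as producing normalised embeddings, a plain dot product would give the same ordering; the sketch divides by the norms anyway so that it stays correct for unnormalised input.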
+
+ Without these models MAGDA falls back to filename / tag full-text
+ search, which is still useful but offers no semantic similarity.
+
+ ## Export procedure
+
+ ONNX exports are generated from `laion/clap-htsat-unfused` via the
+ export script in MAGDA's prototype:
+
+ ```
+ prototypes/media_db/src/media_db/embeddings/onnx_export.py
+ ```
+
+ Notes:
+
+ - Run the export on CPU (MPS does not support the float64 used by the
+ audio encoder's mel filterbank).
+ - Requires `transformers` 5.x or later. The audio input keyword argument
+ was renamed from `audios=` to `audio=` between 4.x and 5.x; passing the
+ old kwarg silently returns wrong shapes.
+ - `tokenizer.json` is the unmodified file from the upstream HF repo,
+ fetched via `AutoTokenizer.from_pretrained(...).save_pretrained(...)`.
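Because the old keyword argument is reported to fail silently, an export script could refuse to run on an older `transformers` release instead of producing a bad model. The following is one possible guard, sketched with the standard library only; `major_version` and `require_major` are hypothetical helpers and are not taken from `onnx_export.py`.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '5.0.0'."""
    return int(version_string.split(".")[0])


def require_major(package, minimum_major):
    """Raise RuntimeError unless `package` is installed at or above the given major version."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError as exc:
        raise RuntimeError(f"{package} is not installed") from exc
    if major_version(installed) < minimum_major:
        raise RuntimeError(
            f"{package} {installed} found, but major version >= {minimum_major} is required"
        )
    return installed


# Assumed usage at the top of an export script, before any audio features are built:
# require_major("transformers", 5)
```

The check only looks at the major version, which matches the 4.x to 5.x boundary named in the notes; it does not handle version strings with an epoch prefix.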
+
+ ## License
+
+ BSD-3-Clause, the same as the upstream LAION CLAP weights. See the
+ upstream repo for the original notice and attribution.