Upload README.md
README.md
CHANGED
@@ -8,16 +8,29 @@ joint-embedding model weights, used by:
 Upstream: https://huggingface.co/lukewys/laion_clap
 License: CC0-1.0.

+## Format
+
+We ship `.safetensors` only (no pickle, no PyTorch 2.6+
+`weights_only=True` gotchas, ~3× smaller than the upstream `.pt`
+because training metadata is dropped). Each file contains the
+bare audio-encoder + text-encoder `state_dict`. Use
+`safetensors.torch.load_file(path)` and
+`module.model.load_state_dict(sd, strict=False)`. The legacy
+`load_ckpt(ckpt=...)` API still works against the upstream `.pt`
+files but not against these.
+
 ## Files

-- `630k-audioset-best.
-- `music_audioset_epoch_15_esc_90.14.
+- `630k-audioset-best.safetensors` (variant `general`, `amodel=HTSAT-tiny`): non-fusion HTSAT-tiny checkpoint trained on 630k clips + AudioSet (best validation); pass `amodel='HTSAT-tiny'` to `laion_clap.CLAP_Module(...)`.
+- `music_audioset_epoch_15_esc_90.14.safetensors` (variant `music`, `amodel=HTSAT-base`): music-specialized LAION-CLAP fine-tune; 90.14% on ESC-50; better on music corpora at the cost of a marginal regression on speech/SFX. Pass `amodel='HTSAT-base'` (NOT tiny: the music variant trains a bigger backbone).

 ## Loading

 ```python
 import laion_clap
-
-m.
+from safetensors.torch import load_file
+m = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-tiny')
+sd = load_file('630k-audioset-best.safetensors')
+m.model.load_state_dict(sd, strict=False)
 emb = m.get_audio_embedding_from_data(audio_array_list)
 ```
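Because the new loading path passes `strict=False`, a mismatched `amodel` or a stale file loads without raising an error. The sketch below shows one way to guard against that. It picks the `CLAP_Module` arguments from the file name and lists any keys that `load_state_dict` skipped. The helper names (`clap_kwargs_for`, `unexplained_keys`, `LoadResult`) are hypothetical and not part of `laion_clap`, and `enable_fusion=False` for the music checkpoint is an assumption, since the README states it only for the 630k file.

```python
from collections import namedtuple

# Backbone per shipped file, copied from the "Files" section of the README.
AMODEL_BY_FILE = {
    "630k-audioset-best.safetensors": "HTSAT-tiny",
    "music_audioset_epoch_15_esc_90.14.safetensors": "HTSAT-base",
}

# Stand-in for the object returned by torch's load_state_dict, which exposes
# `missing_keys` and `unexpected_keys`. Used here only for illustration.
LoadResult = namedtuple("LoadResult", ["missing_keys", "unexpected_keys"])


def clap_kwargs_for(filename):
    """Return keyword arguments for laion_clap.CLAP_Module(...)."""
    if filename not in AMODEL_BY_FILE:
        raise ValueError(f"unknown checkpoint: {filename!r}")
    # Assumption: both shipped checkpoints are non-fusion models.
    return {"enable_fusion": False, "amodel": AMODEL_BY_FILE[filename]}


def unexplained_keys(result, allowed_prefixes=()):
    """List missing or unexpected keys that no allowed prefix accounts for."""
    allowed = tuple(allowed_prefixes)
    keys = list(result.missing_keys) + list(result.unexpected_keys)
    return [k for k in keys if not k.startswith(allowed)]
```

In use, the value returned by `m.model.load_state_dict(sd, strict=False)` would be passed to `unexplained_keys`, and a non-empty list treated as a sign that the file and the `amodel` choice do not match.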