# LAION-CLAP: AEmotionStudio mirror

Mirror of the [LAION-CLAP](https://github.com/LAION-AI/CLAP) audio-text
joint-embedding model weights, used by:

- Tessera's Find-Similar grain overlay (corpus map → click → top-K)
- The standalone **CLAP** panel: Text Search · Similar Clips · Auto-tag

Upstream: https://huggingface.co/lukewys/laion_clap

License: CC0-1.0.

## Format

We ship `.safetensors` only. That means no pickle, none of the
`weights_only=True` breakage that PyTorch 2.6+ introduced by making it
the `torch.load` default, and files roughly 3× smaller than the upstream
`.pt` checkpoints because training metadata is dropped. Each file
contains only the bare audio-encoder + text-encoder `state_dict`. Load it
with `safetensors.torch.load_file(path)`, then call
`load_state_dict(sd, strict=False)` on the `.model` attribute of a
`laion_clap.CLAP_Module`. The legacy `load_ckpt(ckpt=...)` API still
works against the upstream `.pt` files, but not against these.
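
`strict=False` does not raise on missing or unexpected keys, so it can be worth checking which tensors a file actually holds before loading it. The sketch below reads only the safetensors header, so no weights are loaded. It relies on the safetensors file layout: an 8-byte little-endian header length followed by a JSON header. `read_safetensors_header` and `summarize_prefixes` are hypothetical helper names. They are not part of the `safetensors` package.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # The optional "__metadata__" entry holds string metadata, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize_prefixes(path):
    """Count tensors per top-level key prefix (the part before the first '.')."""
    header = read_safetensors_header(path)
    return Counter(name.split(".", 1)[0] for name in header)
```

For example, `summarize_prefixes('630k-audioset-best.safetensors')` returns a tensor count for each prefix. You can compare these counts against the keys of a freshly built `CLAP_Module(...).model.state_dict()` to spot anything `strict=False` would silently skip.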

## Files

- `630k-audioset-best.safetensors` (variant `general`, `amodel='HTSAT-tiny'`): non-fusion HTSAT-tiny checkpoint trained on LAION-Audio-630K + AudioSet (best-validation checkpoint). Load it with `laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-tiny')`.
- `music_audioset_epoch_15_esc_90.14.safetensors` (variant `music`, `amodel='HTSAT-base'`): music-specialized LAION-CLAP fine-tune with 90.14% zero-shot accuracy on ESC-50. It performs better on music corpora, at the cost of a marginal regression on speech/SFX. Use `amodel='HTSAT-base'`, not `HTSAT-tiny`, because the music variant uses a larger audio backbone.
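
Each file expects a specific `amodel`, so it can help to keep the variant → file → settings mapping from the list above in one place. This is a minimal sketch. `ClapCheckpoint`, `CHECKPOINTS` and `clap_kwargs` are hypothetical names. `enable_fusion=False` for the `music` variant is an assumption, because the list above states non-fusion only for `general`.

```python
from typing import NamedTuple


class ClapCheckpoint(NamedTuple):
    filename: str
    amodel: str
    enable_fusion: bool


# Variant name -> checkpoint file and the CLAP_Module settings it expects.
# enable_fusion=False for "music" is an assumption, not stated in this README.
CHECKPOINTS = {
    "general": ClapCheckpoint("630k-audioset-best.safetensors", "HTSAT-tiny", False),
    "music": ClapCheckpoint(
        "music_audioset_epoch_15_esc_90.14.safetensors", "HTSAT-base", False
    ),
}


def clap_kwargs(variant):
    """Return (filename, keyword arguments for laion_clap.CLAP_Module) for a variant."""
    try:
        ckpt = CHECKPOINTS[variant]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(CHECKPOINTS)}"
        ) from None
    return ckpt.filename, {"enable_fusion": ckpt.enable_fusion, "amodel": ckpt.amodel}
```

Usage: `filename, kw = clap_kwargs('music')`, then `laion_clap.CLAP_Module(**kw)`. Loading then proceeds with `load_file(filename)` as usual.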

## Loading

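```python
import librosa
import laion_clap
from safetensors.torch import load_file

# Non-fusion HTSAT-tiny backbone matches 630k-audioset-best.safetensors.
m = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-tiny')
sd = load_file('630k-audioset-best.safetensors')
m.model.load_state_dict(sd, strict=False)

# CLAP expects mono float waveforms at 48 kHz ('clip.wav' is a placeholder path).
wav, _ = librosa.load('clip.wav', sr=48000, mono=True)
emb = m.get_audio_embedding_from_data([wav])  # numpy array, one row per clip
```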
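
The Find-Similar overlay and the CLAP panel's three modes can all be framed as ranking a bank of embeddings against a single query. This is a minimal sketch, assuming cosine similarity as the ranking metric. Both sides are L2-normalized so that a dot product equals cosine similarity. `top_k` is a hypothetical helper, and this is not necessarily how Tessera implements it.

```python
import numpy as np


def top_k(query, bank, k=5):
    """Indices and cosine scores of the k rows of `bank` most similar to `query`.

    query: (D,) embedding, e.g. one clip (Similar Clips) or one text prompt (Text Search).
    bank:  (N, D) embeddings, e.g. a clip corpus, or label prompts (Auto-tag).
    """
    q = np.asarray(query, dtype=np.float32)
    b = np.asarray(bank, dtype=np.float32)
    # L2-normalize both sides so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    scores = b @ q
    k = max(1, min(k, len(scores)))
    # argpartition selects the k best in O(N); then only those k are sorted.
    idx = np.argpartition(-scores, k - 1)[:k]
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]
```

For Similar Clips, `query` would be one row of `emb` and `bank` the corpus embeddings. For Text Search or Auto-tag, one side would come from `m.get_text_embedding([...])` instead: the query text in the first case, the label prompts in the second.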