# LAION-CLAP — AEmotionStudio mirror
Mirror of [LAION-CLAP](https://github.com/LAION-AI/CLAP) audio-text
joint-embedding model weights, used by:
- Tessera's Find-Similar grain overlay (corpus map → click → top-K)
- The standalone **CLAP** panel: Text Search · Similar Clips · Auto-tag
Upstream: https://huggingface.co/lukewys/laion_clap
License: CC0-1.0.
## Format
We ship `.safetensors` only. That means no pickle, no PyTorch 2.6+
`weights_only=True` gotchas, and files ~3× smaller than the upstream
`.pt` checkpoints, because training metadata is dropped. Each file
contains only the bare audio-encoder + text-encoder `state_dict`.
Load it with `safetensors.torch.load_file(path)` followed by
`module.model.load_state_dict(sd, strict=False)`. The legacy
`load_ckpt(ckpt=...)` API still works against the upstream `.pt`
files but not against these.
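Because `strict=False` suppresses errors about missing or unexpected keys, a partially matched load can pass silently. The following is a minimal sketch of a sanity check, assuming you compare the checkpoint's key names with the model's own `state_dict()` keys. `KeyDiff` and `diff_state_dict_keys` are hypothetical helpers, not part of `laion_clap` or `safetensors`, and the key names in the demo are made up.

```python
from typing import Iterable, List, NamedTuple


class KeyDiff(NamedTuple):
    missing: List[str]     # expected by the model, absent from the file
    unexpected: List[str]  # present in the file, unknown to the model


def diff_state_dict_keys(model_keys: Iterable[str], file_keys: Iterable[str]) -> KeyDiff:
    """Compare parameter names only; no tensors are touched."""
    model_set, file_set = set(model_keys), set(file_keys)
    return KeyDiff(
        missing=sorted(model_set - file_set),
        unexpected=sorted(file_set - model_set),
    )


# Toy key names for illustration only. In practice, pass
# m.model.state_dict().keys() and load_file(path).keys().
diff = diff_state_dict_keys(
    ["audio_branch.w", "text_branch.w", "logit_scale_a"],
    ["audio_branch.w", "text_branch.w", "optimizer.step"],
)
print(diff.missing)     # ['logit_scale_a']
print(diff.unexpected)  # ['optimizer.step']
```

PyTorch's `load_state_dict` also returns an object with `missing_keys` and `unexpected_keys` fields, so the same check can be made on its return value after loading.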
## Files
- `630k-audioset-best.safetensors` (variant `general`): non-fusion HTSAT-tiny checkpoint trained on 630k clips + AudioSet (best validation). Pass `amodel='HTSAT-tiny'` to `laion_clap.CLAP_Module(...)`.
- `music_audioset_epoch_15_esc_90.14.safetensors` (variant `music`): music-specialized LAION-CLAP fine-tune, 90.14% on ESC-50. Better on music corpora at the cost of a marginal regression on speech/SFX. Pass `amodel='HTSAT-base'` (NOT tiny: the music variant uses a bigger backbone).
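Since each file must be paired with the right `amodel`, it can help to keep the pairing in one place. A minimal sketch, assuming the variant names, filenames and backbones from the list above; `ClapVariant`, `VARIANTS` and `resolve_variant` are hypothetical names, not part of `laion_clap`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClapVariant:
    filename: str  # checkpoint file in this repo
    amodel: str    # value to pass as amodel= to laion_clap.CLAP_Module


# Values copied from the file list; the names here are illustrative only.
VARIANTS = {
    "general": ClapVariant("630k-audioset-best.safetensors", "HTSAT-tiny"),
    "music": ClapVariant("music_audioset_epoch_15_esc_90.14.safetensors", "HTSAT-base"),
}


def resolve_variant(name: str) -> ClapVariant:
    """Return the checkpoint file and backbone for a variant name."""
    try:
        return VARIANTS[name]
    except KeyError:
        raise ValueError(
            f"unknown variant {name!r}; expected one of {sorted(VARIANTS)}"
        ) from None


v = resolve_variant("music")
print(v.filename, v.amodel)
```

The resolved `amodel` would then be passed to `laion_clap.CLAP_Module(...)` and the resolved `filename` to `load_file(...)`, so the two cannot drift apart.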
## Loading
```python
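import numpy as np
import laion_clap
from safetensors.torch import load_file

# Build the architecture first; amodel must match the checkpoint (see Files).
m = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-tiny')
sd = load_file('630k-audioset-best.safetensors')
m.model.load_state_dict(sd, strict=False)
# Placeholder input (assumption): one 10 s mono float32 clip at 48 kHz.
# Replace with your own waveforms.
audio_array_list = [np.zeros(48000 * 10, dtype=np.float32)]
emb = m.get_audio_embedding_from_data(audio_array_list)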
```