# LAION-CLAP — AEmotionStudio mirror

Mirror of [LAION-CLAP](https://github.com/LAION-AI/CLAP) audio-text 
joint-embedding model weights, used by:
- Tessera's Find-Similar grain overlay (corpus map → click → top-K)
- The standalone **CLAP** panel: Text Search · Similar Clips · Auto-tag

Upstream: https://huggingface.co/lukewys/laion_clap  
License: CC0-1.0.

## Format

We ship `.safetensors` only. That means no pickle and no PyTorch 2.6+
`weights_only=True` gotchas, and the files are ~3× smaller than the
upstream `.pt` because training metadata is dropped. Each file contains
the bare audio-encoder + text-encoder `state_dict`. Load it with
`safetensors.torch.load_file(path)` and
`module.model.load_state_dict(sd, strict=False)`. The legacy
`load_ckpt(ckpt=...)` API still works against the upstream `.pt`
files but not against these.
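Because `strict=False` skips mismatched keys without raising, it can be worth checking what was skipped. The following is a minimal sketch, not part of this repo: `diff_state_dict_keys` is a hypothetical helper that works on plain key collections, so it runs without `torch` installed, and the key names in the example are toy values, not real LAION-CLAP parameter names.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, sorted for stable output.

    missing    -- keys the model expects but the checkpoint lacks
    unexpected -- keys the checkpoint has but the model does not use
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Toy key names for illustration only (assumption, not the real parameter names).
missing, unexpected = diff_state_dict_keys(
    ["audio.weight", "audio.bias", "text.weight"],
    ["audio.weight", "text.weight", "extra.buffer"],
)
print("missing:", missing)
print("unexpected:", unexpected)
```

In practice you would pass `m.model.state_dict().keys()` and `sd.keys()`. PyTorch's `load_state_dict` also returns an object with `missing_keys` and `unexpected_keys` fields, which reports the same information after loading.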

## Files

- `630k-audioset-best.safetensors` (variant `general`): non-fusion HTSAT-tiny checkpoint trained on 630k clips + AudioSet (best validation). Pass `amodel='HTSAT-tiny'` to `laion_clap.CLAP_Module(...)`.
- `music_audioset_epoch_15_esc_90.14.safetensors` (variant `music`): music-specialized LAION-CLAP fine-tune; 90.14% on ESC-50; better on music corpora at the cost of a marginal regression on speech/SFX. Pass `amodel='HTSAT-base'`, NOT tiny: the music variant uses a bigger backbone.
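Since each file must be paired with the right `amodel`, one way to avoid mismatches is to keep the pairing in a single lookup table. This is a sketch under assumptions: `VARIANTS` and `resolve_variant` are hypothetical names, not something shipped in this repo; only the filenames, variant names, and `amodel` values come from the list above.

```python
# Variant name -> checkpoint filename and the amodel it must be loaded with.
VARIANTS = {
    "general": {
        "filename": "630k-audioset-best.safetensors",
        "amodel": "HTSAT-tiny",
    },
    "music": {
        "filename": "music_audioset_epoch_15_esc_90.14.safetensors",
        "amodel": "HTSAT-base",
    },
}


def resolve_variant(name):
    """Return the {'filename', 'amodel'} entry for a variant, or raise ValueError."""
    try:
        return VARIANTS[name]
    except KeyError:
        known = ", ".join(sorted(VARIANTS))
        raise ValueError(f"unknown variant {name!r}; expected one of: {known}") from None


cfg = resolve_variant("music")
print(cfg["filename"], cfg["amodel"])
```

The returned `amodel` can then be passed to `laion_clap.CLAP_Module(enable_fusion=False, amodel=cfg["amodel"])` and the filename to `load_file`.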

## Loading

```python
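import laion_clap
from safetensors.torch import load_file

# amodel must match the checkpoint: 'HTSAT-tiny' for general, 'HTSAT-base' for music.
m = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-tiny')
sd = load_file('630k-audioset-best.safetensors')
# strict=False tolerates keys that differ between the file and the module.
m.model.load_state_dict(sd, strict=False)
# audio_array_list: the caller's batch of audio waveforms (not defined in this snippet).
emb = m.get_audio_embedding_from_data(audio_array_list)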
```