
	# LAION-CLAP — AEmotionStudio mirror

	Mirror of [LAION-CLAP](https://github.com/LAION-AI/CLAP) audio-text
	joint-embedding model weights, used by:
	- Tessera's Find-Similar grain overlay (corpus map → click → top-K)
	- The standalone CLAP panel: Text Search · Similar Clips · Auto-tag

	Upstream: https://huggingface.co/lukewys/laion_clap
	License: CC0-1.0.

	## Format

	Only `.safetensors` files are shipped. This avoids pickle and the
	PyTorch 2.6+ `weights_only=True` loading pitfalls, and the files are
	~3× smaller than the upstream `.pt` because training metadata is
	dropped. Each file contains the bare audio-encoder + text-encoder
	`state_dict`. Load it with `safetensors.torch.load_file(path)` and
	`module.model.load_state_dict(sd, strict=False)`. The legacy
	`load_ckpt(ckpt=...)` API still works against the upstream `.pt`
	files but not against these.
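
Because `strict=False` skips missing or unexpected keys without raising, it may be worth checking what `load_state_dict` reports. The sketch below assumes the return value exposes `missing_keys` and `unexpected_keys` lists, as PyTorch's `load_state_dict` does. The helper name `check_load` and its `tolerated` parameter are hypothetical and not part of `laion_clap`.

```python
from collections import namedtuple


def check_load(result, tolerated=()):
    """Raise if a non-strict load skipped keys outside `tolerated`.

    `result` is whatever `load_state_dict(..., strict=False)` returned;
    `tolerated` holds key names you have decided are safe to ignore.
    """
    tolerated = set(tolerated)
    missing = [k for k in result.missing_keys if k not in tolerated]
    unexpected = [k for k in result.unexpected_keys if k not in tolerated]
    if missing or unexpected:
        raise RuntimeError(
            f"state_dict mismatch: missing={missing}, unexpected={unexpected}"
        )
    return result


# Stand-in for PyTorch's return type so the sketch runs without torch.
FakeResult = namedtuple("FakeResult", ["missing_keys", "unexpected_keys"])
check_load(FakeResult(missing_keys=[], unexpected_keys=[]))
```

With the real model this would be called as `check_load(module.model.load_state_dict(sd, strict=False))`.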

	## Files

	- `630k-audioset-best.safetensors` (variant `general`, `amodel=HTSAT-tiny`): non-fusion HTSAT-tiny checkpoint trained on 630k clips + AudioSet (best validation). Pass `amodel='HTSAT-tiny'` to `laion_clap.CLAP_Module(...)`.
	- `music_audioset_epoch_15_esc_90.14.safetensors` (variant `music`, `amodel=HTSAT-base`): music-specialized LAION-CLAP fine-tune with 90.14% on ESC-50; better on music corpora at the cost of a marginal regression on speech/SFX. Pass `amodel='HTSAT-base'` (NOT tiny: the music variant uses a bigger backbone).
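
The file-to-backbone pairing listed above can be captured in a small lookup, so that a checkpoint is not loaded with the wrong `amodel` by accident. This is only a sketch: `VARIANTS` and `variant_config` are hypothetical names, not part of `laion_clap` or this repository.

```python
# Variant name -> (checkpoint file, audio backbone), as listed above.
VARIANTS = {
    "general": ("630k-audioset-best.safetensors", "HTSAT-tiny"),
    "music": ("music_audioset_epoch_15_esc_90.14.safetensors", "HTSAT-base"),
}


def variant_config(name):
    """Return (filename, amodel) for a variant, or raise on an unknown name."""
    try:
        return VARIANTS[name]
    except KeyError:
        raise ValueError(
            f"unknown variant {name!r}; expected one of {sorted(VARIANTS)}"
        ) from None


filename, amodel = variant_config("music")
print(filename, amodel)
```

The returned `amodel` would then go to `laion_clap.CLAP_Module(...)` and the filename to `load_file(...)`.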

	## Loading

	```python
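	import numpy as np
	import laion_clap
	from safetensors.torch import load_file
	# Non-fusion HTSAT-tiny matches the `general` checkpoint.
	m = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-tiny')
	sd = load_file('630k-audioset-best.safetensors')
	m.model.load_state_dict(sd, strict=False)
	# Placeholder input: 10 s of silence, shape (N, T); use real 48 kHz mono audio.
	audio = np.zeros((1, 48000 * 10), dtype=np.float32)
	emb = m.get_audio_embedding_from_data(x=audio, use_tensor=False)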
	```
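
For the Similar Clips use case, embeddings can be ranked by cosine similarity against a query embedding. The following is a minimal pure-Python sketch, not a description of how Tessera or the CLAP panel implement it. `top_k_similar` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real CLAP embeddings.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query, corpus, k=5):
    """Return up to k (index, score) pairs from corpus, best match first."""
    scored = ((i, cosine(query, vec)) for i, vec in enumerate(corpus))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy data: clip 2 points almost the same way as clip 0, clip 1 is orthogonal.
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
hits = top_k_similar([1.0, 0.0, 0.0], corpus, k=2)
print([i for i, _ in hits])  # → [0, 2]
```

For a large corpus, a normalized matrix product in NumPy or a vector index would likely be a better fit than this per-pair loop.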