AEmotionStudio committed
Commit fa60a19 · verified · 1 Parent(s): fc01d03

Upload README.md

Files changed (1)
  1. README.md +17 -4
README.md CHANGED
@@ -8,16 +8,29 @@ joint-embedding model weights, used by:
  Upstream: https://huggingface.co/lukewys/laion_clap
  License: CC0-1.0.
  
+ ## Format
+
+ We ship `.safetensors` only (no pickle, no PyTorch 2.6+
+ `weights_only=True` gotchas, ~3× smaller than the upstream `.pt`
+ because training metadata is dropped). Each file contains the
+ bare audio-encoder + text-encoder `state_dict`. Use
+ `safetensors.torch.load_file(path)` and
+ `module.model.load_state_dict(sd, strict=False)`. The legacy
+ `load_ckpt(ckpt=...)` API still works against the upstream `.pt`
+ files but not against these.
+
  ## Files
  
- - `630k-audioset-best.pt` (variant `general`): non-fusion HTSAT-base checkpoint trained on 630k clips + AudioSet (best validation; `model_id=1` in `laion_clap.CLAP_Module.load_ckpt`).
- - `music_audioset_epoch_15_esc_90.14.pt` (variant `music`): music-specialized LAION-CLAP fine-tune; 90.14% on ESC-50; better on music corpora at the cost of marginal regression on speech/SFX. Same HTSAT-base + non-fusion architecture as the general checkpoint, so it is a drop-in replacement.
+ - `630k-audioset-best.safetensors` (variant `general`, `amodel=HTSAT-tiny`): non-fusion HTSAT-tiny checkpoint trained on 630k clips + AudioSet (best validation); `amodel='HTSAT-tiny'` in `laion_clap.CLAP_Module(...)`.
+ - `music_audioset_epoch_15_esc_90.14.safetensors` (variant `music`, `amodel=HTSAT-base`): music-specialized LAION-CLAP fine-tune; 90.14% on ESC-50; better on music corpora at the cost of marginal regression on speech/SFX. `amodel='HTSAT-base'` (NOT tiny; the music variant trains a bigger backbone).
  
  ## Loading
  
  ```python
  import laion_clap
- m = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-base')
- m.load_ckpt(ckpt='630k-audioset-best.pt', verbose=False)
+ from safetensors.torch import load_file
+ m = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-tiny')
+ sd = load_file('630k-audioset-best.safetensors')
+ m.model.load_state_dict(sd, strict=False)
  emb = m.get_audio_embedding_from_data(audio_array_list)
  ```
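
Because the two files need different `amodel` values, a small lookup can keep the filename and backbone paired. The following is only a sketch: `VARIANTS`, `resolve_variant`, `load_clap`, and the `weights_dir` parameter are hypothetical names that are not part of this repository or of `laion_clap`. It reuses the calls from the new Loading snippet and additionally prints the key lists that `load_state_dict(..., strict=False)` returns, since that mode does not raise on missing or unexpected keys.

```python
import os

# Variant table copied from the updated "Files" list: each file needs its own backbone.
VARIANTS = {
    "general": ("630k-audioset-best.safetensors", "HTSAT-tiny"),
    "music": ("music_audioset_epoch_15_esc_90.14.safetensors", "HTSAT-base"),
}


def resolve_variant(variant):
    """Return (filename, amodel) for a variant name, or raise ValueError."""
    try:
        return VARIANTS[variant]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}"
        ) from None


def load_clap(variant="general", weights_dir="."):
    """Hypothetical helper: build a CLAP_Module and load the matching weights."""
    # Imported lazily so the lookup above works without the heavy dependencies.
    import laion_clap
    from safetensors.torch import load_file

    filename, amodel = resolve_variant(variant)
    m = laion_clap.CLAP_Module(enable_fusion=False, amodel=amodel)
    sd = load_file(os.path.join(weights_dir, filename))

    # strict=False tolerates key mismatches silently, so surface them here.
    result = m.model.load_state_dict(sd, strict=False)
    if result.missing_keys or result.unexpected_keys:
        print("missing keys:", result.missing_keys)
        print("unexpected keys:", result.unexpected_keys)
    return m
```

With the weights downloaded locally, `load_clap("music")` would build the HTSAT-base module and load the music checkpoint; whether any reported keys are harmless depends on the installed `laion_clap` version and should be checked rather than assumed.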