Atotti committed on Commit ac729c2 · verified · 1 Parent(s): fe79ae9

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +67 -0
README.md ADDED
@@ -0,0 +1,67 @@
+ # AFClap - Audio Flamingo CLAP
+
+ The CLAP model from Audio-Flamingo-2 (epoch_16.pt). It produces text and audio embeddings, which can be used to compute similarity scores.
+
+ ## Model Info
+
+ - **Base**: HTSAT + T5
+ - **Audio Embed Dim**: 2048
+ - **Sample Rate**: 16000 Hz
+ - **Original**: [nvidia/audio-flamingo-2](https://huggingface.co/nvidia/audio-flamingo-2)
+
+ ## Installation
+
+ ```bash
+ pip install laion-clap librosa soundfile
+ ```
+
+ ## Usage
+
+ ```python
+ import torch
+ from laion_clap import CLAP_Module
+
+ def load_afclap(ckpt_path):
+     """Build the CLAP module on the GPU and load the AFClap checkpoint."""
+     model = CLAP_Module(
+         enable_fusion=True,
+         amodel='HTSAT-afclap',
+         tmodel='t5'
+     ).cuda()
+     model.load_afclap_ckpt(ckpt=ckpt_path, verbose=True)
+     return model
+
+ # Load model
+ model = load_afclap("epoch_16.pt")
+
+ # Get audio embeddings (one row per file)
+ audio_embed = model.get_audio_embedding_from_filelist(
+     ["audio1.wav", "audio2.wav"],
+     sr=16000,
+     use_tensor=True
+ )
+
+ # Get text embeddings (one row per sentence)
+ text_embed = model.get_text_embedding(
+     ["This is a classical song.", "This is a rock song."],
+     use_tensor=True
+ )
+
+ # Compute similarity: rows are audio files, columns are texts.
+ # With use_tensor=True both embeddings are already tensors,
+ # so they do not need to be wrapped in torch.tensor() again.
+ similarities = audio_embed @ text_embed.t()
+ print(similarities)
+ ```
+
+ ## License
+
+ MIT License (NVIDIA Corporation)
+
+ ## Citation
+
+ ```bibtex
+ @article{kong2024audio,
+   title={Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities},
+   author={Kong, Zhifeng and Goel, Arushi and Badlani, Rohan and Wang, Wei and Valle, Rafael and Catanzaro, Bryan},
+   journal={arXiv preprint arXiv:2402.01831},
+   year={2024}
+ }
+ ```
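
Note on the similarity step in the Usage example: the matrix product of audio and text embeddings is a cosine similarity only if every embedding has unit length. The README does not say whether the returned embeddings are already normalized, so the sketch below normalizes them explicitly and then picks the best-matching caption for each clip. It is an illustration rather than part of the commit. The helper names (`l2_normalize`, `cosine_similarity_matrix`, `best_caption`) and the toy 3-dimensional vectors are hypothetical stand-ins, and plain Python lists are used so it runs without a GPU or checkpoint.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine_similarity_matrix(audio_embeds, text_embeds):
    """Return one row per audio clip and one column per caption."""
    audio_unit = [l2_normalize(a) for a in audio_embeds]
    text_unit = [l2_normalize(t) for t in text_embeds]
    return [[sum(x * y for x, y in zip(a, t)) for t in text_unit]
            for a in audio_unit]

def best_caption(similarity_row, captions):
    """Pick the caption with the highest similarity for one audio clip."""
    best_index = max(range(len(captions)), key=lambda i: similarity_row[i])
    return captions[best_index]

# Toy stand-ins for real embeddings (hypothetical values, not model output).
audio_embeds = [[1.0, 0.0, 0.0], [0.0, 2.0, 2.0]]
text_embeds = [[3.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
captions = ["This is a classical song.", "This is a rock song."]

similarities = cosine_similarity_matrix(audio_embeds, text_embeds)
for row in similarities:
    print(best_caption(row, captions), [round(s, 3) for s in row])
```

With real tensors the same normalization can be done with `torch.nn.functional.normalize(embeddings, dim=-1)` before the matrix product.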