Atotti/AFClap

Feature Extraction

Atotti committed on Dec 6, 2025

Commit

ac729c2

verified ·

1 Parent(s): fe79ae9

Upload README.md with huggingface_hub

Files changed (1)

README.md +67 -0

README.md ADDED

	@@ -0,0 +1,67 @@

+# AFClap - Audio Flamingo CLAP
+The CLAP model from Audio-Flamingo-2 (epoch_16.pt). It produces text and audio embeddings that can be compared to compute similarity.
+## Model Info
+- **Base**: HTSAT + T5
+- **Audio Embed Dim**: 2048
+- **Sample Rate**: 16000 Hz
+- **Original**: [nvidia/audio-flamingo-2](https://huggingface.co/nvidia/audio-flamingo-2)
+## Installation
+```bash
+pip install laion-clap librosa soundfile
+```
+## Usage
+```python
+import torch
+from laion_clap import CLAP_Module
+def load_afclap(ckpt_path):
+    model = CLAP_Module(
+        enable_fusion=True,
+        amodel='HTSAT-afclap',
+        tmodel='t5'
+    ).cuda()
+    model.load_afclap_ckpt(ckpt=ckpt_path, verbose=True)
+    return model
+# Load model
+model = load_afclap("epoch_16.pt")
+# Get audio embedding
+audio_embed = model.get_audio_embedding_from_filelist(
+    ["audio1.wav", "audio2.wav"],
+    sr=16000,
+    use_tensor=True
+)
+# Get text embedding
+text_embed = model.get_text_embedding(
+    ["This is a classical song.", "This is a rock song."],
+    use_tensor=True
+)
+# Compute similarity
+similarities = audio_embed @ text_embed.t()  # use_tensor=True already returns torch tensors
+print(similarities)
+```
+## License
+MIT License (NVIDIA Corporation)
+## Citation
+```bibtex
+@article{kong2024audio,
+  title={Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities},
+  author={Kong, Zhifeng and Goel, Arushi and Badlani, Rohan and Wang, Wei and Valle, Rafael and Catanzaro, Bryan},
+  journal={arXiv preprint arXiv:2402.01831},
+  year={2024}
+}
+```
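
The similarity step in the README's Usage section can be tried without a GPU or the checkpoint. The following is a minimal sketch in plain Python, not the model's own code: the helper names (`l2_normalize`, `cosine_similarity_matrix`, `best_caption`) are hypothetical, and the 3-dimensional vectors are toy stand-ins for the 2048-dimensional embeddings. The README does not state whether the returned embeddings are already L2-normalized, so the sketch normalizes explicitly, which is harmless if they are.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(audio_embeds, text_embeds):
    """Return an [n_audio][n_text] matrix of cosine similarities."""
    audio_unit = [l2_normalize(a) for a in audio_embeds]
    text_unit = [l2_normalize(t) for t in text_embeds]
    return [
        [sum(x * y for x, y in zip(a, t)) for t in text_unit]
        for a in audio_unit
    ]


def best_caption(similarities, captions):
    """For each audio clip (row), pick the caption with the highest similarity."""
    return [
        captions[max(range(len(row)), key=row.__getitem__)]
        for row in similarities
    ]


# Toy stand-ins for model outputs (assumption: real embeddings have 2048 dims).
audio_embeds = [[1.0, 0.0, 0.0], [0.0, 2.0, 2.0]]
text_embeds = [[3.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
captions = ["This is a classical song.", "This is a rock song."]

sims = cosine_similarity_matrix(audio_embeds, text_embeds)
print(sims)
print(best_caption(sims, captions))
```

With real model outputs, the same row-wise argmax over the similarity matrix gives a simple zero-shot caption match per audio file.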