---
base_model:
- nvidia/audio-flamingo-2
pipeline_tag: feature-extraction
---
# AFClap - Audio Flamingo CLAP
The CLAP model (`epoch_16.pt`) from Audio-Flamingo-2.
## Model Info
- **Base**: HTSAT + T5
- **Audio Embed Dim**: 2048
- **Sample Rate**: 16000 Hz
- **Original**: [nvidia/audio-flamingo-2](https://huggingface.co/nvidia/audio-flamingo-2)
## Installation
```bash
pip install laion-clap librosa soundfile
```
## Usage
```python
import torch
from laion_clap import CLAP_Module

def load_afclap(ckpt_path):
    # AFClap pairs an HTSAT audio encoder with a T5 text encoder.
    model = CLAP_Module(
        enable_fusion=True,
        amodel='HTSAT-afclap',
        tmodel='t5'
    ).cuda()
    model.load_afclap_ckpt(ckpt=ckpt_path, verbose=True)
    return model

# Load model
model = load_afclap("epoch_16.pt")

# Get audio embeddings (16 kHz input; use_tensor=True returns torch tensors)
audio_embed = model.get_audio_embedding_from_filelist(
    ["audio1.wav", "audio2.wav"],
    sr=16000,
    use_tensor=True
)

# Get text embeddings
text_embed = model.get_text_embedding(
    ["This is a classical song.", "This is a rock song."],
    use_tensor=True
)

# Compute audio-text similarity (embeddings are already tensors, so no
# extra torch.tensor() conversion is needed)
similarities = audio_embed @ text_embed.t()
print(similarities)
```
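The similarity matrix above can be turned into zero-shot classification probabilities by applying a softmax over the text labels. A minimal sketch, using a hypothetical hard-coded similarity matrix in place of the real `audio_embed @ text_embed.t()` output (the values and label names are illustrative assumptions, not model outputs):

```python
import torch

# Hypothetical similarity matrix: rows = audio clips, cols = text labels.
# Stands in for the result of `audio_embed @ text_embed.t()`.
similarities = torch.tensor([[0.8, 0.1],
                             [0.2, 0.7]])

# Softmax over the label axis turns raw similarities into a
# per-clip probability distribution over the candidate labels.
probs = similarities.softmax(dim=-1)

labels = ["classical", "rock"]
for i, p in enumerate(probs):
    best = labels[p.argmax().item()]
    print(f"audio {i}: {best} ({p.max().item():.2f})")
```

In practice you would scale the similarities by a learned temperature (as in CLIP/CLAP training) before the softmax; the unscaled version here just illustrates the mechanics.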