---
base_model:
- nvidia/audio-flamingo-2
pipeline_tag: feature-extraction
---

# AFClap - Audio Flamingo CLAP

The CLAP model (`epoch_16.pt`) from Audio-Flamingo-2.

## Model Info

- **Base**: HTSAT + T5
- **Audio Embed Dim**: 2048
- **Sample Rate**: 16000 Hz
- **Original**: [nvidia/audio-flamingo-2](https://huggingface.co/nvidia/audio-flamingo-2)

## Installation

```bash
pip install laion-clap librosa soundfile
```

## Usage

```python
import torch
from laion_clap import CLAP_Module

def load_afclap(ckpt_path):
    model = CLAP_Module(
        enable_fusion=True,
        amodel='HTSAT-afclap',
        tmodel='t5'
    ).cuda()
    model.load_afclap_ckpt(ckpt=ckpt_path, verbose=True)
    return model

# Load the model
model = load_afclap("epoch_16.pt")

# Get audio embeddings (one 2048-dim vector per file)
audio_embed = model.get_audio_embedding_from_filelist(
    ["audio1.wav", "audio2.wav"],
    sr=16000,
    use_tensor=True
)

# Get text embeddings
text_embed = model.get_text_embedding(
    ["This is a classical song.", "This is a rock song."],
    use_tensor=True
)

# Compute similarity. With use_tensor=True the embeddings are
# already torch tensors, so no conversion is needed.
similarities = audio_embed @ text_embed.t()
print(similarities)
```
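The raw dot products above are unbounded, so for audio-text retrieval it is common to L2-normalize both embeddings (making the dot product a cosine similarity in [-1, 1]) and then apply a softmax over the text candidates. A minimal sketch using random placeholder tensors in place of real model outputs (the 2048 dimension mirrors the model's audio embed dim; the values themselves are dummies):

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings standing in for AFClap outputs:
# 2 audio clips and 2 text prompts, 2048-dim each.
audio_embed = torch.randn(2, 2048)
text_embed = torch.randn(2, 2048)

# L2-normalize so the dot product equals cosine similarity
audio_norm = F.normalize(audio_embed, dim=-1)
text_norm = F.normalize(text_embed, dim=-1)

cosine_sim = audio_norm @ text_norm.t()  # shape: [2, 2], values in [-1, 1]

# Softmax over the text axis yields per-clip match probabilities
probs = cosine_sim.softmax(dim=-1)
print(probs)
```

Each row of `probs` sums to 1 and gives the relative match of one audio clip against all text prompts; pick `argmax` along the last dimension for the best-matching caption.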