---
datasets:
- davidrrobinson/AnimalSpeak
---

# Model card for BioLingual

BioLingual: Transferable Models for Bioacoustics with Human Language Supervision

An audio-text model for bioacoustics based on contrastive language-audio pretraining.

# Usage

You can use this model for bioacoustic zero-shot audio classification, or for fine-tuning on bioacoustic tasks.

# Uses

## Perform zero-shot audio classification

### Using `pipeline`

```python
from datasets import load_dataset
from transformers import pipeline

dataset = load_dataset("ashraq/esc50")
audio = dataset["train"]["audio"][-1]["array"]

audio_classifier = pipeline(task="zero-shot-audio-classification", model="davidrrobinson/BioLingual")
output = audio_classifier(audio, candidate_labels=["Sound of a sperm whale", "Sound of a sea lion"])
print(output)
>>> [{"score": 0.999, "label": "Sound of a sperm whale"}, {"score": 0.001, "label": "Sound of a sea lion"}]
```
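The candidate labels are free-form text, so the same pipeline can score a clip against any set of species prompts. A minimal sketch (the species names below are illustrative, not part of the model card):

```python
from datasets import load_dataset
from transformers import pipeline

dataset = load_dataset("ashraq/esc50")
audio = dataset["train"]["audio"][-1]["array"]

audio_classifier = pipeline(task="zero-shot-audio-classification", model="davidrrobinson/BioLingual")

# Free-form prompts following the "Sound of a ..." template; these species are illustrative.
candidate_labels = [
    "Sound of a humpback whale",
    "Sound of a common dolphin",
    "Sound of an American robin",
]
output = audio_classifier(audio, candidate_labels=candidate_labels)
print(output)  # one score per label, sorted highest first
```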
## Run the model

You can also get the audio and text embeddings using `ClapModel`.

### Run the model on CPU

```python
from datasets import load_dataset
from transformers import ClapModel, ClapProcessor

librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = librispeech_dummy[0]

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

inputs = processor(audios=audio_sample["audio"]["array"], return_tensors="pt")
audio_embed = model.get_audio_features(**inputs)
```
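The snippet above returns audio embeddings; text embeddings come from `get_text_features` in the same way. A minimal sketch (the caption strings are illustrative):

```python
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

# Illustrative captions; any free-form text works.
captions = ["Sound of a sperm whale", "Sound of a sea lion"]
inputs = processor(text=captions, return_tensors="pt", padding=True)
text_embed = model.get_text_features(**inputs)
```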
### Run the model on GPU

```python
from datasets import load_dataset
from transformers import ClapModel, ClapProcessor

librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = librispeech_dummy[0]

model = ClapModel.from_pretrained("davidrrobinson/BioLingual").to(0)
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

inputs = processor(audios=audio_sample["audio"]["array"], return_tensors="pt").to(0)
audio_embed = model.get_audio_features(**inputs)
```
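For fine-tuning on a downstream bioacoustic task, one lightweight option is a linear probe: freeze the pretrained model and train a classification head on its audio embeddings. A minimal sketch with a placeholder batch (`num_classes`, `waveforms`, and `labels` stand in for your own labeled data):

```python
import numpy as np
import torch
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("davidrrobinson/BioLingual")
processor = ClapProcessor.from_pretrained("davidrrobinson/BioLingual")

# Freeze the pretrained model; only the linear head below is trained.
for param in model.parameters():
    param.requires_grad = False

num_classes = 10  # placeholder: number of classes in your task
classifier = torch.nn.Linear(model.config.projection_dim, num_classes)
optimizer = torch.optim.AdamW(classifier.parameters(), lr=1e-3)

# Placeholder batch: replace with your own recordings and class ids.
waveforms = [np.random.randn(48_000).astype(np.float32) for _ in range(4)]
labels = torch.tensor([0, 1, 2, 3])

inputs = processor(audios=waveforms, return_tensors="pt")
with torch.no_grad():
    embeds = model.get_audio_features(**inputs)  # shape: (batch, projection_dim)

optimizer.zero_grad()
logits = classifier(embeds)
loss = torch.nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```

Full fine-tuning of the audio encoder follows the same pattern with the freeze removed.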