Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -1,11 +1,11 @@
|
|
| 1 |
---
|
| 2 |
license: cc-by-nc-sa-4.0
|
| 3 |
tags:
|
| 4 |
-
- EarthSpeciesProject
|
| 5 |
-
- AVEX
|
| 6 |
-
- Bioacoustics
|
| 7 |
-
- RepresentationLearning
|
| 8 |
-
- EfficientNet
|
| 9 |
---
|
| 10 |
|
| 11 |
# Model Card for esp-aves2-effnetb0-bio
|
|
@@ -25,9 +25,9 @@ esp-aves2-effnetb0-bio is a **supervised bioacoustic encoder** trained to produc
|
|
| 25 |
|
| 26 |
### Model Sources
|
| 27 |
|
| 28 |
-
- **Repository:** `https://github.com/earthspecies/
|
| 29 |
- **Paper:** [What Matters for Bioacoustic Encoding](https://arxiv.org/abs/2508.11845)
|
| 30 |
-
- **Hugging Face Model:**
|
| 31 |
- **Configuration:** [train_config.yaml](train_config.yaml)
|
| 32 |
|
| 33 |
### Parent Models
|
|
@@ -59,30 +59,75 @@ Not a generative model; does not output text.
|
|
| 59 |
|
| 60 |
## How to Get Started with the Model
|
| 61 |
|
| 62 |
-
Loading this model requires the AVEX (Animal Vocalization Encoder) library `
|
| 63 |
|
| 64 |
### Installation
|
| 65 |
|
| 66 |
-
|
| 67 |
|
| 68 |
### Loading the Model
|
| 69 |
|
| 70 |
```python
|
| 71 |
from avex import load_model
|
| 72 |
|
| 73 |
-
model = load_model("
|
| 74 |
```
|
| 75 |
|
| 76 |
### Using the Model
|
| 77 |
|
| 78 |
```python
|
| 79 |
# Case 1: embedding extraction (features only)
|
| 80 |
-
backbone = load_model("
|
| 81 |
-
|
| 82 |
|
| 83 |
# Case 2: supervised predictions (logits over label IDs; see label_map.json)
|
| 84 |
-
model = load_model("
|
| 85 |
-
|
| 86 |
```
|
| 87 |
|
| 88 |
### Class Label Mapping
|
|
@@ -156,11 +201,11 @@ Aggregate results for linear probing (frozen base model) with esp-aves2-effnetb0
|
|
| 156 |
**BibTeX:**
|
| 157 |
|
| 158 |
```bibtex
|
| 159 |
-
@
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
}
|
| 165 |
```
|
| 166 |
|
|
|
|
| 1 |
---
|
| 2 |
license: cc-by-nc-sa-4.0
|
| 3 |
tags:
|
| 4 |
+
- EarthSpeciesProject
|
| 5 |
+
- AVEX
|
| 6 |
+
- Bioacoustics
|
| 7 |
+
- RepresentationLearning
|
| 8 |
+
- EfficientNet
|
| 9 |
---
|
| 10 |
|
| 11 |
# Model Card for esp-aves2-effnetb0-bio
|
|
|
|
| 25 |
|
| 26 |
### Model Sources
|
| 27 |
|
| 28 |
+
- **Repository:** `https://github.com/earthspecies/avex`
|
| 29 |
- **Paper:** [What Matters for Bioacoustic Encoding](https://arxiv.org/abs/2508.11845)
|
| 30 |
+
- **Hugging Face Model:** [ESP-AVES2 Collection](https://huggingface.co/collections/EarthSpeciesProject/esp-aves2)
|
| 31 |
- **Configuration:** [train_config.yaml](train_config.yaml)
|
| 32 |
|
| 33 |
### Parent Models
|
|
|
|
| 59 |
|
| 60 |
## How to Get Started with the Model
|
| 61 |
|
| 62 |
+
Loading this model requires the AVEX (Animal Vocalization Encoder) library `avex` to be installed.
|
| 63 |
|
| 64 |
### Installation
|
| 65 |
|
| 66 |
+
```bash
|
| 67 |
+
pip install avex
|
| 68 |
+
```
|
| 69 |
+
|
| 70 |
+
Or with uv:
|
| 71 |
+
|
| 72 |
+
```bash
|
| 73 |
+
uv add avex
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
For more details, see [https://github.com/earthspecies/avex](https://github.com/earthspecies/avex).
|
| 77 |
|
| 78 |
### Loading the Model
|
| 79 |
|
| 80 |
```python
|
| 81 |
from avex import load_model
|
| 82 |
|
| 83 |
+
model = load_model("esp_aves2_effnetb0_bio", device="cuda")
|
| 84 |
```
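
The loading call above hard-codes `device="cuda"`. On a machine without a GPU, one option is to choose the device at runtime. The following is a minimal sketch, not part of `avex`: `pick_device` is a hypothetical helper, and it assumes `load_model` also accepts `device="cpu"`, which this README does not confirm.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)

# Assumed usage (requires avex to be installed; device="cpu" support is an assumption):
# from avex import load_model
# model = load_model("esp_aves2_effnetb0_bio", device=device)
```

Passing the result as `device=device` keeps the rest of the snippets unchanged, provided any input tensors are moved to the same device.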
|
| 85 |
|
| 86 |
### Using the Model
|
| 87 |
|
| 88 |
```python
|
| 89 |
# Case 1: embedding extraction (features only)
|
| 90 |
+
backbone = load_model("esp_aves2_effnetb0_bio", device="cuda", return_features_only=True)
|
| 91 |
+
|
| 92 |
+
with torch.no_grad():
|
| 93 |
+
    embeddings = backbone(audio_tensor)
|
| 94 |
+
    # Shape: (batch, channels, height, width) for EfficientNet
|
| 95 |
+
|
| 96 |
+
# Pool to get fixed-size embedding
|
| 97 |
+
embedding = embeddings.mean(dim=(2, 3)) # Shape: (batch, channels)
|
| 98 |
|
| 99 |
# Case 2: supervised predictions (logits over label IDs; see label_map.json)
|
| 100 |
+
model = load_model("esp_aves2_effnetb0_bio", device="cuda")
|
| 101 |
+
|
| 102 |
+
with torch.no_grad():
|
| 103 |
+
    logits = model(audio_tensor)
|
| 103 |
+
    predicted_class = logits.argmax(dim=-1).item()  # .item() assumes a batch of one clip
|
| 105 |
+
```
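
Case 2 above yields an integer class ID. To turn it into a readable label, the ID can be looked up in `label_map.json`. The sketch below assumes the file is a flat JSON object mapping label names to integer IDs. That layout, the example labels, and the helper `build_id_to_label` are illustrative assumptions, so check the actual file before relying on it.

```python
import json

# Hypothetical contents; the real label_map.json may use a different layout.
EXAMPLE_LABEL_MAP_JSON = '{"species_a": 0, "species_b": 1, "species_c": 2}'


def build_id_to_label(label_map_text: str) -> dict:
    """Invert an assumed {label: id} mapping into {id: label}."""
    label_to_id = json.loads(label_map_text)
    return {int(idx): label for label, idx in label_to_id.items()}


id_to_label = build_id_to_label(EXAMPLE_LABEL_MAP_JSON)

predicted_class = 1  # stand-in for logits.argmax(dim=-1).item()
predicted_label = id_to_label[predicted_class]
print(predicted_label)
```

With the real file, the JSON text would come from reading `label_map.json` instead of the inline example string.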
|
| 106 |
+
|
| 107 |
+
### Transfer Learning with Probes
|
| 108 |
+
|
| 109 |
+
```python
|
| 110 |
+
from avex.models.probes import build_probe_from_config
|
| 111 |
+
from avex.configs import ProbeConfig
|
| 112 |
+
|
| 113 |
+
# Load backbone for feature extraction
|
| 114 |
+
base = load_model("esp_aves2_effnetb0_bio", return_features_only=True, device="cuda")
|
| 115 |
+
|
| 116 |
+
# Define a probe head for your task
|
| 117 |
+
probe_config = ProbeConfig(
|
| 118 |
+
    probe_type="linear",
|
| 119 |
+
    target_layers=["last_layer"],
|
| 120 |
+
    aggregation="mean",
|
| 121 |
+
    freeze_backbone=True,
|
| 122 |
+
    online_training=True,
|
| 123 |
+
)
|
| 124 |
+
|
| 125 |
+
probe = build_probe_from_config(
|
| 126 |
+
    probe_config=probe_config,
|
| 127 |
+
    base_model=base,
|
| 128 |
+
    num_classes=10,  # Your number of classes
|
| 129 |
+
    device="cuda",
|
| 130 |
+
)
|
| 131 |
```
|
| 132 |
|
| 133 |
### Class Label Mapping
|
|
|
|
| 201 |
**BibTeX:**
|
| 202 |
|
| 203 |
```bibtex
|
| 204 |
+
@inproceedings{miron2025matters,
|
| 205 |
+
title={What Matters for Bioacoustic Encoding},
|
| 206 |
+
author={Miron, Marius and Robinson, David and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Narula, Gagan and Chemla, Emmanuel and Cusimano, Maddie and Effenberger, Felix and Hagiwara, Masato and Hoffman, Benjamin and Keen, Sara and Kim, Diane and Lawton, Jane K. and Liu, Jen-Yu and Raskin, Aza and Pietquin, Olivier and Geist, Matthieu},
|
| 207 |
+
booktitle={The Fourteenth International Conference on Learning Representations},
|
| 208 |
+
year={2026}
|
| 209 |
}
|
| 210 |
```
|
| 211 |
|