nkundiushuti committed
Commit 8ffd934 · verified · 1 Parent(s): 26512fc

Upload README.md with huggingface_hub

Files changed (1):
  README.md +64 -19
README.md CHANGED
@@ -1,11 +1,11 @@
  ---
  license: cc-by-nc-sa-4.0
  tags:
- - EarthSpeciesProject
- - AVEX
- - Bioacoustics
- - RepresentationLearning
- - EfficientNet
+ - EarthSpeciesProject
+ - AVEX
+ - Bioacoustics
+ - RepresentationLearning
+ - EfficientNet
  ---

  # Model Card for esp-aves2-effnetb0-bio
@@ -25,9 +25,9 @@ esp-aves2-effnetb0-bio is a **supervised bioacoustic encoder** trained to produc

  ### Model Sources

- - **Repository:** `https://github.com/earthspecies/esp-avex`
+ - **Repository:** `https://github.com/earthspecies/avex`
  - **Paper:** [What Matters for Bioacoustic Encoding](https://arxiv.org/abs/2508.11845)
- - **Hugging Face Model:** `TBA`
+ - **Hugging Face Model:** [ESP-AVES2 Collection](https://huggingface.co/collections/EarthSpeciesProject/esp-aves2)
  - **Configuration:** [train_config.yaml](train_config.yaml)

  ### Parent Models
@@ -59,30 +59,75 @@ Not a generative model; does not output text.

  ## How to Get Started with the Model

- Loading this model requires the AVEX (Animal Vocalization Encoder) library `esp-avex` to be installed.
+ Loading this model requires the AVEX (Animal Vocalization Encoder) library `avex` to be installed.

  ### Installation

- See [https://github.com/earthspecies/esp-avex](https://github.com/earthspecies/esp-avex) for installation instructions.
+ ```bash
+ pip install avex
+ ```
+
+ Or with uv:
+
+ ```bash
+ uv add avex
+ ```
+
+ For more details, see [https://github.com/earthspecies/avex](https://github.com/earthspecies/avex).

  ### Loading the Model

  ```python
  from avex import load_model

- model = load_model("esp-aves2-effnetb0-bio", device="cuda")
+ model = load_model("esp_aves2_effnetb0_bio", device="cuda")
  ```

  ### Using the Model

  ```python
+ import torch
+
+ # audio_tensor: a batched waveform tensor (see the repository for the
+ # expected sample rate and shape)
+
  # Case 1: embedding extraction (features only)
- backbone = load_model("esp-aves2-effnetb0-bio", device="cuda", return_features_only=True)
- embeddings = backbone(audio_tensor)
+ backbone = load_model("esp_aves2_effnetb0_bio", device="cuda", return_features_only=True)
+
+ with torch.no_grad():
+     embeddings = backbone(audio_tensor)
+     # Shape: (batch, channels, height, width) for EfficientNet
+
+ # Pool to get a fixed-size embedding
+ embedding = embeddings.mean(dim=(2, 3))  # Shape: (batch, channels)

  # Case 2: supervised predictions (logits over label IDs; see label_map.json)
- model = load_model("esp-aves2-effnetb0-bio", device="cuda")
- logits = model(audio_tensor)
+ model = load_model("esp_aves2_effnetb0_bio", device="cuda")
+
+ with torch.no_grad():
+     logits = model(audio_tensor)
+ predicted_class = logits.argmax(dim=-1).item()
+ ```
+
+ ### Transfer Learning with Probes
+
+ ```python
+ from avex import load_model
+ from avex.models.probes import build_probe_from_config
+ from avex.configs import ProbeConfig
+
+ # Load backbone for feature extraction
+ base = load_model("esp_aves2_effnetb0_bio", return_features_only=True, device="cuda")
+
+ # Define a probe head for your task
+ probe_config = ProbeConfig(
+     probe_type="linear",
+     target_layers=["last_layer"],
+     aggregation="mean",
+     freeze_backbone=True,
+     online_training=True,
+ )
+
+ probe = build_probe_from_config(
+     probe_config=probe_config,
+     base_model=base,
+     num_classes=10,  # Your number of classes
+     device="cuda",
+ )
  ```

  ### Class Label Mapping
@@ -156,11 +201,11 @@ Aggregate results for linear probing (frozen base model) with esp-aves2-effnetb0
  **BibTeX:**

  ```bibtex
- @article{miron2025whatmattersbioacoustic,
-   title={What Matters for Bioacoustic Encoding},
-   author={Miron, Marius and Robinson, David and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Narula, Gagan and Chemla, Emmanuel and Cusimano, Maddie and Effenberger, Felix and Hagiwara, Masato and Hoffman, Benjamin and Keen, Sara and Kim, Diane and Lawton, Jane K. and Liu, Jen-Yu and Raskin, Aza and Pietquin, Olivier and Geist, Matthieu},
-   journal={arXiv preprint arXiv:2508.11845},
-   year={2025}
+ @inproceedings{miron2025matters,
+   title={What Matters for Bioacoustic Encoding},
+   author={Miron, Marius and Robinson, David and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Narula, Gagan and Chemla, Emmanuel and Cusimano, Maddie and Effenberger, Felix and Hagiwara, Masato and Hoffman, Benjamin and Keen, Sara and Kim, Diane and Lawton, Jane K. and Liu, Jen-Yu and Raskin, Aza and Pietquin, Olivier and Geist, Matthieu},
+   booktitle={The Fourteenth International Conference on Learning Representations},
+   year={2026}
  }
  ```
211