orrp committed on
Commit ffd7404 · 1 Parent(s): a0bbc38

File cleanup, Gradio queueing, made README minimal

Files changed (4):
  1. README.md +3 -185
  2. packages.txt +0 -1
  3. vampnet/app.py +1 -1
  4. vampnet/setup.py +0 -44
README.md CHANGED
@@ -9,195 +9,14 @@ hardware: a10g-small
 ---

 # WhAM: a Whale Acoustics Model
- [![arXiv](https://img.shields.io/badge/arXiv-2512.02206-b31b1b.svg)](https://arxiv.org/abs/2512.02206)
- [![Model Weights](https://img.shields.io/badge/Zenodo-Model%20Weights-blue.svg)](https://doi.org/10.5281/zenodo.17633708)
- [![Hugging Face Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DSWP%20Dataset-yellow)](https://huggingface.co/datasets/orrp/DSWP)
- ![WhAM](assets/inference.png "WhAM")
- WhAM is a transformer-based audio-to-audio model designed to synthesize and analyze sperm whale codas. Based on [VampNet](https://github.com/hugofloresgarcia/vampnet), WhAM uses masked acoustic token modeling to capture the temporal and spectral features of whale communication. WhAM generates codas from a given audio context, enabling three core capabilities:
-
- - Acoustic translation: style-transferring arbitrary audio prompts (e.g., human speech, noise) into the acoustic texture of sperm whale codas.
-
- - Synthesizing novel "pseudocodas".
-
- - Providing audio embeddings for downstream tasks such as social unit and spectral feature ("vowel") classification.
-
- See our [NeurIPS 2025](https://openreview.net/pdf?id=IL1wvzOgqD) publication for more details.
-
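The "masked acoustic token modeling" referenced in the deleted README can be illustrated with a toy sketch. This is plain Python for intuition only: the mask ratio, masking schedule, and token values here are illustrative, not WhAM's or VampNet's actual configuration.

```python
import random

def mask_tokens(tokens, mask_ratio=0.8, mask_id=-1, seed=0):
    """Replace a random subset of acoustic tokens with a mask placeholder.

    During training, a masked-token model is asked to reconstruct the
    original tokens from the surviving (unmasked) context.
    """
    rng = random.Random(seed)
    n_mask = int(len(tokens) * mask_ratio)
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [mask_id if i in positions else t for i, t in enumerate(tokens)]
    return masked, positions

tokens = [12, 7, 43, 99, 5, 61, 28, 3]
masked, positions = mask_tokens(tokens)
print(masked)           # most entries replaced by -1; the rest serve as context
print(len(positions))   # 6 of 8 tokens masked at ratio 0.8
```

At generation time, the same idea runs in reverse: start from a mostly masked sequence conditioned on an audio prompt and iteratively fill in the masked tokens.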
- ## Installation
-
- 1. **Clone the repository:**
- ```bash
- git clone https://github.com/Project-CETI/wham.git
- cd wham
- ```
-
- 2. **Set up the environment:**
- ```bash
- conda create -n wham python=3.9
- conda activate wham
- ```
-
- 3. **Install dependencies:**
- ```bash
- # Install the wham package
- pip install -e .
-
- # Install VampNet
- pip install -e ./vampnet
-
- # Install madmom
- pip install --no-build-isolation madmom
-
- # Install ffmpeg
- conda install -c conda-forge ffmpeg
- ```
-
- 4. **Download model weights:**
- Download the [weights](https://zenodo.org/records/17633708) and extract to `vampnet/models/`.
-
- ## Generation
-
- To run WhAM locally and prompt it in your browser:
-
- ```bash
- python vampnet/app.py --args.load conf/interface.yml --Interface.device cuda
- ```
-
- This will provide you with a Gradio link to test WhAM on inputs of your choice.
-
- ## Training Data
-
- ![Training](assets/training.png "Training")
-
- You only need to follow these steps if you want to fine-tune your own version of WhAM. First, obtain the original VampNet weights by following the instructions in the [original repo](https://github.com/hugofloresgarcia/vampnet/tree/ismir-2023). Download c2f.pth and codec.pth and replace the weights you previously downloaded in `vampnet/models`.
-
- Second, obtain data:
-
- 1. **Domain adaptation data:**
-
- - Download audio samples from the [WMMS 'Best Of' Cut](https://whoicf2.whoi.edu/science/B/whalesounds/index.cfm). Save them under `vampnet/training_data/domain_adaptation`.
-
- - Download audio samples from the [BirdSet Dataset](https://huggingface.co/datasets/DBD-research-group/BirdSet). Save these under the same directory.
-
- - Finally, download all samples from the [AudioSet Dataset](https://research.google.com/audioset/ontology/index.html) with the label `Animal` and once again save these into the same directory.
-
- 2. **Species-specific finetuning:** Finetuning can be performed on the openly available **[Dominica Sperm Whale Project (DSWP)](https://huggingface.co/datasets/orrp/DSWP)** dataset, available on Hugging Face.
-
- With data in hand, navigate into `vampnet` and perform domain adaptation:
- ```bash
- python vampnet/scripts/exp/fine_tune.py "training_data/domain_adaptation" domain_adapted \
-   && python vampnet/scripts/exp/train.py --args.load conf/generated/domain_adapted/coarse.yml \
-   && python vampnet/scripts/exp/train.py --args.load conf/generated/domain_adapted/c2f.yml
- ```
-
- Then fine-tune the domain-adapted model. Create the config file with the command:
-
- ```bash
- python vampnet/scripts/exp/fine_tune.py "training_data/species_specific_finetuning" fine-tuned
- ```
-
- To select which weights to use as a checkpoint, change `fine_tune_checkpoint` in `conf/generated/fine-tuned/[c2f/coarse].yml` to `./runs/domain_adaptation/[coarse/c2f]/[checkpoint]/vampnets/weights.pth`. `[checkpoint]` can be `latest` to use the last saved checkpoint from the previous run, though it is recommended to manually verify the quality of generations across several checkpoints, since overtraining often degrades audio quality, especially with smaller datasets. After making that change, run:
-
- ```bash
- python vampnet/scripts/exp/train.py --args.load conf/generated/fine-tuned/coarse.yml && python vampnet/scripts/exp/train.py --args.load conf/generated/fine-tuned/c2f.yml
- ```
-
- After following these steps, you should be able to generate audio via the browser by running:
- ```bash
- python app.py --args.load vampnet/conf/generated/fine-tuned/interface.yml
- ```
-
- **Note**: The coarse and fine weights can be trained separately if compute allows. In this case, you would call the two scripts:
-
- ```bash
- python vampnet/scripts/exp/train.py --args.load conf/generated/[fine-tuned/domain_adapted]/coarse.yml
- ```
-
- ```bash
- python vampnet/scripts/exp/train.py --args.load conf/generated/[fine-tuned/domain_adapted]/c2f.yml
- ```
-
- After both are finished running, ensure that both resulting weights are copied into the same copy of WhAM.
-
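The checkpoint-selection edit described above (pointing `fine_tune_checkpoint` at a domain-adaptation run) can be scripted. A minimal sketch, assuming the `fine_tune_checkpoint` key sits on its own line in the generated YAML; the helper name and the flat-file assumption are ours, not part of the repo.

```python
from pathlib import Path

def set_fine_tune_checkpoint(conf_path, checkpoint="latest", stage="coarse"):
    """Rewrite the fine_tune_checkpoint line in a generated config to point
    at a domain-adaptation checkpoint, following the path layout above."""
    weights = f"./runs/domain_adaptation/{stage}/{checkpoint}/vampnets/weights.pth"
    lines = []
    for line in Path(conf_path).read_text().splitlines():
        if line.strip().startswith("fine_tune_checkpoint"):
            line = f"fine_tune_checkpoint: {weights}"
        lines.append(line)
    Path(conf_path).write_text("\n".join(lines) + "\n")
    return weights

# e.g. set_fine_tune_checkpoint("conf/generated/fine-tuned/coarse.yml")
```

Manually editing the YAML, as the README describes, is equivalent; this just makes the choice of checkpoint reproducible.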
- ## Testing Data
-
- 1. **Marine Mammal Data:**
- Download audio samples from the [WMMS 'Best Of' Cut](https://whoicf2.whoi.edu/science/B/whalesounds/index.cfm). Save them under `data/testing_data/marine_mammals/data/[SPECIES_NAME]`.
- * `[SPECIES_NAME]` must match the species names found in `wham/generation/prompt_configs.py`.
-
- 2. **Sperm Whale Codas:**
- To evaluate on sperm whale codas, you can use the openly available [DSWP](https://huggingface.co/datasets/orrp/DSWP) dataset.
-
- 3. **Artificial Beeps:**
- Generate artificial beeps for the experiments with `data/generate_beeps.sh`.
-
- ## Reproducing Paper Results
- Note: Access to the DSWP+CETI annotated data is required to reproduce all results; as of the time of publication, only part of this data is publicly available. Still, we include the following code, as it may be useful for researchers who could benefit from our evaluation pipeline.
-
- ### 1. Downstream Classification Tasks
- To reproduce **Table 1** (Classification Accuracies) and **Figure 7** (Ablation Study):
-
- **Table 1 Results:**
- ```bash
- cd wham/embedding
- ./downstream_tasks.sh
- ```
- * Runs all downstream classification tasks.
- * **Baselines:** Run once.
- * **Models (AVES, VampNet):** Run over 3 random seeds; reports mean and standard deviation.
-
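The per-seed aggregation described above can be sketched in a few lines. The accuracy values below are hypothetical, and whether the script reports sample or population standard deviation is not stated here; this sketch uses the sample form.

```python
import statistics

def summarize_runs(accuracies):
    """Mean and (sample) standard deviation over per-seed accuracies,
    as reported for the learned models in Table 1."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# hypothetical accuracies from 3 random seeds
mean, std = summarize_runs([0.81, 0.84, 0.78])
print(f"{mean:.3f} ± {std:.3f}")
```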
- **Figure 7 Results (Ablation):**
- ```bash
- cd wham/embedding
- ./downstream_ablation.sh
- ```
- * Outputs accuracy scores for ablation variants (averaged across 3 seeds, with error bars).
-
- ### 2. Generative Metrics
-
- **Figure 12: Fréchet Audio Distance (FAD) Scores**
- Calculate the distance between WhAM's generated results and real codas:
- ```bash
- # Calculate for all species
- bash wham/generation/eval/calculate_FAD.sh
-
- # Calculate for a single species
- bash wham/generation/eval/calculate_FAD.sh [species_name]
- ```
- * *Runtime:* ~3 hours on an NVIDIA A10 GPU.
-
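For context on the metric above: FAD fits a Gaussian to the embedding statistics of each audio set (generated vs. real) and computes the Fréchet distance ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^(1/2)) between them. A minimal sketch assuming diagonal covariances for simplicity, where the trace term reduces elementwise; the actual metric uses full covariance matrices over learned audio embeddings.

```python
import math

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.

    mean term: squared distance between means;
    cov term:  elementwise reduction of Tr(S1 + S2 - 2*sqrt(S1*S2)).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term

# identical distributions -> distance 0
print(frechet_distance_diag([0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]))  # 0.0
# shifted means contribute ||mu1 - mu2||^2
print(frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))  # 25.0
```

Lower scores mean the generated distribution sits closer to the real coda distribution in embedding space.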
- **Figure 3: FAD with Custom/BirdNET Embeddings**
- To compare against other embeddings:
- 1. Convert your `.wav` files to `.npy` embeddings.
- 2. Place raw coda embeddings in `data/testing_data/coda_embeddings`.
- 3. Place comparison embeddings in subfolders within `data/testing_data/comparison_embeddings`.
- 4. Run:
- ```bash
- python wham/generation/eval/calculate_custom_fad.py
- ```
- *For BirdNET embeddings, refer to the [official repo](https://github.com/BirdNET-Team/BirdNET-Analyzer).*
-
- **Table 2: Embedding Type Ablation**
- Calculate distances between raw codas, denoised versions, and noise profiles:
- ```bash
- bash wham/generation/eval/FAD_ablation.sh
- ```
- * *Prerequisites:* Ensure `data/testing_data/ablation/noise` and `data/testing_data/ablation/denoised` are populated.
- * *Runtime:* ~1.5 hours on an NVIDIA A10 GPU.
-
- **Figure 13: Tokenizer Reconstruction**
- Test the mean squared reconstruction error:
- ```bash
- bash wham/generation/eval/evaluate_tokenizer.sh
- ```
-
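The tokenizer test above boils down to a mean squared error between an original waveform and its tokenize-then-decode reconstruction. A minimal sketch over plain sample lists; the evaluation script's actual audio I/O and batching are omitted here.

```python
def reconstruction_mse(original, reconstructed):
    """Mean squared error between an original waveform and its
    reconstruction after encoding to tokens and decoding back."""
    assert len(original) == len(reconstructed), "waveforms must be same length"
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

# hypothetical 4-sample waveform and a slightly lossy reconstruction
original = [0.0, 0.5, -0.5, 1.0]
reconstructed = [0.1, 0.4, -0.5, 0.8]
print(reconstruction_mse(original, reconstructed))  # small but nonzero
```

A perfect codec would score 0; in practice, neural tokenizers trade a small reconstruction error for a compact discrete representation.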
- ---
-
 ## Citation

- Please use the following citation if you use this code, model or data.
-
 ```bibtex
 @inproceedings{wham2025,
@@ -207,4 +26,3 @@ Please use the following citation if you use this code, model or data.
 on Neural Information Processing Systems 2025, NeurIPS 2025, San Diego, CA, USA},
 year={2025}
 }
- ```
 
 ---

 # WhAM: a Whale Acoustics Model

+ WhAM is a transformer-based audio-to-audio model designed to synthesize and analyze sperm whale codas. It uses masked acoustic token modeling to capture the unique temporal and spectral features of whale communication.

+ For full technical details, installation instructions, and training scripts, please visit the official repository at [https://github.com/Project-CETI/wham](https://github.com/Project-CETI/wham).

 ## Citation

+ If you use this code, model, or data in your research, please cite our [NeurIPS 2025](https://openreview.net/pdf?id=IL1wvzOgqD) publication:

 ```bibtex
 @inproceedings{wham2025,

 on Neural Information Processing Systems 2025, NeurIPS 2025, San Diego, CA, USA},
 year={2025}
 }
packages.txt DELETED
@@ -1 +0,0 @@
- ffmpeg
vampnet/app.py CHANGED
@@ -691,4 +691,4 @@ with gr.Blocks() as demo:
 )


- demo.launch(share=True, debug=True, allowed_paths=[SCRIPT_DIR / "assets"])
+ demo.queue().launch(allowed_paths=[SCRIPT_DIR / "assets"])
vampnet/setup.py DELETED
@@ -1,44 +0,0 @@
- from setuptools import find_packages, setup
-
- with open("README.md") as f:
-     long_description = f.read()
-
- setup(
-     name="vampnet",
-     version="0.0.1",
-     classifiers=[
-         "Intended Audience :: Developers",
-         "Natural Language :: English",
-         "Programming Language :: Python :: 3.7",
-         "Topic :: Artistic Software",
-         "Topic :: Multimedia",
-         "Topic :: Multimedia :: Sound/Audio",
-         "Topic :: Multimedia :: Sound/Audio :: Editors",
-         "Topic :: Software Development :: Libraries",
-     ],
-     description="Generative Music Modeling.",
-     long_description=long_description,
-     long_description_content_type="text/markdown",
-     author="Hugo Flores García, Prem Seetharaman",
-     author_email="hfgacrcia@descript.com",
-     url="https://github.com/hugofloresgarcia/vampnet",
-     license="MIT",
-     packages=find_packages(),
-     setup_requires=[
-         "Cython",
-     ],
-     install_requires=[
-         "Cython",  # Added by WhAM because it seems to be needed by this repo?
-         "torch",
-         "pydantic==2.10.6",
-         "argbind>=0.3.2",
-         "numpy<1.24",
-         "wavebeat @ git+https://github.com/hugofloresgarcia/wavebeat",
-         "lac @ git+https://github.com/hugofloresgarcia/lac.git",
-         "descript-audiotools @ git+https://github.com/hugofloresgarcia/audiotools.git",
-         "gradio",
-         "loralib",
-         "torch_pitch_shift",
-         "pyharp",
-     ],
- )