Update paper title to current Sci Rep revision; remove internal version numbers
Title now matches manuscript: "Discovery and promotion of unknown sounds into operational detection targets for underwater passive acoustic monitoring under false alarm constraints". Revision label simplified to "revision in review" since this is the first author-side revision at Scientific Reports.
README.md CHANGED
@@ -25,21 +25,22 @@ via Domain-Adaptive Pretraining (DAPT) on a 5,673-h global ocean
 soundscape corpus (World-DAPT).

 This model serves as the "ears" for underwater soundscapes described in our
-paper: **"
-

-> **About
 > SimCLR/InfoNCE-based DAPT under AMP fp16, which suffered a numerical
 > instability that prevented BEATs encoder weight updates (the
 > `beats_dapt_topup_encoder.pt` weights were therefore byte-identical to
-> Microsoft's BEATs AS-2M PRETRAIN).
-> DAPT with **Masked Audio Modeling (MAM)** and a **k-means
-> under bfloat16 precision on a larger 5,673-h
-> superseded buggy weights have been **removed**
-> *Reproducibility of the original buggy state*
-> them if needed).

-## Model Details (

 - **Architecture:** BEATs (Audio Transformer; Microsoft)
 - **Self-supervised pretraining:** Masked Audio Modeling (MAM) with k=1024
@@ -52,7 +53,7 @@ soundscapes"** (*Scientific Reports*, **revision 2.2** in review).
 - **Input:** 16 kHz mono waveform
 - **Backbone init:** BEATs AS-2M (iter3+)

-## Available files (

 | File | SHA-256 | Size |
 |---|---|---|
@@ -66,7 +67,7 @@ above encoder. Single-seed Event F1 = 0.483; n=10 mean ± std = 0.475 ± 0.017
 ## Reproducibility of the original (buggy) state

 The original December 2026 release contained two files that have been
-removed in

 | Removed file | Replacement / how to recreate |
 |---|---|
@@ -83,12 +84,11 @@ prevented any weight updates.
 These weights are designed to be used with the official code repository:

 **GitHub Repository:** [alohajazz/openworld-soundscape-cced2-dgpu](https://github.com/alohajazz/openworld-soundscape-cced2-dgpu)
-(see branch `revision-2.2-restructure` until merged into `main`)

 ```python
 from huggingface_hub import hf_hub_download

-# Download canonical revision
 encoder_path = hf_hub_download(
 repo_id="BiologgingSolutions/OceanBEATs",
 filename="beats_dapt_mam_step120000.pt",
@@ -129,11 +129,11 @@ the released weights does not grant any rights under those patents.
 If you use this model in your research, please cite our paper:

 ```bibtex
-@article{
-title={
-author={Noda, Takuji and Koizumi, Takuya
 journal={Scientific Reports},
-note={Revision
 year={2026}
 }
 ```
@@ -147,8 +147,8 @@ ignored `center_sec`, returning per-file constant embeddings) was discovered
 and fixed on 2026-05-08. The fix affects only the **extraction code** in the
 GitHub repository — **encoder weights in this repository are byte-identical
 before and after the fix** (the bug occurred downstream of the encoder
-forward pass). All
-re-computed with the corrected window-aware extractor; updated paper
 artifacts are tracked under
 [`paper_artifacts/winaware_2026-05-09/`](https://github.com/alohajazz/openworld-soundscape-cced2-dgpu/tree/main/paper_artifacts/winaware_2026-05-09)
 and
@@ -158,9 +158,9 @@ strict 0–8 kHz in-band consistency (Nyquist of the 16-kHz BEATs input);
 species whose dominant call energy lies above 8 kHz are listed in the
 GitHub `REVISION2.md`. SHA-256 fingerprints of
 `beats_dapt_mam_step120000.pt` and `sed_head_56_fulldata_ep8.pt` are
-unchanged from the revision

-###
 - DAPT method changed from SimCLR/InfoNCE to Masked Audio Modeling (MAM)
 with k-means k=1024 tokeniser; precision changed from AMP fp16 to bfloat16
 (corrects the original numerical instability that prevented weight updates)
@@ -25,21 +25,22 @@
 soundscape corpus (World-DAPT).

 This model serves as the "ears" for underwater soundscapes described in our
+paper: **"Discovery and promotion of unknown sounds into operational
+detection targets for underwater passive acoustic monitoring under false
+alarm constraints"** (*Scientific Reports*, **revision** in review).

+> **About this revision** (May 2026). The original December 2026 release used
 > SimCLR/InfoNCE-based DAPT under AMP fp16, which suffered a numerical
 > instability that prevented BEATs encoder weight updates (the
 > `beats_dapt_topup_encoder.pt` weights were therefore byte-identical to
+> Microsoft's BEATs AS-2M PRETRAIN). The current revision corrects this by
+> re-running DAPT with **Masked Audio Modeling (MAM)** and a **k-means
+> k=1024 tokeniser** under bfloat16 precision on a larger 5,673-h
+> World-DAPT corpus. The superseded buggy weights have been **removed**
+> from this repository (see *Reproducibility of the original buggy state*
+> below for how to recreate them if needed).

+## Model Details (current canonical revision)

 - **Architecture:** BEATs (Audio Transformer; Microsoft)
 - **Self-supervised pretraining:** Masked Audio Modeling (MAM) with k=1024
@@ -52,7 +53,7 @@
 - **Input:** 16 kHz mono waveform
 - **Backbone init:** BEATs AS-2M (iter3+)

+## Available files (current canonical revision)

 | File | SHA-256 | Size |
 |---|---|---|
@@ -66,7 +67,7 @@
 ## Reproducibility of the original (buggy) state

 The original December 2026 release contained two files that have been
+removed in the current revision:

 | Removed file | Replacement / how to recreate |
 |---|---|
@@ -83,12 +84,11 @@
 These weights are designed to be used with the official code repository:

 **GitHub Repository:** [alohajazz/openworld-soundscape-cced2-dgpu](https://github.com/alohajazz/openworld-soundscape-cced2-dgpu)

 ```python
 from huggingface_hub import hf_hub_download

+# Download canonical revision weights
 encoder_path = hf_hub_download(
 repo_id="BiologgingSolutions/OceanBEATs",
 filename="beats_dapt_mam_step120000.pt",
@@ -129,11 +129,11 @@
 If you use this model in your research, please cite our paper:

 ```bibtex
+@article{noda2026discovery,
+title={Discovery and promotion of unknown sounds into operational detection targets for underwater passive acoustic monitoring under false alarm constraints},
+author={Noda, Takuji and Koizumi, Takuya},
 journal={Scientific Reports},
+note={Revision, in review},
 year={2026}
 }
 ```
@@ -147,8 +147,8 @@
 and fixed on 2026-05-08. The fix affects only the **extraction code** in the
 GitHub repository — **encoder weights in this repository are byte-identical
 before and after the fix** (the bug occurred downstream of the encoder
+forward pass). All current-revision result tables (Tables 2/3/4 and Fig 3)
+were re-computed with the corrected window-aware extractor; updated paper
 artifacts are tracked under
 [`paper_artifacts/winaware_2026-05-09/`](https://github.com/alohajazz/openworld-soundscape-cced2-dgpu/tree/main/paper_artifacts/winaware_2026-05-09)
 and
@@ -158,9 +158,9 @@
 species whose dominant call energy lies above 8 kHz are listed in the
 GitHub `REVISION2.md`. SHA-256 fingerprints of
 `beats_dapt_mam_step120000.pt` and `sed_head_56_fulldata_ep8.pt` are
+unchanged from the current revision listed in the table above.

+### Current revision (May 2026)
 - DAPT method changed from SimCLR/InfoNCE to Masked Audio Modeling (MAM)
 with k-means k=1024 tokeniser; precision changed from AMP fp16 to bfloat16
 (corrects the original numerical instability that prevented weight updates)
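
The README text in this diff relies on SHA-256 fingerprints to state that `beats_dapt_mam_step120000.pt` and `sed_head_56_fulldata_ep8.pt` are unchanged, and on byte-identity to describe the superseded weights. A minimal sketch of how a reader might check a downloaded file against a published fingerprint is given below, using only Python's standard `hashlib`. The helper names `sha256_of_file` and `matches_fingerprint` are hypothetical and not part of the repository. The expected digest has to be copied from the README's "Available files" table, which this diff does not show.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fingerprint(path, expected_hex):
    """Compare a file's digest with a published fingerprint (hypothetical helper).

    Surrounding whitespace and letter case in the expected value are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Two files are byte-identical exactly when their contents match, so equal digests from `sha256_of_file` are strong evidence that, for example, a re-downloaded checkpoint is the same file as the one listed in the table. A typical use would pass the path returned by `hf_hub_download` together with the digest from the table to `matches_fingerprint`.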