---
tags:
- ml-intern
---
# BirdCLEF+ 2026: Improved Pipeline (Target: 0.90+)
This repository contains an improved 4-notebook pipeline for BirdCLEF+ 2026, built on lessons learned from a baseline that scored 0.815.
🔗 **Competition**: https://www.kaggle.com/competitions/birdclef-2026
🔗 **Author Repo**: https://huggingface.co/hello9972/birdclef-2026-improved
---
## Why You Got Stuck at 0.815
Your original pipeline had these fatal problems that prevented reaching 0.90+:
### ❌ What Destroyed Your Score
| Mistake | Impact | Why |
|---------|--------|-----|
| **Threshold boosting** (`p * 0.85 + mask * 0.15`) | 0.815 → **0.52** | Thresholds destroy probability ranking. BirdCLEF metric is AUC-based (rank-sensitive). |
| **Mixup + Label Smoothing** | Softened outputs to 0.05-0.2 range | Destroyed calibration needed for AUC. AUC needs spread, not softening. |
| **Aggressive calibration** (`p ** 0.75`) | 0.815 → **0.53** | Non-linear transforms distort ranking order. |
| **2-model ensemble only** | Ceiling ~0.82 | Top solutions use 5-20 models. |
| **No 5-fold CV** | Could not ensemble diverse models | Same data, same predictions = no ensemble gain. |
| **No pseudo-labeling** | Missing 5-8% boost from test-domain adaptation | Top solutions use noisy student on test predictions. |
### ✅ What Actually Works for BirdCLEF
- **Raw sigmoid outputs**: NO thresholds, NO calibration
- **Simple ensemble**: mean logits, not probabilities
- **Exact sample submission alignment**: `sample[["row_id"]].merge(sub, ...)`
- **Pure PyTorch inference**: no ONNX in Kaggle submissions
- **Minimal post-processing**: tiny clip only
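
The "mean logits, not probabilities" point can be sketched as below. This is a minimal illustration rather than the repository's code: `ensemble_mean_logits` is a hypothetical helper, and it assumes every model returns raw logits of shape `(n_rows, n_species)`.

```python
import numpy as np

def ensemble_mean_logits(logits_per_model):
    """Average raw logits across models, then apply a single sigmoid.

    logits_per_model: list of arrays, each of shape (n_rows, n_species).
    """
    stacked = np.stack(logits_per_model, axis=0)   # (n_models, n_rows, n_species)
    mean_logits = stacked.mean(axis=0)             # average in logit space
    return 1.0 / (1.0 + np.exp(-mean_logits))      # raw sigmoid, no thresholding

# Toy example: two models, two rows, three species.
model_a = np.array([[2.0, -1.0, 0.0], [0.5, 0.5, -3.0]])
model_b = np.array([[0.0, -3.0, 0.0], [1.5, -0.5, -1.0]])
probs = ensemble_mean_logits([model_a, model_b])
```

Averaging before the sigmoid keeps a single confident model from being flattened by the saturation of the probability scale.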
---
## New Architecture Overview
```
NB1 → Data Prep + StratifiedKFold(5)
NB2 → 5-Fold Training (AsymmetricLoss, SpecAugment, Energy Crop)
NB3 → Pseudo-Labeling (Noisy Student on train_soundscapes)
NB4 → Inference (10-model ensemble, TTA, rank averaging)
```
---
## Key Improvements
### 1. Loss Function: AsymmetricLoss (NOT BCE)
Replaces `BCEWithLogitsLoss` with AsymmetricLoss from [arXiv:2009.14119](https://arxiv.org/abs/2009.14119):
```python
import torch.nn as nn

class AsymmetricLoss(nn.Module):
    def __init__(self, gamma_neg=4, gamma_pos=0, clip=0.05):
        ...
```
**Why**: Down-weights easy negatives (background noise, empty segments) while preserving signal for rare species. Does NOT squash logits like label smoothing.
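
For reference, a forward pass consistent with the cited paper might look like the sketch below. It is not the code from `nb02_training.py`: the class name `AsymmetricLossSketch`, the `eps` argument, and the mean reduction are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class AsymmetricLossSketch(nn.Module):
    """Illustrative asymmetric loss for multi-label targets (assumed details)."""

    def __init__(self, gamma_neg=4, gamma_pos=0, clip=0.05, eps=1e-8):
        super().__init__()
        self.gamma_neg = gamma_neg
        self.gamma_pos = gamma_pos
        self.clip = clip
        self.eps = eps

    def forward(self, logits, targets):
        p_pos = torch.sigmoid(logits)
        # Probability shifting: negatives with p < clip end up with zero loss.
        p_neg = (1.0 - p_pos + self.clip).clamp(max=1.0)

        loss_pos = targets * torch.log(p_pos.clamp(min=self.eps))
        loss_neg = (1.0 - targets) * torch.log(p_neg.clamp(min=self.eps))

        # Focusing weights: a separate exponent for positives and negatives.
        pt = p_pos * targets + p_neg * (1.0 - targets)
        gamma = self.gamma_pos * targets + self.gamma_neg * (1.0 - targets)
        weight = torch.pow(1.0 - pt, gamma)

        return -(weight * (loss_pos + loss_neg)).mean()

loss_fn = AsymmetricLossSketch()
easy_negative = loss_fn(torch.tensor([[-5.0]]), torch.tensor([[0.0]]))
hard_negative = loss_fn(torch.tensor([[3.0]]), torch.tensor([[0.0]]))
positive = loss_fn(torch.tensor([[0.3]]), torch.tensor([[1.0]]))
```

With `gamma_pos=0` the positive term reduces to plain binary cross-entropy, while a confidently rejected negative (probability below `clip`) contributes nothing.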
### 2. Energy-Based Window Selection (Perch 2.0 Trick)
For training clips longer than 5 seconds, finds the window with highest audio energy instead of random crop:
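```python
import librosa
import numpy as np

def _energy_crop(self, wav):
    energy = librosa.feature.rms(y=wav, frame_length=2048, hop_length=512)[0]
    # The smoothing step is not shown in the source; a 9-frame moving average is assumed here
    smoothed_energy = np.convolve(energy, np.ones(9) / 9, mode="same")
    peak_frame = np.argmax(smoothed_energy)
    # center window around peak with jitter
```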
**Why**: Bird calls are often brief. Random crops miss them. Energy-based crops hit them 3-4x more often.
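
A self-contained version of the idea, using only NumPy, could look like this. The function name `energy_crop`, the 9-frame smoothing width, the sample rate, and the jitter handling are all assumptions for illustration, not values taken from the notebooks.

```python
import numpy as np

def energy_crop(wav, sr=32000, window_s=5.0, frame_length=2048,
                hop_length=512, smooth_frames=9, max_jitter_s=0.0, rng=None):
    """Return the window_s-second slice of wav centred on the RMS-energy peak."""
    window = int(sr * window_s)
    if len(wav) <= window:
        return wav

    # Frame-level RMS energy (no padding, unlike librosa's default).
    n_frames = 1 + (len(wav) - frame_length) // hop_length
    starts = np.arange(n_frames) * hop_length
    rms = np.array([np.sqrt(np.mean(wav[s:s + frame_length] ** 2)) for s in starts])

    # Moving average so an isolated click does not win over a sustained call.
    smoothed = np.convolve(rms, np.ones(smooth_frames) / smooth_frames, mode="same")
    peak_sample = int(starts[np.argmax(smoothed)]) + frame_length // 2

    # Optional random jitter so the call is not always dead centre.
    if rng is None:
        rng = np.random.default_rng()
    jitter = int(rng.uniform(-max_jitter_s, max_jitter_s) * sr)

    start = int(np.clip(peak_sample - window // 2 + jitter, 0, len(wav) - window))
    return wav[start:start + window]

# Toy example at a low sample rate: 20 s of silence with a 0.5 s burst at 12 s.
sr = 1000
wav = np.zeros(20 * sr)
wav[12000:12500] = 1.0
crop = energy_crop(wav, sr=sr, window_s=5.0)
```

In the toy example the 5-second crop contains the whole burst, whereas a uniformly random crop would often miss it.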
### 3. Augmentations (Waveform + Spectrogram)
| Augmentation | Probability | Parameters | Purpose |
|--------------|-------------|------------|---------|
| Cyclic roll | 100% | - | Time-shift invariance |
| Colored noise | 30% | SNR 3-30 dB, f^-decay | Domain adaptation to soundscapes |
| Background noise | 50% | Real soundscape mixing | Simulates multi-species recordings |
| Gain | 30% | ±12 dB | Loudness invariance |
| SpecAugment (freq mask) | 50% | 24 bins | Frequency invariance |
| SpecAugment (time mask) | 50% | 40 frames | Time invariance |
**NO mixup. NO label smoothing.** Both destroyed your score.
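
Two of the augmentations listed above, cyclic roll and the SpecAugment masks, can be sketched in NumPy as follows. The helper names are hypothetical, and two details are assumptions: "24 bins" and "40 frames" are treated as maximum mask widths, and masked cells are filled with zero.

```python
import numpy as np

def cyclic_roll(wav, rng):
    """Circularly shift the waveform by a random number of samples."""
    return np.roll(wav, int(rng.integers(0, len(wav))))

def spec_mask(mel, rng, max_freq_bins=24, max_time_frames=40, p=0.5):
    """Zero one random frequency band and one random time band, each with probability p."""
    mel = mel.copy()
    n_mels, n_frames = mel.shape
    if rng.random() < p:
        width = int(rng.integers(0, min(max_freq_bins, n_mels) + 1))
        start = int(rng.integers(0, n_mels - width + 1))
        mel[start:start + width, :] = 0.0
    if rng.random() < p:
        width = int(rng.integers(0, min(max_time_frames, n_frames) + 1))
        start = int(rng.integers(0, n_frames - width + 1))
        mel[:, start:start + width] = 0.0
    return mel

rng = np.random.default_rng(0)
wav = np.arange(10.0)
rolled = cyclic_roll(wav, rng)

mel = rng.random((128, 313)) + 1.0           # strictly positive toy spectrogram
masked = spec_mask(mel, rng, p=1.0)          # p=1.0 forces both masks for the demo
```

Neither transform touches the labels, which is the property that separates them from mixup and label smoothing.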
### 4. 5-Fold StratifiedKFold
```python
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
```
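Stratification keeps the species distribution approximately the same in every fold. Training one model per fold yields five models that each saw a different 80% of the data, and that diversity is what the ensemble averages over.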
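
A small sketch of assigning fold indices by species, with made-up labels (`sp_a`, `sp_b`, `sp_c` are placeholders, not competition classes):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy primary labels: 50 / 30 / 20 recordings for three placeholder species.
labels = np.array(["sp_a"] * 50 + ["sp_b"] * 30 + ["sp_c"] * 20)
folds = np.full(len(labels), -1)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# split() only needs X for its length, so a dummy array is enough here.
for fold, (_, val_idx) in enumerate(skf.split(np.zeros(len(labels)), labels)):
    folds[val_idx] = fold
```

Each validation fold here receives 10, 6, and 4 recordings of the three species. A species with fewer than five recordings cannot appear in every validation fold, so rare classes need separate handling.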
### 5. Layer-Wise LR Decay
```python
lr_scale = layer_decay ** (num_blocks - layer_idx)
```
Earlier layers (closer to the input) get a smaller LR, so pretrained low-level features change slowly while later layers adapt faster. This limits overfitting in the early layers.
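
To make the formula concrete, the sketch below computes the learning rate per block. The function name `layerwise_lr` and the example values are illustrative assumptions; index 0 is taken to be the block closest to the input and index `num_blocks` the classification head.

```python
def layerwise_lr(base_lr, num_blocks, layer_decay=0.75):
    """Learning rate for block indices 0..num_blocks (0 = input side, num_blocks = head)."""
    return [base_lr * layer_decay ** (num_blocks - layer_idx)
            for layer_idx in range(num_blocks + 1)]

# Example with a decay of 0.5 so the halving is easy to see.
lrs = layerwise_lr(base_lr=1e-3, num_blocks=4, layer_decay=0.5)
# lrs is approximately [6.25e-05, 1.25e-04, 2.5e-04, 5e-04, 1e-03]
```

In PyTorch, these values would typically be passed to the optimizer as one parameter group per block, each with its own `lr` entry.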
### 6. Test-Time Augmentation (TTA)
4 variants per chunk:
- Original
- Time-reversed
- +3dB gain
- -3dB gain
Average logits across all variants.
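
The four variants can be sketched as below. `predict_logits` is a hypothetical callable standing in for a trained model; the real inference code in `nb04_inference.py` may batch or order the variants differently.

```python
import numpy as np

def db_to_gain(db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def tta_logits(wav, predict_logits):
    """Average logits over original, time-reversed, +3 dB, and -3 dB variants."""
    variants = [
        wav,
        wav[::-1].copy(),          # time-reversed
        wav * db_to_gain(3.0),     # +3 dB
        wav * db_to_gain(-3.0),    # -3 dB
    ]
    return np.mean([predict_logits(v) for v in variants], axis=0)

# Stand-in "model" for the demo: two fake logits derived from the waveform.
def fake_model(w):
    return np.array([w.mean(), np.abs(w).max()])

wav = np.array([0.1, -0.2, 0.4])
avg = tta_logits(wav, fake_model)
```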
### 7. Pseudo-Labeling (Noisy Student)
Use confident predictions (>0.5) on `train_soundscapes` as additional training data. Retrain with these pseudo-labels + original data.
**Expected boost**: 0.84 → 0.88
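
One way to read "confident predictions (>0.5)" is to keep a soundscape segment when its highest species probability exceeds the threshold, and to store the full probability vector as a soft label. That row-level reading is an assumption, as are the helper name `select_pseudo_labels` and the placeholder species columns.

```python
import numpy as np
import pandas as pd

def select_pseudo_labels(probs, row_ids, species, threshold=0.5):
    """Keep rows whose top probability exceeds threshold; return soft labels."""
    probs = np.asarray(probs, dtype=float)
    keep = probs.max(axis=1) > threshold
    df = pd.DataFrame(probs[keep], columns=species)
    df.insert(0, "row_id", np.asarray(row_ids)[keep])
    return df

# Toy predictions for three segments and two placeholder species.
probs = [[0.90, 0.10],
         [0.30, 0.20],   # no confident species, dropped
         [0.05, 0.60]]
pseudo = select_pseudo_labels(probs, ["r1", "r2", "r3"], ["sp_a", "sp_b"])
```

The threshold here only selects training examples. It is not applied to the probabilities that go into the submission.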
---
## Expected Score Improvements
| Stage | Technique | Expected Score |
|-------|-----------|----------------|
| Baseline | Your 0.815 pipeline | **0.815** |
| NB2 improvement | AsymmetricLoss + energy crop + no mixup | **0.83-0.85** |
| 5-fold ensemble | 10 models (5 folds × 2 backbones) | **0.85-0.87** |
| TTA | 4 variants per chunk | **0.86-0.88** |
| Pseudo-labeling | Noisy student on soundscapes | **0.88-0.91** |
| + Better backbone | Bird-MAE or ConvNeXt | **0.90-0.93** |
---
## Files
| File | Purpose |
|------|---------|
| `nb01_data_prep.py` | Data cleaning, VAD, StratifiedKFold(5) |
| `nb02_training.py` | 5-fold training with AsymmetricLoss, SpecAugment |
| `nb03_pseudo_labeling.py` | Generate pseudo-labels, noisy student |
| `nb04_inference.py` | 10-model ensemble, TTA, submission generation |
---
## How to Run on Kaggle
### Step 1: Create Dataset from NB1 Output
After running `nb01_data_prep.py`, create a Kaggle dataset from `/kaggle/working/`:
```
train_cleaned_stratified.csv
soundscape_labels_with_folds.csv
species_list.csv
rare_species.csv
```
### Step 2: NB2 Training
```python
# In Kaggle notebook, attach:
# - Competition data: birdclef-2026
# - NB1 output dataset
# Run nb02_training.py → produces 10 .pt files in /kaggle/working/models/
```
Save models as a new Kaggle dataset.
### Step 3: NB3 Pseudo-Labeling
```python
# Attach NB2 model dataset + NB1 data
# Run nb03_pseudo_labeling.py → produces pseudo_labels_soft.csv
```
### Step 4: NB4 Inference (Submission)
```python
# Attach NB2 model dataset + competition test data
# Run nb04_inference.py → produces submission.csv
```
---
## Critical Rules for BirdCLEF
1. **NEVER threshold predictions**: it destroys AUC ranking.
2. **NEVER apply non-linear calibration** (`p**0.75`, `p/(p+1)`, etc.): it distorts rank order.
3. **NEVER mixup or label-smooth**: it squashes logits into a narrow range, killing AUC spread.
4. **ALWAYS align submission with sample_submission.csv**: `sample[["row_id"]].merge(sub, ...)`
5. **ALWAYS ensemble diverse models**: same model, same folds = no gain.
6. **ALWAYS use raw sigmoid outputs**: let the metric handle calibration.
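
Rule 4 can be sketched with toy frames as below. The row IDs and species columns are placeholders, and filling missing rows with `0.0` is an assumption; pick whatever default suits the pipeline.

```python
import pandas as pd

# Toy stand-ins for sample_submission.csv and the model's predictions.
sample = pd.DataFrame({"row_id": ["s1_5", "s1_10", "s2_5"], "sp_a": 0.0, "sp_b": 0.0})
sub = pd.DataFrame({
    "row_id": ["s2_5", "s1_5"],        # wrong order, one row missing
    "sp_b": [0.2, 0.9],                # wrong column order
    "sp_a": [0.7, 0.1],
})

species = [c for c in sample.columns if c != "row_id"]

# A left merge keeps exactly the sample's rows, in the sample's order.
aligned = sample[["row_id"]].merge(sub, on="row_id", how="left")
aligned = aligned[["row_id"] + species]              # restore the sample's column order
aligned[species] = aligned[species].fillna(0.0)      # assumed default for missing rows
```

The result has the same rows and columns as the sample file, so a reordered or incomplete prediction frame cannot silently misalign the submission.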
---
## References
- AsymmetricLoss: [arXiv:2009.14119](https://arxiv.org/abs/2009.14119)
- Bird-MAE: [arXiv:2504.12880](https://arxiv.org/abs/2504.12880)
- sl-BEATs: [arXiv:2508.11845](https://arxiv.org/abs/2508.11845)
- Top solution reference: [minalkharat12/birdclef-2026-solution](https://huggingface.co/minalkharat12/birdclef-2026-solution)
---
## License
MIT. Competition code for educational purposes.
<!-- ml-intern-provenance -->
## Generated by ML Intern
This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern