| --- |
| tags: |
| - ml-intern |
| --- |
| # BirdCLEF+ 2026: Improved Pipeline (Target: 0.90+) |
|
|
| This repository contains an improved 4-notebook pipeline for BirdCLEF+ 2026, built on lessons learned from a baseline that scored 0.815. |
|
|
| - **Competition**: https://www.kaggle.com/competitions/birdclef-2026 |
| - **Author Repo**: https://huggingface.co/hello9972/birdclef-2026-improved |
|
|
| --- |
|
|
| ## Why You Got Stuck at 0.815 |
|
|
| Your original pipeline had the following problems, which kept it from reaching 0.90+: |
|
|
| ### ❌ What Destroyed Your Score |
|
|
| Mistake | Impact | Why |
|---------|--------|-----|
| **Threshold boosting** (`p * 0.85 + mask * 0.15`) | 0.815 → **0.52** | Thresholds destroy probability ranking. The BirdCLEF metric is AUC-based (rank-sensitive). |
| **Mixup + Label Smoothing** | Softened outputs to the 0.05-0.2 range | Softened targets compressed the scores into a narrow band and weakened the separation between positives and negatives that AUC ranks on. |
| **Aggressive calibration** (`p ** 0.75`) | 0.815 → **0.53** | A monotonic transform cannot raise AUC on its own; applied per model before averaging or combined with thresholds, it distorts the final rank order. |
| **2-model ensemble only** | Ceiling ~0.82 | Top solutions use 5-20 models. |
| **No 5-fold CV** | Could not ensemble diverse models | Same data, same predictions = no ensemble gain. |
| **No pseudo-labeling** | Missing 5-8% boost from test-domain adaptation | Top solutions use noisy student on test predictions. |
|
|
| ### ✅ What Actually Works for BirdCLEF |
|
|
| - **Raw sigmoid outputs**: NO thresholds, NO calibration |
| - **Simple ensemble**: mean logits, not probabilities |
| - **Exact sample submission alignment**: `sample[["row_id"]].merge(sub, ...)` |
| - **Pure PyTorch inference**: No ONNX in Kaggle submissions |
| - **Minimal post-processing**: tiny clip only |
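
The ensemble and clipping bullets above can be sketched as follows. This is a minimal NumPy illustration, not the repository's code: the function name `ensemble_mean_logits` and the clip bound `eps=1e-6` are assumptions made for the example.

```python
import numpy as np


def ensemble_mean_logits(logits_per_model, eps=1e-6):
    """Average raw logits across models, then apply a single sigmoid.

    logits_per_model: array of shape (n_models, n_rows, n_classes).
    eps: hypothetical clip bound that keeps outputs strictly inside (0, 1).
    """
    mean_logits = np.mean(np.asarray(logits_per_model, dtype=np.float64), axis=0)
    probs = 1.0 / (1.0 + np.exp(-mean_logits))
    # "Tiny clip only": no thresholds, no power transforms
    return np.clip(probs, eps, 1.0 - eps)


# Two toy models, one row, two classes
logits = np.array([[[2.0, -1.0]], [[0.0, -3.0]]])
probs = ensemble_mean_logits(logits)
```

Averaging in logit space and applying the sigmoid once generally gives a different result from averaging per-model probabilities, which is why the two are listed separately above.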
|
|
| --- |
|
|
| ## New Architecture Overview |
|
|
| ``` |
| NB1 → Data Prep + StratifiedKFold(5) |
| NB2 → 5-Fold Training (AsymmetricLoss, SpecAugment, Energy Crop) |
| NB3 → Pseudo-Labeling (Noisy Student on train_soundscapes) |
| NB4 → Inference (10-model ensemble, TTA, rank averaging) |
| ``` |
|
|
| --- |
|
|
| ## Key Improvements |
|
|
| ### 1. Loss Function: AsymmetricLoss (NOT BCE) |
|
|
| Replaces `BCEWithLogitsLoss` with AsymmetricLoss from [arXiv:2009.14119](https://arxiv.org/abs/2009.14119): |
|
|
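| ```python |
| import torch.nn as nn |
|
| class AsymmetricLoss(nn.Module): |
|     # gamma_neg / gamma_pos: focusing exponents for negatives / positives; clip: probability margin |
|     def __init__(self, gamma_neg=4, gamma_pos=0, clip=0.05): |
|         ... |
| ``` |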
|
|
| **Why**: Down-weights easy negatives (background noise, empty segments) while preserving signal for rare species. Does NOT squash logits like label smoothing. |
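
To make the down-weighting concrete, here is a forward-only NumPy sketch of the loss value, following the formulation in the cited paper (focusing exponents plus a probability margin for negatives). The repository's `AsymmetricLoss` body is elided above and may differ; the function name `asymmetric_loss`, the `eps` guard, and the sum reduction are assumptions.

```python
import numpy as np


def asymmetric_loss(logits, targets, gamma_neg=4.0, gamma_pos=0.0, clip=0.05, eps=1e-8):
    """Forward pass of the asymmetric loss, summed over all elements."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    p = 1.0 / (1.0 + np.exp(-logits))
    # Probability shifting: negatives with p <= clip contribute zero loss
    p_m = np.clip(p - clip, 0.0, None)
    # Positive term: focal-style weight (1 - p)^gamma_pos times log(p)
    loss_pos = targets * (1.0 - p) ** gamma_pos * np.log(np.clip(p, eps, None))
    # Negative term: weight p_m^gamma_neg shrinks easy negatives toward zero
    loss_neg = (1.0 - targets) * p_m ** gamma_neg * np.log(np.clip(1.0 - p_m, eps, None))
    return -np.sum(loss_pos + loss_neg)


easy_negative = asymmetric_loss([-4.0], [0.0])  # confident, correct "absent"
hard_negative = asymmetric_loss([2.0], [0.0])   # confident, wrong "present"
```

With `gamma_pos=0` the positive term reduces to plain cross-entropy, so rare-species positives keep their full gradient signal while easy negatives are suppressed.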
|
|
| ### 2. Energy-Based Window Selection (Perch 2.0 Trick) |
|
|
| For training clips longer than 5 seconds, the pipeline selects the window with the highest audio energy instead of a random crop: |
|
|
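| ```python |
| def _energy_crop(self, wav): |
|     # Frame-level RMS energy of the waveform |
|     energy = librosa.feature.rms(y=wav, frame_length=2048, hop_length=512)[0] |
|     # Smooth over neighbouring frames (5-frame moving average is an assumption; the source omits this step) |
|     smoothed_energy = np.convolve(energy, np.ones(5) / 5, mode="same") |
|     peak_frame = np.argmax(smoothed_energy) |
|     # center window around peak with jitter |
| ``` |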
|
|
| **Why**: Bird calls are often brief. Random crops miss them. Energy-based crops hit them 3-4x more often. |
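
A self-contained version of the idea might look like the sketch below. It computes frame RMS in plain NumPy rather than through `librosa`, and the sample rate, smoothing width, jitter range, and zero-padding of short clips are all assumptions for illustration, not values taken from the repository.

```python
import numpy as np


def energy_crop(wav, sr=32000, window_sec=5.0, frame_length=2048, hop_length=512,
                smooth_frames=9, max_jitter_sec=0.5, rng=None):
    """Return a fixed-length window centred near the highest-energy frame."""
    rng = np.random.default_rng() if rng is None else rng
    win = int(sr * window_sec)
    if len(wav) <= win:
        # Short clips are zero-padded to the window length (assumed behaviour)
        return np.pad(wav, (0, win - len(wav)))
    # Frame-wise RMS energy (NumPy stand-in for librosa.feature.rms)
    n_frames = 1 + (len(wav) - frame_length) // hop_length
    starts = np.arange(n_frames) * hop_length
    energy = np.array([np.sqrt(np.mean(wav[s:s + frame_length] ** 2)) for s in starts])
    # Moving average so one loud click does not decide the crop
    kernel = np.ones(smooth_frames) / smooth_frames
    smoothed = np.convolve(energy, kernel, mode="same")
    peak_sample = int(starts[np.argmax(smoothed)]) + frame_length // 2
    # Random jitter keeps the call from always sitting at the exact centre
    jitter = int(rng.uniform(-max_jitter_sec, max_jitter_sec) * sr)
    start = int(np.clip(peak_sample + jitter - win // 2, 0, len(wav) - win))
    return wav[start:start + win]
```

For example, `energy_crop(wav, rng=np.random.default_rng(0))` on a 20-second clip returns a 5-second array positioned around the loudest region.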
|
|
| ### 3. Augmentations (Waveform + Spectrogram) |
|
|
| Augmentation | Probability | Parameters | Purpose |
|--------------|-------------|------------|---------|
| Cyclic roll | 100% | - | Time-shift invariance |
| Colored noise | 30% | SNR 3-30 dB, f^-decay | Domain adaptation to soundscapes |
| Background noise | 50% | Real soundscape mixing | Simulates multi-species recordings |
| Gain | 30% | ±12 dB | Loudness invariance |
| SpecAugment (freq mask) | 50% | 24 bins | Frequency invariance |
| SpecAugment (time mask) | 50% | 40 frames | Time invariance |
|
|
| **NO mixup. NO label smoothing.** Both destroyed your score. |
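
Three of the augmentations in the table can be sketched in a few lines of NumPy. This is an illustration only: the function names are hypothetical, the mask sizes are treated as maximum widths, and masked regions are filled with zeros, none of which is confirmed by the source.

```python
import numpy as np


def cyclic_roll(wav, rng):
    """Shift the waveform circularly by a random offset."""
    return np.roll(wav, int(rng.integers(0, len(wav))))


def random_gain(wav, rng, p=0.3, max_db=12.0):
    """With probability p, scale amplitude by a gain drawn from [-max_db, +max_db] dB."""
    if rng.random() >= p:
        return wav
    gain_db = rng.uniform(-max_db, max_db)
    return wav * 10.0 ** (gain_db / 20.0)


def spec_mask(spec, rng, p=0.5, max_width=24, axis=0):
    """With probability p, zero one random band of up to max_width bins along axis."""
    if rng.random() >= p:
        return spec
    width = min(int(rng.integers(1, max_width + 1)), spec.shape[axis])
    start = int(rng.integers(0, spec.shape[axis] - width + 1))
    out = spec.copy()  # leave the input spectrogram untouched
    index = [slice(None)] * spec.ndim
    index[axis] = slice(start, start + width)
    out[tuple(index)] = 0.0
    return out
```

Calling `spec_mask(spec, rng, max_width=24, axis=0)` masks frequency bins, and `spec_mask(spec, rng, max_width=40, axis=1)` masks time frames, assuming a `(freq, time)` layout.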
|
|
| ### 4. 5-Fold StratifiedKFold |
|
|
| ```python |
| skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42) |
| ``` |
|
|
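| Each fold keeps roughly the same species distribution. Training one model per fold yields five models that each saw a different 80% of the data, which is where the ensemble gain comes from. |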
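
A small example of assigning fold indices with this splitter is shown below. The species names and counts are toy data, and using the primary species label as the stratification target is an assumption about how NB1 works.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels: 3 hypothetical species with 10 recordings each
labels = np.repeat(["sp_a", "sp_b", "sp_c"], 10)
folds = np.full(len(labels), -1)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# split() only needs X for its length, so a dummy array is enough here
for fold, (_, val_idx) in enumerate(skf.split(np.zeros(len(labels)), labels)):
    folds[val_idx] = fold  # each recording is a validation sample in exactly one fold
```

Species with fewer recordings than `n_splits` cannot appear in every fold, so very rare classes likely need separate handling before splitting.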
|
|
| ### 5. Layer-Wise LR Decay |
|
|
| ```python |
| lr_scale = layer_decay ** (num_blocks - layer_idx) |
| ``` |
|
|
| Layers closer to the input get a smaller LR, which helps keep the early layers from overfitting. |
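
As a worked example of the formula, the sketch below computes one learning rate per block. The helper name `layerwise_lrs` and the values `base_lr=1e-3`, `num_blocks=4`, and `layer_decay=0.5` are illustrative assumptions, with `layer_idx=0` taken to be the block closest to the input.

```python
def layerwise_lrs(base_lr, num_blocks, layer_decay):
    """Return one learning rate per block, index 0 being closest to the input."""
    return [base_lr * layer_decay ** (num_blocks - layer_idx)
            for layer_idx in range(num_blocks)]


lrs = layerwise_lrs(base_lr=1e-3, num_blocks=4, layer_decay=0.5)
# lrs == [6.25e-05, 0.000125, 0.00025, 0.0005]
```

Each value would typically become the `lr` of one optimizer parameter group, with the classification head trained at the full `base_lr`.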
|
|
| ### 6. Test-Time Augmentation (TTA) |
|
|
| 4 variants per chunk: |
| - Original |
| - Time-reversed |
| - +3dB gain |
| - -3dB gain |
|
|
| Average logits across all variants. |
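
The four variants can be sketched as below. `predict_logits` is a hypothetical callable that maps a 1-D waveform to a vector of class logits; it stands in for whatever model wrapper NB4 actually uses.

```python
import numpy as np


def tta_logits(predict_logits, wav):
    """Average logits over the original, time-reversed, +3 dB and -3 dB variants."""
    gain_up = 10.0 ** (3.0 / 20.0)     # +3 dB as an amplitude factor
    gain_down = 10.0 ** (-3.0 / 20.0)  # -3 dB as an amplitude factor
    variants = [wav, wav[::-1].copy(), wav * gain_up, wav * gain_down]
    return np.mean([predict_logits(v) for v in variants], axis=0)
```

Averaging happens in logit space, so the sigmoid is applied once afterwards, consistent with the ensemble rule earlier in this README.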
|
|
| ### 7. Pseudo-Labeling (Noisy Student) |
|
|
| Use confident predictions (>0.5) on `train_soundscapes` as additional training data. Retrain with these pseudo-labels + original data. |
|
|
| **Expected boost**: 0.84 → 0.88 |
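
One way to pick the confident chunks is sketched below. The selection rule (keep a chunk when its highest class probability exceeds the threshold) and the choice to keep soft labels rather than binarize them are assumptions; the source only states the `>0.5` cutoff.

```python
import numpy as np


def select_pseudo_labels(probs, threshold=0.5):
    """Keep chunks whose most confident class exceeds the threshold.

    probs: array of shape (n_chunks, n_classes) with teacher sigmoid outputs.
    Returns (kept_indices, soft_labels).
    """
    probs = np.asarray(probs, dtype=np.float64)
    keep = probs.max(axis=1) > threshold
    # Soft labels are kept as-is; only low-confidence chunks are dropped
    return np.flatnonzero(keep), probs[keep]


idx, soft = select_pseudo_labels([[0.9, 0.1], [0.3, 0.2], [0.1, 0.6]])
# idx -> array([0, 2]); the middle chunk is dropped
```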
|
|
| --- |
|
|
| ## Expected Score Improvements |
|
|
| Stage | Technique | Expected Score |
|-------|-----------|----------------|
| Baseline | Your 0.815 pipeline | **0.815** |
| NB2 improvement | AsymmetricLoss + energy crop + no mixup | **0.83-0.85** |
| 5-fold ensemble | 10 models (5 folds × 2 backbones) | **0.85-0.87** |
| TTA | 4 variants per chunk | **0.86-0.88** |
| Pseudo-labeling | Noisy student on soundscapes | **0.88-0.91** |
| + Better backbone | Bird-MAE or ConvNeXt | **0.90-0.93** |
|
|
| --- |
|
|
| ## Files |
|
|
| File | Purpose |
|------|---------|
| | `nb01_data_prep.py` | Data cleaning, VAD, StratifiedKFold(5) | |
| | `nb02_training.py` | 5-fold training with AsymmetricLoss, SpecAugment | |
| | `nb03_pseudo_labeling.py` | Generate pseudo-labels, noisy student | |
| | `nb04_inference.py` | 10-model ensemble, TTA, submission generation | |
|
|
| --- |
|
|
| ## How to Run on Kaggle |
|
|
| ### Step 1: Create Dataset from NB1 Output |
|
|
| After running `nb01_data_prep.py`, create a Kaggle dataset from `/kaggle/working/`: |
|
|
| ``` |
| train_cleaned_stratified.csv |
| soundscape_labels_with_folds.csv |
| species_list.csv |
| rare_species.csv |
| ``` |
|
|
| ### Step 2: NB2 Training |
|
|
| ```python |
| # In Kaggle notebook, attach: |
| # - Competition data: birdclef-2026 |
| # - NB1 output dataset |
| # Run nb02_training.py → produces 10 .pt files in /kaggle/working/models/ |
| ``` |
|
|
| Save models as a new Kaggle dataset. |
|
|
| ### Step 3: NB3 Pseudo-Labeling |
|
|
| ```python |
| # Attach NB2 model dataset + NB1 data |
| # Run nb03_pseudo_labeling.py → produces pseudo_labels_soft.csv |
| ``` |
|
|
| ### Step 4: NB4 Inference (Submission) |
|
|
| ```python |
| # Attach NB2 model dataset + competition test data |
| # Run nb04_inference.py → produces submission.csv |
| ``` |
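
Before writing `submission.csv`, the predictions need to match the sample submission row for row. The sketch below shows one way to do that with pandas; the merge arguments (`on="row_id", how="left"`), the helper name `align_submission`, and the `0.0` fill for missing predictions are assumptions, since the source elides them.

```python
import pandas as pd


def align_submission(sample, sub):
    """Reorder predictions to match the sample submission's rows and columns."""
    # Left merge keeps every row_id from the sample, in the sample's order
    aligned = sample[["row_id"]].merge(sub, on="row_id", how="left")
    # Match the sample's column order; classes missing from sub become NaN
    aligned = aligned.reindex(columns=sample.columns)
    # Hypothetical fallback: rows or classes without predictions get 0.0
    return aligned.fillna(0.0)


sample = pd.DataFrame({"row_id": ["a", "b", "c"], "sp1": 0.0, "sp2": 0.0})
sub = pd.DataFrame({"row_id": ["c", "a"], "sp2": [0.2, 0.4], "sp1": [0.7, 0.9]})
aligned = align_submission(sample, sub)
```

A quick check such as `assert list(aligned["row_id"]) == list(sample["row_id"])` before saving catches misaligned rows early.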
|
|
| --- |
|
|
| ## Critical Rules for BirdCLEF |
|
|
| 1. **NEVER threshold predictions**: it destroys AUC ranking. |
| 2. **NEVER apply non-linear calibration** (`p**0.75`, `p/(p+1)`, etc.): it cannot raise AUC on its own, and applied per model or alongside thresholds it distorts rank order. |
| 3. **NEVER mixup or label-smooth**: it squashes outputs into a narrow range and weakens the separation AUC depends on. |
| 4. **ALWAYS align submission with sample_submission.csv**: `sample[["row_id"]].merge(sub, ...)` |
| 5. **ALWAYS ensemble diverse models**: same model, same folds = no gain. |
| 6. **ALWAYS use raw sigmoid outputs**: the metric is rank-based and does not need calibrated probabilities. |
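
Rule 1 can be checked on a toy example. The sketch below computes ROC-AUC directly from its pairwise definition and compares raw scores with hard-thresholded ones; the labels, scores, and the 0.5 cutoff are made up for illustration.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.60, 0.70]

auc_raw = roc_auc(labels, scores)                                   # 7/9
auc_thresholded = roc_auc(labels, [float(s > 0.5) for s in scores])  # 6/9
```

In this toy case, binarizing the scores turns informative orderings into ties, so the AUC drops from 7/9 to 6/9.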
|
|
| --- |
|
|
| ## References |
|
|
| - AsymmetricLoss: [arXiv:2009.14119](https://arxiv.org/abs/2009.14119) |
| - Bird-MAE: [arXiv:2504.12880](https://arxiv.org/abs/2504.12880) |
| - sl-BEATs: [arXiv:2508.11845](https://arxiv.org/abs/2508.11845) |
| - Top solution reference: [minalkharat12/birdclef-2026-solution](https://huggingface.co/minalkharat12/birdclef-2026-solution) |
|
|
| --- |
|
|
| ## License |
|
|
| MIT. Competition code for educational purposes. |
|
|
| <!-- ml-intern-provenance --> |
| ## Generated by ML Intern |
|
|
| This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub. |
|
|
| - Try ML Intern: https://smolagents-ml-intern.hf.space |
| - Source code: https://github.com/huggingface/ml-intern |
|
|
|
|