---
tags:
- ml-intern
---

# BirdCLEF+ 2026: Improved Pipeline (Target: 0.90+)

This repository contains an improved 4-notebook pipeline for BirdCLEF+ 2026, based on lessons learned from a baseline that scored 0.815.

🔗 **Competition**: https://www.kaggle.com/competitions/birdclef-2026

🔗 **Author Repo**: https://huggingface.co/hello9972/birdclef-2026-improved

---

## Why You Got Stuck at 0.815

Your original pipeline had these fatal problems that prevented it from reaching 0.90+:

### ❌ What Destroyed Your Score

| Mistake | Impact | Why |
|---------|--------|-----|
| **Threshold boosting** (`p * 0.85 + mask * 0.15`) | 0.815 → **0.52** | Thresholds destroy probability ranking. The BirdCLEF metric is AUC-based (rank-sensitive). |
| **Mixup + Label Smoothing** | Softened outputs to 0.05-0.2 range | Destroyed calibration needed for AUC. AUC needs spread, not softening. |
| **Aggressive calibration** (`p ** 0.75`) | 0.815 → **0.53** | Non-linear transforms distort ranking order. |
| **2-model ensemble only** | Ceiling ~0.82 | Top solutions use 5-20 models. |
| **No 5-fold CV** | Could not ensemble diverse models | Same data, same predictions = no ensemble gain. |
| **No pseudo-labeling** | Missing 5-8% boost from test-domain adaptation | Top solutions use noisy student on test predictions. |

### ✅ What Actually Works for BirdCLEF

- **Raw sigmoid outputs**: NO thresholds, NO calibration
- **Simple ensemble**: mean logits, not probabilities
- **Exact sample submission alignment**: `sample[["row_id"]].merge(sub, ...)`
- **Pure PyTorch inference**: no ONNX in Kaggle submissions
- **Minimal post-processing**: tiny clip only

---

## New Architecture Overview

```
NB1 → Data Prep + StratifiedKFold(5)
NB2 → 5-Fold Training (AsymmetricLoss, SpecAugment, Energy Crop)
NB3 → Pseudo-Labeling (Noisy Student on train_soundscapes)
NB4 → Inference (10-model ensemble, TTA, rank averaging)
```

---

## Key Improvements

### 1. Loss Function: AsymmetricLoss (NOT BCE)

Replaces `BCEWithLogitsLoss` with AsymmetricLoss from [arXiv:2009.14119](https://arxiv.org/abs/2009.14119):

```python
import torch.nn as nn

class AsymmetricLoss(nn.Module):
    def __init__(self, gamma_neg=4, gamma_pos=0, clip=0.05):
        ...
```

**Why**: Down-weights easy negatives (background noise, empty segments) while preserving signal for rare species. Does NOT squash logits like label smoothing.

### 2. Energy-Based Window Selection (Perch 2.0 Trick)

For training clips longer than 5 seconds, finds the window with the highest audio energy instead of taking a random crop:

```python
def _energy_crop(self, wav):
    # frame-level RMS energy of the waveform
    energy = librosa.feature.rms(y=wav, frame_length=2048, hop_length=512)[0]
    # (smoothing of `energy` into `smoothed_energy` omitted here)
    peak_frame = np.argmax(smoothed_energy)
    # center window around peak with jitter
```

**Why**: Bird calls are often brief. Random crops miss them. Energy-based crops hit them 3-4x more often.

### 3. Augmentations (Waveform + Spectrogram)

| Augmentation | Probability | Setting | Purpose |
|--------------|-------------|---------|---------|
| Cyclic roll | 100% | | Time-shift invariance |
| Colored noise | 30% | SNR 3-30 dB, f^-decay | Domain adaptation to soundscapes |
| Background noise | 50% | Real soundscape mixing | Simulates multi-species recordings |
| Gain | 30% | ±12 dB | Loudness invariance |
| SpecAugment (freq mask) | 50% | 24 bins | Frequency invariance |
| SpecAugment (time mask) | 50% | 40 frames | Time invariance |

**NO mixup. NO label smoothing.** Both destroyed your score.

### 4. 5-Fold StratifiedKFold

```python
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
```

Each fold gets approximately the same species distribution. Training one model per fold yields 5 models that each saw a different training subset, which is what makes ensembling them worthwhile.

### 5. Layer-Wise LR Decay

```python
lr_scale = layer_decay ** (num_blocks - layer_idx)
```

Layers closer to the input (small `layer_idx`) get a smaller LR, while layers near the head train at close to the base rate. This limits overfitting in the early layers.

### 6. Test-Time Augmentation (TTA)

4 variants per chunk:

- Original
- Time-reversed
- +3dB gain
- -3dB gain

Average logits across all variants.

### 7. Pseudo-Labeling (Noisy Student)

Use confident predictions (>0.5) on `train_soundscapes` as additional training data. Retrain with these pseudo-labels + original data.

**Expected boost**: 0.84 → 0.88

---

## Expected Score Improvements

| Stage | Technique | Expected Score |
|-------|-----------|----------------|
| Baseline | Your 0.815 pipeline | **0.815** |
| NB2 improvement | AsymmetricLoss + energy crop + no mixup | **0.83-0.85** |
| 5-fold ensemble | 10 models (5 folds × 2 backbones) | **0.85-0.87** |
| TTA | 4 variants per chunk | **0.86-0.88** |
| Pseudo-labeling | Noisy student on soundscapes | **0.88-0.91** |
| + Better backbone | Bird-MAE or ConvNeXt | **0.90-0.93** |

---

## Files

| File | Purpose |
|------|---------|
| `nb01_data_prep.py` | Data cleaning, VAD, StratifiedKFold(5) |
| `nb02_training.py` | 5-fold training with AsymmetricLoss, SpecAugment |
| `nb03_pseudo_labeling.py` | Generate pseudo-labels, noisy student |
| `nb04_inference.py` | 10-model ensemble, TTA, submission generation |

---

## How to Run on Kaggle

### Step 1: Create Dataset from NB1 Output

After running `nb01_data_prep.py`, create a Kaggle dataset from these files in `/kaggle/working/`:

```
train_cleaned_stratified.csv
soundscape_labels_with_folds.csv
species_list.csv
rare_species.csv
```

### Step 2: NB2 Training

```python
# In Kaggle notebook, attach:
# - Competition data: birdclef-2026
# - NB1 output dataset
# Run nb02_training.py → produces 10 .pt files in /kaggle/working/models/
```

Save the models as a new Kaggle dataset.

### Step 3: NB3 Pseudo-Labeling

```python
# Attach NB2 model dataset + NB1 data
# Run nb03_pseudo_labeling.py → produces pseudo_labels_soft.csv
```

### Step 4: NB4 Inference (Submission)

```python
# Attach NB2 model dataset + competition test data
# Run nb04_inference.py → produces submission.csv
```

---

## Critical Rules for BirdCLEF

1. **NEVER threshold predictions**: it destroys AUC ranking.
2. **NEVER apply non-linear calibration** (`p**0.75`, `p/(p+1)`, etc.): it distorts rank order.
3. **NEVER mixup or label-smooth**: it squashes logits into a narrow range, killing AUC spread.
4. **ALWAYS align submission with sample_submission.csv**: `sample[["row_id"]].merge(sub, ...)`
5. **ALWAYS ensemble diverse models**: same model, same folds = no gain.
6. **ALWAYS use raw sigmoid outputs**: let the metric handle calibration.

---

## References

- AsymmetricLoss: [arXiv:2009.14119](https://arxiv.org/abs/2009.14119)
- Bird-MAE: [arXiv:2504.12880](https://arxiv.org/abs/2504.12880)
- sl-BEATs: [arXiv:2508.11845](https://arxiv.org/abs/2508.11845)
- Top solution reference: [minalkharat12/birdclef-2026-solution](https://huggingface.co/minalkharat12/birdclef-2026-solution)

---

## License

MIT. Competition code for educational purposes.

## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hello9972/birdclef-2026-improved"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
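
As a small worked example of the layer-wise LR decay rule from Key Improvements (`lr_scale = layer_decay ** (num_blocks - layer_idx)`), the sketch below computes one learning rate per backbone block. The function name `layerwise_lr`, the base rate `1e-3`, the block count, and the decay value `0.5` are illustrative assumptions, not values taken from `nb02_training.py`.

```python
def layerwise_lr(base_lr: float, num_blocks: int, layer_decay: float) -> list:
    """Return one learning rate per block; index 0 is the block closest to the input.

    Hypothetical helper for illustration only. It applies the README's rule
    lr_scale = layer_decay ** (num_blocks - layer_idx) to a base learning rate.
    """
    if not 0.0 < layer_decay <= 1.0:
        raise ValueError("layer_decay must be in (0, 1]")
    return [
        base_lr * layer_decay ** (num_blocks - layer_idx)
        for layer_idx in range(num_blocks)
    ]


# Assumed values: 4 blocks, base LR 1e-3, decay 0.5.
lrs = layerwise_lr(base_lr=1e-3, num_blocks=4, layer_decay=0.5)
for idx, lr in enumerate(lrs):
    # Blocks nearer the input (small idx) receive the smallest learning rate.
    print(f"block {idx}: lr={lr:.3e}")
```

Under this rule a classification head indexed as `layer_idx == num_blocks` would train at the full base rate, since the exponent becomes zero. In PyTorch, each rate would typically be passed to the optimizer through its own parameter group.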