
BirdCLEF+ 2026: Improved Pipeline (Target: 0.90+)

This repository contains an improved 4-notebook pipeline for BirdCLEF+ 2026, based on lessons learned from a 0.815 score baseline.

🔗 Competition: https://www.kaggle.com/competitions/birdclef-2026
🔗 Author Repo: https://huggingface.co/hello9972/birdclef-2026-improved


Why You Were Stuck at 0.815

Your original pipeline had these fatal problems that prevented reaching 0.90+:

❌ What Destroyed Your Score

| Mistake | Impact | Why |
|---|---|---|
| Threshold boosting (p * 0.85 + mask * 0.15) | 0.815 → 0.52 | Thresholds distort the probability ranking, and the BirdCLEF metric is AUC-based (rank-sensitive). |
| Mixup + label smoothing | Outputs squeezed into the 0.05-0.2 range | Softened targets blurred the separation between positive and negative scores that AUC measures. |
| Aggressive calibration (p ** 0.75) | 0.815 → 0.53 | A post-hoc transform cannot improve a rank-based metric, and any step that is not strictly monotonic or not applied identically to every row reorders predictions. |
| 2-model ensemble only | Ceiling ~0.82 | Top solutions use 5-20 models. |
| No 5-fold CV | Could not ensemble diverse models | Same data, same predictions: no ensemble gain. |
| No pseudo-labeling | Missing a 5-8% boost from adapting to the soundscape (test) domain | Top solutions apply noisy-student pseudo-labeling to unlabeled soundscapes. |
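Since every item above is judged by its effect on the leaderboard metric, it helps to score any post-processing on out-of-fold predictions before submitting. The sketch below assumes the metric is a macro-averaged ROC-AUC that skips species with no positive labels, as in recent BirdCLEF editions (confirm on the 2026 evaluation page); `binary_auc` and `macro_auc` are illustrative names, not competition code.

```python
import numpy as np


def binary_auc(y, s):
    """ROC-AUC as the probability that a random positive outscores a random negative."""
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))  # ties count half


def macro_auc(y_true, y_score):
    """Mean per-species AUC, skipping species without both positives and negatives."""
    aucs = [
        binary_auc(y_true[:, k], y_score[:, k])
        for k in range(y_true.shape[1])
        if 0 < y_true[:, k].sum() < y_true.shape[0]
    ]
    return float(np.mean(aucs))


# Usage: compare raw out-of-fold predictions with a candidate post-processing step
y_true = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]])
oof = np.array([[0.9, 0.1, 0.2], [0.1, 0.9, 0.3], [0.8, 0.2, 0.1], [0.2, 0.3, 0.4]])
print(macro_auc(y_true, oof), macro_auc(y_true, oof ** 0.75))
```

If a transform leaves the local score unchanged it adds nothing; if it lowers the score, it is reordering predictions somewhere.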

✅ What Actually Works for BirdCLEF

  • Raw sigmoid outputs: no thresholds, no calibration
  • Simple ensemble: mean logits, not probabilities
  • Exact sample submission alignment: sample[["row_id"]].merge(sub, ...)
  • Pure PyTorch inference: no ONNX in Kaggle submissions
  • Minimal post-processing: a tiny clip only
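The alignment step is easy to get subtly wrong. A minimal pandas sketch, assuming `sample` is the loaded sample_submission.csv and `sub` holds your predictions with a `row_id` column plus one column per species; `align_to_sample` is a hypothetical helper name:

```python
import pandas as pd


def align_to_sample(sample: pd.DataFrame, sub: pd.DataFrame) -> pd.DataFrame:
    """Return predictions in exactly the row and column order of sample_submission.

    Assumes row_id is unique in `sub`; duplicates would duplicate rows in the merge.
    """
    # A left merge on the sample's row_id keeps the sample's row order
    aligned = sample[["row_id"]].merge(sub, on="row_id", how="left")
    # Reorder columns to match the sample (raises KeyError if a species column is missing)
    aligned = aligned[list(sample.columns)]
    if aligned.isna().any().any():
        raise ValueError("some sample rows have no prediction")
    return aligned


sample = pd.DataFrame({"row_id": ["a", "b", "c"], "sp1": 0.0, "sp2": 0.0})
sub = pd.DataFrame({"row_id": ["c", "a", "b"], "sp2": [0.3, 0.1, 0.2], "sp1": [0.9, 0.7, 0.8]})
print(align_to_sample(sample, sub))
```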

New Architecture Overview

NB1 → Data Prep + StratifiedKFold(5)
NB2 → 5-Fold Training (AsymmetricLoss, SpecAugment, Energy Crop)
NB3 → Pseudo-Labeling (Noisy Student on train_soundscapes)
NB4 → Inference (10-model ensemble, TTA, rank averaging)
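The ensembling step can be written either way the text mentions: averaging logits across models (the simple ensemble recommended above) or rank averaging (listed for NB4). A NumPy sketch of both, assuming each model's output is an array of shape (n_chunks, n_species); the function names are illustrative:

```python
import numpy as np


def mean_logit_ensemble(logit_list):
    """Average raw logits over models, then apply a single sigmoid."""
    mean_logits = np.mean(np.stack(logit_list, axis=0), axis=0)
    return 1.0 / (1.0 + np.exp(-mean_logits))


def rank_average_ensemble(pred_list):
    """Replace each model's scores with per-species ranks scaled to [0, 1], then average."""
    ranked = []
    for pred in pred_list:
        ranks = pred.argsort(axis=0).argsort(axis=0).astype(float)  # ties broken arbitrarily
        ranked.append(ranks / max(pred.shape[0] - 1, 1))
    return np.mean(ranked, axis=0)


model_a = np.array([[2.0, -1.0], [-3.0, 0.5], [0.0, 1.5]])
model_b = np.array([[1.0, -2.0], [-1.0, 1.0], [0.5, 0.0]])
print(mean_logit_ensemble([model_a, model_b]))
print(rank_average_ensemble([model_a, model_b]))
```

Rank averaging discards each model's scale, so one overconfident model cannot dominate; its output is not a probability, which a rank-based metric does not need.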

Key Improvements

1. Loss Function: AsymmetricLoss (NOT BCE)

Replaces BCEWithLogitsLoss with AsymmetricLoss from arXiv:2009.14119:

class AsymmetricLoss(nn.Module):
    def __init__(self, gamma_neg=4, gamma_pos=0, clip=0.05):
        ...

Why: Down-weights easy negatives (background noise, empty segments) while preserving signal for rare species. Does NOT squash logits like label smoothing.
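The snippet above shows only the constructor signature. For reference, a minimal sketch following the formulation in arXiv:2009.14119: positives get a focusing weight (1 - p)^gamma_pos, while negatives use a shifted probability p_m = max(p - clip, 0) with weight p_m^gamma_neg, so negatives already at or below `clip` contribute nothing. This is a sketch, not the repository's implementation; the `eps` parameter and the mean reduction are choices made here (the authors' reference code returns a sum).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F  # used only in the sanity check below


class AsymmetricLoss(nn.Module):
    """Sketch of ASL for multi-label logits (arXiv:2009.14119)."""

    def __init__(self, gamma_neg=4.0, gamma_pos=0.0, clip=0.05, eps=1e-8):
        super().__init__()
        self.gamma_neg = gamma_neg
        self.gamma_pos = gamma_pos
        self.clip = clip
        self.eps = eps

    def forward(self, logits, targets):
        p = torch.sigmoid(logits)
        # Probability shifting: negatives with p <= clip get p_m = 0, i.e. zero loss
        p_m = (p - self.clip).clamp(min=0) if self.clip > 0 else p
        loss_pos = targets * (1 - p) ** self.gamma_pos * torch.log(p.clamp(min=self.eps))
        loss_neg = (1 - targets) * p_m ** self.gamma_neg * torch.log((1 - p_m).clamp(min=self.eps))
        return -(loss_pos + loss_neg).mean()


# Sanity check: with no focusing and no clipping, ASL reduces to plain BCE
torch.manual_seed(0)
logits, targets = torch.randn(4, 6), (torch.rand(4, 6) > 0.7).float()
plain = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0)(logits, targets)
print(torch.allclose(plain, F.binary_cross_entropy_with_logits(logits, targets), atol=1e-5))
```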

2. Energy-Based Window Selection (Perch 2.0 Trick)

For training clips longer than 5 seconds, this selects the window with the highest audio energy instead of a random crop:

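def _energy_crop(self, wav):
    energy = librosa.feature.rms(y=wav, frame_length=2048, hop_length=512)[0]
    # assumed: 5-frame moving average so a single loud click does not win
    smoothed_energy = np.convolve(energy, np.ones(5) / 5, mode="same")
    peak_frame = np.argmax(smoothed_energy)
    # center window around peak with jitter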

Why: Bird calls are often brief. Random crops miss them. Energy-based crops hit them 3-4x more often.
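A self-contained NumPy sketch of the whole crop, for experimenting outside the training loop. It computes frame RMS directly instead of calling `librosa.feature.rms`, so its frames are not centered the way librosa's default is; the smoothing width, jitter range, 32 kHz sample rate, and function names are assumptions, not values from the repository.

```python
import numpy as np


def frame_rms(wav, frame_length=2048, hop_length=512):
    """RMS energy of uncentered frames starting every hop_length samples."""
    if len(wav) < frame_length:
        wav = np.pad(wav, (0, frame_length - len(wav)))
    n_frames = 1 + (len(wav) - frame_length) // hop_length
    idx = hop_length * np.arange(n_frames)[:, None] + np.arange(frame_length)[None, :]
    return np.sqrt(np.mean(wav[idx] ** 2, axis=1))


def energy_crop(wav, rng, sr=32000, duration=5.0, frame_length=2048,
                hop_length=512, smooth_frames=5, max_jitter_s=0.5):
    """Return a duration-second window centered (with jitter) on the loudest region."""
    win = int(sr * duration)
    if len(wav) <= win:  # short clip: zero-pad to the target length
        return np.pad(wav, (0, win - len(wav)))
    energy = frame_rms(wav, frame_length, hop_length)
    # Moving average so a single loud click does not outrank a sustained call
    smoothed = np.convolve(energy, np.ones(smooth_frames) / smooth_frames, mode="same")
    peak_sample = int(np.argmax(smoothed)) * hop_length + frame_length // 2  # frame center
    jitter = int(rng.uniform(-max_jitter_s, max_jitter_s) * sr)
    start = int(np.clip(peak_sample - win // 2 + jitter, 0, len(wav) - win))
    return wav[start:start + win]


rng = np.random.default_rng(0)
recording = np.zeros(32000 * 12)
recording[32000 * 9: 32000 * 9 + 4000] = 0.5  # a short "call" 9 s into a 12 s clip
crop = energy_crop(recording, rng)
print(crop.shape, crop.sum() == recording.sum())
```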

3. Augmentations (Waveform + Spectrogram)

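| Augmentation | Probability | Purpose / setting |
|---|---|---|
| Cyclic roll | 100% | Time-shift invariance |
| Colored noise | 30% | SNR 3-30 dB, power spectrum falling as 1/f^β |
| Background noise | 50% | Mixing in real soundscape audio |
| Gain | 30% | ±12 dB |
| SpecAugment (frequency mask) | 50% | 24 bins |
| SpecAugment (time mask) | 50% | 40 frames |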

NO mixup. NO label smoothing. Both destroyed your score.
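A NumPy sketch of the waveform augmentations and SpecAugment masks in the table, using its probabilities and ranges. Assumptions not stated in the table: the colored-noise exponent β is drawn from [0, 2] (white to brown), the 24-bin and 40-frame values are treated as maximum mask widths, masked cells are filled with the spectrogram mean, and all function names are illustrative. Background-noise mixing is omitted because it needs the soundscape files.

```python
import numpy as np


def colored_noise(n, beta, rng):
    """Zero-mean, unit-variance noise whose power spectrum falls as 1/f**beta."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]  # avoid dividing by zero at DC
    noise = np.fft.irfft(spectrum / freqs ** (beta / 2), n=n)  # amplitude ~ f**(-beta/2)
    noise -= noise.mean()
    return noise / (noise.std() + 1e-12)


def add_colored_noise(wav, rng, snr_db_range=(3.0, 30.0), beta_range=(0.0, 2.0)):
    """Add colored noise scaled to a random SNR (in dB) relative to the signal power."""
    snr_db = rng.uniform(*snr_db_range)
    noise = colored_noise(len(wav), rng.uniform(*beta_range), rng)
    noise_power = np.mean(wav ** 2) / 10 ** (snr_db / 10)
    return wav + np.sqrt(noise_power) * noise


def augment_waveform(wav, rng):
    wav = np.roll(wav, rng.integers(len(wav)))  # cyclic roll, always applied
    if rng.random() < 0.3:
        wav = add_colored_noise(wav, rng)
    if rng.random() < 0.3:
        wav = wav * 10 ** (rng.uniform(-12, 12) / 20)  # gain in dB -> amplitude factor
    return wav


def spec_augment(spec, rng, max_freq=24, max_time=40, p=0.5):
    """spec: (n_freq_bins, n_frames). Returns a masked copy; the input is untouched."""
    spec = spec.copy()
    fill = spec.mean()
    n_freq, n_time = spec.shape
    if rng.random() < p:  # frequency mask
        f = rng.integers(0, min(max_freq, n_freq) + 1)
        f0 = rng.integers(0, n_freq - f + 1)
        spec[f0:f0 + f, :] = fill
    if rng.random() < p:  # time mask
        t = rng.integers(0, min(max_time, n_time) + 1)
        t0 = rng.integers(0, n_time - t + 1)
        spec[:, t0:t0 + t] = fill
    return spec


rng = np.random.default_rng(0)
wav = np.sin(2 * np.pi * 440 * np.arange(32000) / 32000)
mel = rng.random((128, 313))
print(augment_waveform(wav, rng).shape, spec_augment(mel, rng).shape)
```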

4. 5-Fold StratifiedKFold

from sklearn.model_selection import StratifiedKFold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

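Each fold's validation split preserves the overall species distribution. Because each of the five models trains on a different 80% of the data, their errors differ, and that diversity is what lets the ensemble beat any single model.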
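A sketch of how the folds might be assigned and stored, assuming a DataFrame with one row per recording and a `primary_label` column to stratify on. The column name and the `assign_folds` helper are assumptions; multi-label recordings are stratified by their primary label only.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def assign_folds(df, label_col="primary_label", n_splits=5, seed=42):
    """Add a 'fold' column; each fold's validation set mirrors the label distribution."""
    df = df.reset_index(drop=True).copy()
    df["fold"] = -1
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (_, val_idx) in enumerate(skf.split(df, df[label_col])):
        df.loc[val_idx, "fold"] = fold
    return df


train = pd.DataFrame({"primary_label": ["sp_a"] * 10 + ["sp_b"] * 10 + ["sp_c"] * 10})
train = assign_folds(train)
print(pd.crosstab(train["fold"], train["primary_label"]))
```

Species with fewer recordings than `n_splits` trigger a scikit-learn warning and cannot appear in every validation fold.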

5. Layer-Wise LR Decay

lr_scale = layer_decay ** (num_blocks - layer_idx)

Earlier layers (closer to the input) get smaller learning rates, so pretrained low-level features change less; this limits overfitting in those layers.
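A minimal sketch of turning the formula into optimizer parameter groups. It assumes the backbone has already been split into an ordered list of parameter lists, from the stem (index 0, closest to the input) to the head (last index); how to split a particular model is left out, and `layerwise_param_groups` and its default `layer_decay=0.75` are hypothetical.

```python
def layerwise_param_groups(blocks, base_lr, layer_decay=0.75):
    """blocks: parameter lists ordered from input (index 0) to output head (last)."""
    num_blocks = len(blocks) - 1
    groups = []
    for layer_idx, params in enumerate(blocks):
        lr_scale = layer_decay ** (num_blocks - layer_idx)  # head: 1.0, stem: smallest
        groups.append({"params": params, "lr": base_lr * lr_scale})
    return groups


# Placeholder strings stand in for real parameter lists here
groups = layerwise_param_groups([["stem"], ["block1"], ["block2"], ["head"]], base_lr=1e-3, layer_decay=0.5)
print([g["lr"] for g in groups])
# With PyTorch, real groups can be passed straight to an optimizer, e.g.
# torch.optim.AdamW(layerwise_param_groups(blocks, base_lr=1e-3))
```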

6. Test-Time Augmentation (TTA)

4 variants per chunk:

  • Original
  • Time-reversed
  • +3dB gain
  • -3dB gain

Average logits across all variants.
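A sketch of the four-variant TTA, assuming a hypothetical `predict_logits(wav)` callable that maps one 5-second waveform chunk to a vector of per-species logits. A ±3 dB gain corresponds to scaling the amplitude by 10^(±3/20), roughly 1.41 or 0.71.

```python
import numpy as np


def tta_variants(wav):
    """Original, time-reversed, +3 dB and -3 dB copies of one chunk."""
    gain = 10 ** (3 / 20)  # +3 dB as an amplitude factor
    return [wav, wav[::-1].copy(), wav * gain, wav / gain]


def predict_with_tta(predict_logits, wav):
    """Average logits over the TTA variants (a sigmoid, if any, comes afterwards)."""
    return np.mean([predict_logits(v) for v in tta_variants(wav)], axis=0)


# Toy stand-in for a model: two "logits" from simple waveform statistics
def toy_model(w):
    return np.array([w[: len(w) // 2].mean(), np.abs(w).max()])


chunk = np.linspace(-1.0, 1.0, 160000)
print(predict_with_tta(toy_model, chunk))
```

Each variant costs a full forward pass, so TTA makes inference about 4x slower; its benefit is worth confirming on a held-out fold.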

7. Pseudo-Labeling (Noisy Student)

Use confident predictions (>0.5) on train_soundscapes as additional training data. Retrain with these pseudo-labels + original data.

Expected boost: 0.84 β†’ 0.88
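A sketch of the selection step, assuming `probs` holds the fold ensemble's sigmoid outputs on train_soundscapes chunks with shape (n_chunks, n_species). Whether NB3 keeps soft probabilities as targets (the output name pseudo_labels_soft.csv suggests so) or hardens them is an assumption, so both are shown; `select_pseudo_labels` is a hypothetical name.

```python
import numpy as np


def select_pseudo_labels(probs, threshold=0.5, hard=False):
    """Keep chunks with at least one species above threshold; return indices and targets."""
    keep = np.flatnonzero(probs.max(axis=1) > threshold)
    targets = probs[keep]
    if hard:
        targets = (targets > threshold).astype(np.float32)
    return keep, targets


probs = np.array([[0.92, 0.10, 0.03], [0.20, 0.30, 0.10], [0.40, 0.65, 0.55]])
idx, soft = select_pseudo_labels(probs)
_, hard = select_pseudo_labels(probs, hard=True)
print(idx, soft, hard, sep="\n")
```

The selected chunks are then added to the original training set for the student model, typically trained with fresh augmentation (the "noisy" part of noisy student).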


Expected Score Improvements

| Stage | Technique | Expected score |
|---|---|---|
| Baseline | Your 0.815 pipeline | 0.815 |
| NB2 improvement | AsymmetricLoss + energy crop + no mixup | 0.83-0.85 |
| 5-fold ensemble | 10 models (5 folds × 2 backbones) | 0.85-0.87 |
| TTA | 4 variants per chunk | 0.86-0.88 |
| Pseudo-labeling | Noisy student on soundscapes | 0.88-0.91 |
| + Better backbone | Bird-MAE or ConvNeXt | 0.90-0.93 |

Files

| File | Purpose |
|---|---|
| nb01_data_prep.py | Data cleaning, VAD, StratifiedKFold(5) |
| nb02_training.py | 5-fold training with AsymmetricLoss, SpecAugment |
| nb03_pseudo_labeling.py | Generate pseudo-labels, noisy student |
| nb04_inference.py | 10-model ensemble, TTA, submission generation |

How to Run on Kaggle

Step 1: Create Dataset from NB1 Output

After running nb01_data_prep.py, create a Kaggle dataset from /kaggle/working/:

train_cleaned_stratified.csv
soundscape_labels_with_folds.csv
species_list.csv
rare_species.csv

Step 2: NB2 Training

# In Kaggle notebook, attach:
# - Competition data: birdclef-2026
# - NB1 output dataset
# Run nb02_training.py → produces 10 .pt files in /kaggle/working/models/

Save models as a new Kaggle dataset.

Step 3: NB3 Pseudo-Labeling

# Attach NB2 model dataset + NB1 data
# Run nb03_pseudo_labeling.py → produces pseudo_labels_soft.csv

Step 4: NB4 Inference (Submission)

# Attach NB2 model dataset + competition test data
# Run nb04_inference.py → produces submission.csv

Critical Rules for BirdCLEF

  1. NEVER threshold predictions: it destroys AUC ranking.
  2. NEVER apply non-linear calibration (p**0.75, p/(p+1), etc.): it cannot improve a rank-based metric, and any transform that is not strictly monotonic or not applied identically to every row reorders predictions.
  3. NEVER use mixup or label smoothing: they squash logits into a narrow range and blur the separation between positives and negatives that AUC measures.
  4. ALWAYS align the submission with sample_submission.csv: sample[["row_id"]].merge(sub, ...)
  5. ALWAYS ensemble diverse models: same model, same folds = no gain.
  6. ALWAYS use raw sigmoid outputs: the metric only uses rank order, so no calibration is needed.


License

MIT. Competition code for educational purposes.

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

