| | --- |
| | license: mit |
| | tags: |
| | - sleep |
| | - eeg |
| | - polysomnography |
| | - foundation-model |
| | - self-supervised |
| | - vit |
| | - biosignals |
| | pipeline_tag: feature-extraction |
| | library_name: pytorch |
| | language: |
| | - en |
| | --- |
| | |
| | # OSF: On Pre-training and Scaling of Sleep Foundation Models |
| |
|
| | [](#citation) |
| | [](https://yang-ai-lab.github.io/osf/) |
| | [](LICENSE) |
| | [](#installation) |
| |
|
| |
|
| | ## π₯ News |
| |
|
| | - [2026-2-24] Our codebase and checkpoint is released. |
| | - [2026-2-22] Our paper is out. |
| |
|
| | ## π Introduction |
| |
|
| | Polysomnography (PSG) provides the gold standard for sleep assessment but suffers from substantial heterogeneity across recording devices and cohorts. |
| | There have been growing efforts to build general-purpose foundation models (FMs) for sleep physiology, but lack an in-depth understanding of the pre-training process and scaling patterns that lead to more generalizable sleep FMs. |
| | To fill this gap, we curate a massive corpus of 166,500 hours of sleep recordings from nine public sources and establish SleepBench, a comprehensive, fully open-source benchmark. |
| | Leveraging SleepBench, we systematically evaluate four families of self-supervised pre-training objectives and uncover three critical findings: |
| | (1) existing FMs fail to generalize to missing channels at inference; |
| | (2) channel-invariant feature learning is essential for pre-training; |
| | and (3) scaling sample size, model capacity, and multi-source data mixture consistently improves downstream performance. |
| | With an enhanced pre-training and scaling recipe, we introduce OSF, a family of sleep FMs that achieves state-of-the-art performance across nine datasets on diverse sleep and disease prediction tasks. |
| | Further analysis of OSF also reveals intriguing properties in sample efficiency, hierarchical aggregation, and cross-dataset scaling. |
| |
|
| |
|
| | ## π Table of Contents |
| |
|
| | 1. [Installation](#-installation) |
| | 2. [Quick Start](#-quick-start) |
| | 3. [Pretrained Weights](#-pretrained-weights) |
| | 4. [Usage](#-usage) |
| | 5. [Benchmark Evaluations](#-benchmark-evaluations) |
| | 6. [Supported Datasets](#-supported-datasets) |
| | 7. [Citation](#-citation) |
| |
|
| | ## πΏ Installation |
| |
|
| | ```bash |
| | git clone https://huggingface.co/yang-ai-lab/OSF-Base |
| | cd OSF-Base |
| | conda env create -f environment.yml |
| | conda activate myenv |
| | ``` |
| |
|
| |
|
| | ### Dependencies |
| |
|
| | - Python >= 3.10 |
| | - PyTorch >= 2.9.0 |
| | - PyTorch Lightning >= 2.5.5 |
| |
|
| |
|
| | ## π Quick Start |
| |
|
| | We provide a demo notebook (`demo.ipynb`) demonstrating how to extract embeddings from PSG signals using the pretrained model. |
| |
|
| | ```python |
| | import torch |
| | from osf.backbone.vit1d_cls import vit_base |
| | |
| | # Load pretrained weights (included in this repo) |
| | payload = torch.load("osf_backbone.pth", map_location="cpu", weights_only=False) |
| | meta = payload["metadata"] |
| | |
| | # Initialize model |
| | backbone = vit_base( |
| | num_leads=meta["num_leads"], # 12 channels |
| | seq_len=meta["seq_len"], # 1920 (64 Hz Γ 30 s) |
| | patch_size=meta["patch_size_time"], |
| | lead_wise=meta["lead_wise"], |
| | patch_size_ch=meta["patch_size_ch"], |
| | ) |
| | backbone.load_state_dict(payload["state_dict"]) |
| | backbone.eval() |
| | |
| | # Extract embeddings |
| | # x: [B, 12, 1920] - 12-channel PSG, 64 Hz Γ 30 seconds |
| | with torch.no_grad(): |
| | cls_embs, patch_embs = backbone.forward_encoding(x, return_sequence=False) |
| | # cls_embs: [B, 768] - Global epoch-level representation |
| | # patch_embs: [B, 90, 768] - Local patch representations |
| | ``` |
| |
|
| | ## π¦ Pretrained Weights |
| |
|
| | | Model | Backbone | Channels | |
| | |-------|----------|----------| |
| | | OSF | ViT-Base | 12-ch | |
| |
|
| | The pretrained weights are included in this repository. You can download them via the Hugging Face Hub: |
| |
|
| | ```python |
| | from huggingface_hub import hf_hub_download |
| | checkpoint_path = hf_hub_download(repo_id="yang-ai-lab/OSF-Base", filename="osf_backbone.pth") |
| | ``` |
| |
|
| | Or via the CLI: |
| |
|
| | ```bash |
| | huggingface-cli download yang-ai-lab/OSF-Base osf_backbone.pth |
| | ``` |
| |
|
| | ## π©βπ» Usage |
| |
|
| | ### Input Format |
| |
|
| | Expected input format: |
| | - **12 PSG Channels**: ECG, EMG_Chin, EMG_LLeg, EMG_RLeg, ABD, THX, NP, SN, EOG_E1_A2, EOG_E2_A1, EEG_C3_A2, EEG_C4_A1 |
| | - **Sample Rate**: 64 Hz |
| | - **Epoch Length**: 30 seconds |
| | - **Input Shape**: `[B, 12, 1920]` |
| | |
| | ### Pretraining |
| | |
| | We support multiple self-supervised pretraining methods, for example, to launch pre-training of our OSF method, run pretraining: |
| | |
| | ```bash |
| | python main_pretrain.py \ |
| | --model_name "dino_ours" \ |
| | --psg_encoder_name "vit_base" \ |
| | --batch_size 256 \ |
| | --lr 5e-5 \ |
| | --max_epochs 30 \ |
| | --num_devices 4 \ |
| | --patch_size_time 64 \ |
| | --patch_size_ch 4 \ |
| | --precision "bf16-mixed" |
| | ``` |
| | |
| | See `main_pipleines/main_pretrain.py` for more detailed settings. |
| |
|
| | ### Fine-tuning |
| |
|
| | Fine-tune the pretrained model on downstream tasks: |
| |
|
| | ```bash |
| | python main_finetune.py \ |
| | --model_name "dino_ours" \ |
| | --ckpt_path "/path/to/pretrained/checkpoint.ckpt" \ |
| | --downstream_dataset_name "shhs" \ |
| | --eval_label "Stage" \ |
| | --train_data_pct 1.0 \ |
| | --max_steps 500 \ |
| | --lr 0.1 \ |
| | --num_devices 4 |
| | ``` |
| |
|
| |
|
| | ## π Benchmark Evaluations |
| |
|
| | ### Benchmarked SSL Methods |
| |
|
| | | Method | Type | Original Paper | |
| | |--------|------|-------------| |
| | | SleepFM | Contrastive | Leave-one-out multi-modal contrastive learning | |
| | | SimCLR | Contrastive | Simple Contrastive Learning | |
| | | DINO | Self-distillation | DINOv2 (Oquab et al., 2023) | |
| | | VQ-VAE | Reconstruction | Vector-quantized variational autoencoder | |
| | | MAE | Reconstruction | Masked Autoencoding | |
| | | AR | Autoregressive | Autoregressive Next-Token prediction | |
| | | OSF | Self-distillation | ours | |
| |
|
| | ### Downstream Tasks |
| |
|
| | **Epoch-level Classification Tasks:** |
| |
|
| | | Task | Classes | Description | |
| | |------|---------|-------------| |
| | | Sleep Stage | 4 | Awake, Light Sleep, Deep Sleep, REM classification | |
| | | Arousal | 2 | Arousal event detection | |
| | | Hypopnea | 2 | Hypopnea event detection | |
| | | Oxygen Desaturation | 2 | Oxygen desaturation detection | |
| |
|
| |
|
| | ### Evaluation Settings |
| |
|
| | | Setting | Description | |
| | |---------|-------------| |
| | | Linear Probing | Freeze backbone, train linear classifier | |
| | | Full Fine-tuning | Fine-tune entire model end-to-end | |
| | | Few-shot (k-shot) | Train with limited labeled samples | |
| |
|
| | For example scripts, see `main_pipelines` and `bash_scripts` folders. |
| |
|
| | ## π Supported Datasets |
| |
|
| | We aggregated nine large-scale datasets from the National Sleep Research Resource platform. |
| |
|
| | | Dataset | Full Name | Source | |
| | |---------|-----------|--------| |
| | | SHHS | Sleep Heart Health Study | NSRR | |
| | | CHAT | Childhood Adenotonsillectomy Trial | NSRR | |
| | | MROS | MrOS Sleep Study | NSRR | |
| | | CCSHS | Cleveland Children's Sleep and Health Study | NSRR | |
| | | CFS | Cleveland Family Study | NSRR | |
| | | MESA | Multi-Ethnic Study of Atherosclerosis | NSRR | |
| | | SOF | Study of Osteoporotic Fractures | NSRR | |
| | | WSC | Wisconsin Sleep Cohort | NSRR | |
| | | STAGES | Stanford Technology Analytics and Genomics in Sleep | NSRR | |
| | | NCHSDB | NCH Sleep DataBank | NSRR | |
| |
|
| | For new users, please apply for an account and access to each of these datasets following instructions here [NSRR Registration](https://sleepdata.org/join) |
| |
|
| | ## π Project Structure |
| |
|
| | ``` |
| | OSF-Open-Sleep-Foundation-Model/ |
| | βββ osf/ |
| | β βββ backbone/ # ViT backbone implementations |
| | β β βββ vit1d_cls.py |
| | β βββ models/ # SSL model implementations |
| | β β βββ dino_model_cls.py |
| | β β |
| | β βββ datasets/ # Data loading utilities |
| | β βββ utils/ # Helper functions |
| | βββ main_pipelines/ # Training scripts |
| | β βββ main_pretrain.py |
| | β βββ ... |
| | βββ bash_scripts/ # Example bash scripts |
| | βββ osf_backbone.pth # Pretrained model weights |
| | βββ demo.ipynb # Quick start demo |
| | βββ config.py # Dataset and channel configurations |
| | βββ train_config.py # Training configurations |
| | ``` |
| |
|
| |
|
| | ## π Citation |
| |
|
| | If you use this code or models in your research, please cite our paper: |
| |
|
| | ```bibtex |
| | @article{shuai2026osf, |
| | title={OSF: On Pre-training and Scaling of Sleep Foundation Models}, |
| | author={Shuai, Zitao and Xu, Zongzhe and Yang, David and Wang, Wei and Yang, Yuzhe}, |
| | journal={arXiv preprint arXiv:2603.00190}, |
| | year={2026} |
| | } |
| | ``` |
| |
|
| |
|