| --- |
| license: apache-2.0 |
| language: |
| - en |
| library_name: transformers |
| pipeline_tag: image-text-to-text |
| base_model: Qwen/Qwen2.5-VL-3B-Instruct |
| tags: |
| - sleep-staging |
| - polysomnography |
| - PSG |
| - explainable-AI |
| - AASM |
| - vision-language-model |
| - medical |
| - EEG |
| - EOG |
| - EMG |
| - lora |
| - rule-grounded |
| - multimodal |
| datasets: |
| - Feng613/MASS-EX |
| model-index: |
| - name: SleepVLM-3B |
| results: |
| - task: |
| type: image-text-to-text |
| name: Sleep Stage Classification |
| dataset: |
| name: MASS-SS1 |
| type: mass-ss1 |
| metrics: |
| - name: Accuracy |
| type: accuracy |
| value: 0.835 |
| - name: Macro-F1 |
| type: macro-f1 |
| value: 0.793 |
| - name: Cohen's Kappa |
| type: cohens-kappa |
| value: 0.767 |
| - task: |
| type: image-text-to-text |
| name: Sleep Stage Classification |
| dataset: |
| name: ZUMS |
| type: zums |
| metrics: |
| - name: Accuracy |
| type: accuracy |
| value: 0.812 |
| - name: Macro-F1 |
| type: macro-f1 |
| value: 0.766 |
| - name: Cohen's Kappa |
| type: cohens-kappa |
| value: 0.743 |
| --- |
| |
| <div align="center"> |
| <img src="SleepVLM_logo.png" alt="SleepVLM Logo" width="360"> |
|
|
| # SleepVLM-3B |
|
|
| ### Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model |
|
|
| [Paper (coming soon)]() | |
| [GitHub](https://github.com/Deng-GuiFeng/SleepVLM) | |
| [MASS-EX Dataset](https://huggingface.co/datasets/Feng613/MASS-EX) | |
| [Quantized Version (W4A16)](https://huggingface.co/Feng613/SleepVLM-3B-W4A16) | |
| [Collection](https://huggingface.co/collections/Feng613/sleepvlm) |
|
|
| </div> |
|
|
| --- |
|
|
| > **Associated Paper:** |
| > Guifeng Deng, Pan Wang, Jiquan Wang, Tao Li, Haiteng Jiang. "SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model." *In preparation.* |
| > This repository will be made public upon release of the preprint. |
|
|
| ## Overview |
|
|
| **SleepVLM-3B** is a rule-grounded vision-language model for explainable automated sleep staging from polysomnography (PSG) recordings. Unlike conventional black-box classifiers that output only a stage label, SleepVLM generates clinician-readable natural-language rationales citing specific AASM scoring rules for every 30-second epoch, making each staging decision auditable against the clinical standard. |
|
|
| The model takes rendered multi-channel PSG waveform images as input (three consecutive 30-second epochs) and produces a predicted sleep stage (W/N1/N2/N3/R), applicable AASM rule identifiers, and a structured natural-language rationale. |
|
|
| SleepVLM-3B is fine-tuned from [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) through a two-phase training pipeline: waveform-perceptual pre-training (WPT) followed by rule-grounded supervised fine-tuning (SFT) using expert annotations from [MASS-EX](https://huggingface.co/datasets/Feng613/MASS-EX). |
|
|
| ## Model Details |
|
|
| | Property | Value | |
| |----------|-------| |
| | Base model | [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | |
| | Parameters | ~3.1B | |
| | Model size | 7.1 GB (BF16) | |
| | Fine-tuning method | LoRA (r=16, alpha=32, dropout=0.05) | |
| | Training hardware | 8x NVIDIA A100 80GB | |
| | Precision | bfloat16 | |
| | Input | Three consecutive 30-s PSG epoch images (448 x 224 px) | |
| | PSG channels | F4-M1, C4-M1, O2-M1, LOC, ROC, Chin EMG | |
|
|
| ## Intended Use |
|
|
| - **Primary use:** Research on explainable automated sleep staging from PSG recordings. |
| - **Intended users:** Sleep medicine researchers, clinical informatics researchers, and AI/ML researchers working on interpretable medical AI. |
| - **Clinical note:** This model is intended for research purposes. It has not been validated for clinical diagnostic use and should not replace professional sleep technologist scoring in clinical settings. |
|
|
| ## Citation |
|
|
| If you use SleepVLM in your research, please cite: |
|
|
| ```bibtex |
| @article{deng2026sleepvlm, |
| author = {Deng, Guifeng and Wang, Pan and Wang, Jiquan and Li, Tao and Jiang, Haiteng}, |
| title = {{SleepVLM}: Explainable and Rule-Grounded Sleep Staging |
| via a Vision-Language Model}, |
| journal = {}, % TODO: update after publication |
| year = {2026} |
| } |
| ``` |
|
|
| ## License |
|
|
| This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). |
|
|