---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
- sleep-staging
- polysomnography
- PSG
- explainable-AI
- AASM
- vision-language-model
- medical
- EEG
- EOG
- EMG
- lora
- rule-grounded
- multimodal
datasets:
- Feng613/MASS-EX
model-index:
- name: SleepVLM-3B
  results:
  - task:
      type: image-text-to-text
      name: Sleep Stage Classification
    dataset:
      name: MASS-SS1
      type: mass-ss1
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.835
    - name: Macro-F1
      type: macro-f1
      value: 0.793
    - name: Cohen's Kappa
      type: cohens-kappa
      value: 0.767
  - task:
      type: image-text-to-text
      name: Sleep Stage Classification
    dataset:
      name: ZUMS
      type: zums
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.812
    - name: Macro-F1
      type: macro-f1
      value: 0.766
    - name: Cohen's Kappa
      type: cohens-kappa
      value: 0.743
---
# SleepVLM-3B

### Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model

[Paper (coming soon)]() | [GitHub](https://github.com/Deng-GuiFeng/SleepVLM) | [MASS-EX Dataset](https://huggingface.co/datasets/Feng613/MASS-EX) | [Quantized Version (W4A16)](https://huggingface.co/Feng613/SleepVLM-3B-W4A16) | [Collection](https://huggingface.co/collections/Feng613/sleepvlm)
---

> **Associated Paper:**
> Guifeng Deng, Pan Wang, Jiquan Wang, Tao Li, Haiteng Jiang. "SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model." *In preparation.*
> This repository will be made public upon release of the preprint.

## Overview

**SleepVLM-3B** is a rule-grounded vision-language model for explainable automated sleep staging from polysomnography (PSG) recordings. Unlike conventional black-box classifiers that output only a stage label, SleepVLM generates a clinician-readable natural-language rationale citing specific AASM scoring rules for every 30-second epoch, making each staging decision auditable against the clinical standard.

The model takes rendered multi-channel PSG waveform images (three consecutive 30-second epochs) as input and produces a predicted sleep stage (W/N1/N2/N3/R), the applicable AASM rule identifiers, and a structured natural-language rationale.

SleepVLM-3B is fine-tuned from [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) through a two-phase training pipeline: waveform-perceptual pre-training (WPT) followed by rule-grounded supervised fine-tuning (SFT) on expert annotations from [MASS-EX](https://huggingface.co/datasets/Feng613/MASS-EX).

## Model Details

| Property | Value |
|----------|-------|
| Base model | [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) |
| Parameters | ~3.1B |
| Model size | 7.1 GB (BF16) |
| Fine-tuning method | LoRA (r=16, alpha=32, dropout=0.05) |
| Training hardware | 8x NVIDIA A100 80GB |
| Precision | bfloat16 |
| Input | Three consecutive 30-s PSG epoch images (448 x 224 px) |
| PSG channels | F4-M1, C4-M1, O2-M1, LOC, ROC, Chin EMG |

## Intended Use

- **Primary use:** Research on explainable automated sleep staging from PSG recordings.
- **Intended users:** Sleep medicine researchers, clinical informatics researchers, and AI/ML researchers working on interpretable medical AI.
- **Clinical note:** This model is intended for research purposes only. It has not been validated for clinical diagnostic use and must not replace professional sleep-technologist scoring in clinical settings.

## Citation

If you use SleepVLM in your research, please cite:

```bibtex
@article{deng2026sleepvlm,
  author  = {Deng, Guifeng and Wang, Pan and Wang, Jiquan and Li, Tao and Jiang, Haiteng},
  title   = {{SleepVLM}: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model},
  journal = {},  % TODO: update after publication
  year    = {2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
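## Example Usage

A minimal inference sketch using the standard `transformers` Qwen2.5-VL API, not the authors' official pipeline. The prompt wording, the `epoch_triplet.png` path, and the generation settings are illustrative assumptions; consult the GitHub repository for the exact prompts used in training.

```python
# Hedged sketch: query SleepVLM-3B for a stage label, AASM rule IDs, and a
# rationale. The prompt text and image path below are assumptions, not the
# exact strings the model was trained with.

def build_messages(image_path: str) -> list:
    """Build a Qwen2.5-VL chat request asking for the stage of the centre
    30-second epoch, the applicable AASM rules, and a rationale."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {
                    "type": "text",
                    "text": (
                        "Classify the sleep stage (W/N1/N2/N3/R) of the centre "
                        "30-second epoch, cite the applicable AASM scoring "
                        "rules, and explain your reasoning."
                    ),
                },
            ],
        }
    ]


if __name__ == "__main__":
    # Heavy imports kept here so the message builder stays dependency-free.
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model_id = "Feng613/SleepVLM-3B"
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="bfloat16", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    # Rendered 448 x 224 px image of three consecutive 30-s PSG epochs.
    image_path = "epoch_triplet.png"
    messages = build_messages(image_path)
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        text=[prompt], images=[Image.open(image_path)], return_tensors="pt"
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=512)
    # Strip the prompt tokens, keep only the newly generated rationale.
    trimmed = generated[:, inputs["input_ids"].shape[1]:]
    print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

For the 4-bit quantized checkpoint, the same sketch should apply with `model_id = "Feng613/SleepVLM-3B-W4A16"` and a compatible quantization backend installed.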