---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
- sleep-staging
- polysomnography
- PSG
- explainable-AI
- AASM
- vision-language-model
- medical
- EEG
- EOG
- EMG
- lora
- rule-grounded
- multimodal
datasets:
- Feng613/MASS-EX
model-index:
- name: SleepVLM-3B
results:
- task:
type: image-text-to-text
name: Sleep Stage Classification
dataset:
name: MASS-SS1
type: mass-ss1
metrics:
- name: Accuracy
type: accuracy
value: 0.835
- name: Macro-F1
type: macro-f1
value: 0.793
- name: Cohen's Kappa
type: cohens-kappa
value: 0.767
- task:
type: image-text-to-text
name: Sleep Stage Classification
dataset:
name: ZUMS
type: zums
metrics:
- name: Accuracy
type: accuracy
value: 0.812
- name: Macro-F1
type: macro-f1
value: 0.766
- name: Cohen's Kappa
type: cohens-kappa
value: 0.743
---

# SleepVLM-3B
### Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model
[Paper (coming soon)]() |
[GitHub](https://github.com/Deng-GuiFeng/SleepVLM) |
[MASS-EX Dataset](https://huggingface.co/datasets/Feng613/MASS-EX) |
[Quantized Version (W4A16)](https://huggingface.co/Feng613/SleepVLM-3B-W4A16) |
[Collection](https://huggingface.co/collections/Feng613/sleepvlm)
---
> **Associated Paper:**
> Guifeng Deng, Pan Wang, Jiquan Wang, Tao Li, Haiteng Jiang. "SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model." *In preparation.*
> This repository will be made public upon release of the preprint.
## Overview
**SleepVLM-3B** is a rule-grounded vision-language model for explainable automated sleep staging from polysomnography (PSG) recordings. Unlike conventional black-box classifiers that output only a stage label, SleepVLM generates clinician-readable natural-language rationales citing specific AASM scoring rules for every 30-second epoch, making each staging decision auditable against the clinical standard.
The model takes rendered multi-channel PSG waveform images (three consecutive 30-second epochs) as input and produces a predicted sleep stage (W/N1/N2/N3/R), the applicable AASM rule identifiers, and a structured natural-language rationale.
SleepVLM-3B is fine-tuned from [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) through a two-phase training pipeline: waveform-perceptual pre-training (WPT) followed by rule-grounded supervised fine-tuning (SFT) using expert annotations from [MASS-EX](https://huggingface.co/datasets/Feng613/MASS-EX).
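Assuming the model emits its stage, rule identifiers, and rationale as tagged lines of text (the exact output schema is defined in the GitHub repository; the format below is purely illustrative), a minimal parser for the structured response might look like:

```python
import re

# Illustrative output format -- the actual schema is defined in the
# SleepVLM GitHub repository; this sketch only shows the parsing idea.
EXAMPLE_RESPONSE = (
    "Stage: N2\n"
    "Rules: AASM-4.4.a, AASM-4.4.b\n"
    "Rationale: A K-complex followed by a sleep spindle is visible in the "
    "central EEG channel without meeting slow-wave-activity criteria."
)

def parse_staging_response(text: str) -> dict:
    """Split a SleepVLM-style response into stage, rule IDs, and rationale."""
    stage = re.search(r"Stage:\s*(W|N1|N2|N3|R)", text)
    rules = re.search(r"Rules:\s*(.+)", text)
    rationale = re.search(r"Rationale:\s*(.+)", text, re.DOTALL)
    return {
        "stage": stage.group(1) if stage else None,
        "rules": [r.strip() for r in rules.group(1).split(",")] if rules else [],
        "rationale": rationale.group(1).strip() if rationale else "",
    }

result = parse_staging_response(EXAMPLE_RESPONSE)
print(result["stage"])  # N2
print(result["rules"])  # ['AASM-4.4.a', 'AASM-4.4.b']
```

Keeping the stage, cited rules, and free-text rationale as separate fields makes it straightforward to audit predictions against the AASM manual downstream.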
## Model Details
| Property | Value |
|----------|-------|
| Base model | [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) |
| Parameters | ~3.1B |
| Model size | 7.1 GB (BF16) |
| Fine-tuning method | LoRA (r=16, alpha=32, dropout=0.05) |
| Training hardware | 8x NVIDIA A100 80GB |
| Precision | bfloat16 |
| Input | Three consecutive 30-s PSG epoch images (448 x 224 px) |
| PSG channels | F4-M1, C4-M1, O2-M1, LOC, ROC, Chin EMG |
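Given three consecutive epoch renderings, one way to assemble the Qwen2.5-VL chat payload is sketched below. The image paths and instruction text are illustrative assumptions, not the official prompt; the prompt template used in training is provided in the GitHub repository.

```python
# Sketch of assembling a Qwen2.5-VL chat payload for one staging query.
# The three image paths and the instruction text are hypothetical
# placeholders -- see the SleepVLM GitHub repo for the actual template.
epoch_images = [
    "epoch_0419.png",  # previous 30-s epoch
    "epoch_0420.png",  # epoch to be scored
    "epoch_0421.png",  # next 30-s epoch
]

messages = [
    {
        "role": "user",
        "content": [
            *[{"type": "image", "image": p} for p in epoch_images],
            {
                "type": "text",
                "text": "Score the middle epoch (W/N1/N2/N3/R), cite the "
                        "applicable AASM rules, and explain your reasoning.",
            },
        ],
    }
]

# `messages` can then be passed to AutoProcessor.apply_chat_template(...)
# from the transformers library, as shown in the Qwen2.5-VL model card.
print(len(messages[0]["content"]))  # 4 entries: three images + one text prompt
```

This mirrors the standard Qwen2.5-VL multimodal message format, where each image entry precedes the text instruction in the `content` list.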
## Intended Use
- **Primary use:** Research on explainable automated sleep staging from PSG recordings.
- **Intended users:** Sleep medicine researchers, clinical informatics researchers, and AI/ML researchers working on interpretable medical AI.
- **Clinical note:** This model is intended for research use only. It has not been validated for clinical diagnosis and must not replace professional sleep technologist scoring in clinical settings.
## Citation
If you use SleepVLM in your research, please cite:
```bibtex
@article{deng2026sleepvlm,
  author  = {Deng, Guifeng and Wang, Pan and Wang, Jiquan and Li, Tao and Jiang, Haiteng},
  title   = {{SleepVLM}: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model},
  journal = {},  % TODO: update after publication
  year    = {2026}
}
```
## License
This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).