---
license: apache-2.0
language:
  - en
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
  - sleep-staging
  - polysomnography
  - PSG
  - explainable-AI
  - AASM
  - vision-language-model
  - medical
  - EEG
  - EOG
  - EMG
  - lora
  - rule-grounded
  - multimodal
datasets:
  - Feng613/MASS-EX
model-index:
  - name: SleepVLM-3B
    results:
      - task:
          type: image-text-to-text
          name: Sleep Stage Classification
        dataset:
          name: MASS-SS1
          type: mass-ss1
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.835
          - name: Macro-F1
            type: macro-f1
            value: 0.793
          - name: Cohen's Kappa
            type: cohens-kappa
            value: 0.767
      - task:
          type: image-text-to-text
          name: Sleep Stage Classification
        dataset:
          name: ZUMS
          type: zums
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.812
          - name: Macro-F1
            type: macro-f1
            value: 0.766
          - name: Cohen's Kappa
            type: cohens-kappa
            value: 0.743
---

<div align="center">
<img src="SleepVLM_logo.png" alt="SleepVLM Logo" width="360">

# SleepVLM-3B

### Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model

[Paper (coming soon)]() |
[GitHub](https://github.com/Deng-GuiFeng/SleepVLM) |
[MASS-EX Dataset](https://huggingface.co/datasets/Feng613/MASS-EX) |
[Quantized Version (W4A16)](https://huggingface.co/Feng613/SleepVLM-3B-W4A16) |
[Collection](https://huggingface.co/collections/Feng613/sleepvlm)

</div>

---

> **Associated Paper:**
> Guifeng Deng, Pan Wang, Jiquan Wang, Tao Li, Haiteng Jiang. "SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model." *In preparation.*
> This repository will be made public upon release of the preprint.

## Overview

**SleepVLM-3B** is a rule-grounded vision-language model for explainable automated sleep staging from polysomnography (PSG) recordings. Unlike conventional black-box classifiers that output only a stage label, SleepVLM generates clinician-readable natural-language rationales citing specific AASM scoring rules for every 30-second epoch, making each staging decision auditable against the clinical standard.

The model takes rendered multi-channel PSG waveform images as input (three consecutive 30-second epochs) and produces a predicted sleep stage (W/N1/N2/N3/R), applicable AASM rule identifiers, and a structured natural-language rationale.

SleepVLM-3B is fine-tuned from [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) through a two-phase training pipeline: waveform-perceptual pre-training (WPT) followed by rule-grounded supervised fine-tuning (SFT) using expert annotations from [MASS-EX](https://huggingface.co/datasets/Feng613/MASS-EX).
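Because the model emits a stage label, rule identifiers, and a free-text rationale in one response, downstream code typically needs a small post-processing step. The sketch below is illustrative only: the stage pattern and the `Rule 5.A`-style citation format are assumptions, not the model's documented output schema.

```python
import re

def parse_staging_response(text: str) -> dict:
    """Extract the predicted stage and cited rule identifiers from a
    free-text rationale. The patterns here are illustrative assumptions,
    not SleepVLM's documented output schema."""
    # First standalone stage label (W/N1/N2/N3/R) in the response.
    stage_match = re.search(r"\b(N1|N2|N3|W|R)\b", text)
    # Rule citations assumed to look like "Rule 5.A" or "Rule 11.A.1".
    rules = re.findall(r"Rule\s+\d+(?:\.[A-Za-z0-9]+)*", text)
    return {
        "stage": stage_match.group(1) if stage_match else None,
        "rules": rules,
        "rationale": text.strip(),
    }

example = ("Stage: N2. Sleep spindles and a K-complex are visible in C4-M1 "
           "(Rule 5.A), with no arousal in the second half of the epoch.")
result = parse_staging_response(example)
print(result["stage"], result["rules"])  # N2 ['Rule 5.A']
```

A parser like this makes the rationale auditable in bulk, e.g. tallying which AASM rules the model cites most often per stage.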

## Model Details

| Property | Value |
|----------|-------|
| Base model | [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) |
| Parameters | ~3.1B |
| Model size | 7.1 GB (BF16) |
| Fine-tuning method | LoRA (r=16, alpha=32, dropout=0.05) |
| Training hardware | 8x NVIDIA A100 80GB |
| Precision | bfloat16 |
| Input | Three consecutive 30-s PSG epoch images (448 x 224 px) |
| PSG channels | F4-M1, C4-M1, O2-M1, LOC, ROC, Chin EMG |

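Per the table above, each prediction consumes three consecutive 30-s epoch images. A minimal sketch of assembling those (previous, current, next) triplets from a night's rendered epochs; padding the first and last epochs by repeating the edge epoch is an assumption for illustration, not a policy taken from the paper:

```python
def epoch_triplets(epochs):
    """Group a chronological sequence of per-epoch items (e.g. paths to
    rendered 448x224 waveform images) into overlapping triplets
    (previous, current, next), one per scored epoch. Edge epochs are
    padded by repeating themselves -- an illustrative assumption."""
    n = len(epochs)
    triplets = []
    for i in range(n):
        prev = epochs[max(i - 1, 0)]
        nxt = epochs[min(i + 1, n - 1)]
        triplets.append((prev, epochs[i], nxt))
    return triplets

night = [f"epoch_{i:04d}.png" for i in range(5)]
print(epoch_triplets(night)[0])
# ('epoch_0000.png', 'epoch_0000.png', 'epoch_0001.png')
```

Each triplet then becomes one image-text-to-text request, with the middle epoch being the one scored.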
## Intended Use

- **Primary use:** Research on explainable automated sleep staging from PSG recordings.
- **Intended users:** Sleep medicine researchers, clinical informatics researchers, and AI/ML researchers working on interpretable medical AI.
- **Clinical note:** This model is for research use only. It has not been validated for clinical diagnostic use and must not replace scoring by a certified sleep technologist in clinical settings.

## Citation

If you use SleepVLM in your research, please cite:

```bibtex
@article{deng2026sleepvlm,
  author    = {Deng, Guifeng and Wang, Pan and Wang, Jiquan and Li, Tao and Jiang, Haiteng},
  title     = {{SleepVLM}: Explainable and Rule-Grounded Sleep Staging
               via a Vision-Language Model},
  journal   = {}, % TODO: update after publication
  year      = {2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).