README.md · Feng613/SleepVLM-3B at main

SleepVLM-3B / README.md

Feng613

Upload README.md with huggingface_hub

38dda00 verified 14 days ago

preview code

raw

history blame contribute delete

4.4 kB

	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	pipeline_tag: image-text-to-text
	base_model: Qwen/Qwen2.5-VL-3B-Instruct
	tags:
	- sleep-staging
	- polysomnography
	- PSG
	- explainable-AI
	- AASM
	- vision-language-model
	- medical
	- EEG
	- EOG
	- EMG
	- lora
	- rule-grounded
	- multimodal
	datasets:
	- Feng613/MASS-EX
	model-index:
	- name: SleepVLM-3B
	results:
	- task:
	type: image-text-to-text
	name: Sleep Stage Classification
	dataset:
	name: MASS-SS1
	type: mass-ss1
	metrics:
	- name: Accuracy
	type: accuracy
	value: 0.835
	- name: Macro-F1
	type: macro-f1
	value: 0.793
	- name: Cohen's Kappa
	type: cohens-kappa
	value: 0.767
	- task:
	type: image-text-to-text
	name: Sleep Stage Classification
	dataset:
	name: ZUMS
	type: zums
	metrics:
	- name: Accuracy
	type: accuracy
	value: 0.812
	- name: Macro-F1
	type: macro-f1
	value: 0.766
	- name: Cohen's Kappa
	type: cohens-kappa
	value: 0.743
	---

	<div align="center">
	<img src="SleepVLM_logo.png" alt="SleepVLM Logo" width="360">

	# SleepVLM-3B

	### Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model

	[Paper (coming soon)]() \|
	[GitHub](https://github.com/Deng-GuiFeng/SleepVLM) \|
	[MASS-EX Dataset](https://huggingface.co/datasets/Feng613/MASS-EX) \|
	[Quantized Version (W4A16)](https://huggingface.co/Feng613/SleepVLM-3B-W4A16) \|
	[Collection](https://huggingface.co/collections/Feng613/sleepvlm)

	</div>

	---

	> Associated Paper:
	> Guifeng Deng, Pan Wang, Jiquan Wang, Tao Li, Haiteng Jiang. "SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model." In preparation.
	> This repository will be made public upon release of the preprint.

	## Overview

	SleepVLM-3B is a rule-grounded vision-language model for explainable automated sleep staging from polysomnography (PSG) recordings. Unlike conventional black-box classifiers that output only a stage label, SleepVLM generates clinician-readable natural-language rationales citing specific AASM scoring rules for every 30-second epoch, making each staging decision auditable against the clinical standard.

	The model takes rendered multi-channel PSG waveform images as input (three consecutive 30-second epochs) and produces a predicted sleep stage (W/N1/N2/N3/R), applicable AASM rule identifiers, and a structured natural-language rationale.

	SleepVLM-3B is fine-tuned from [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) through a two-phase training pipeline: waveform-perceptual pre-training (WPT) followed by rule-grounded supervised fine-tuning (SFT) using expert annotations from [MASS-EX](https://huggingface.co/datasets/Feng613/MASS-EX).

	## Model Details

	\| Property \| Value \|
	\|----------\|-------\|
	\| Base model \| [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) \|
	\| Parameters \| ~3.1B \|
	\| Model size \| 7.1 GB (BF16) \|
	\| Fine-tuning method \| LoRA (r=16, alpha=32, dropout=0.05) \|
	\| Training hardware \| 8x NVIDIA A100 80GB \|
	\| Precision \| bfloat16 \|
	\| Input \| Three consecutive 30-s PSG epoch images (448 x 224 px) \|
	\| PSG channels \| F4-M1, C4-M1, O2-M1, LOC, ROC, Chin EMG \|

	## Intended Use

	- Primary use: Research on explainable automated sleep staging from PSG recordings.
	- Intended users: Sleep medicine researchers, clinical informatics researchers, and AI/ML researchers working on interpretable medical AI.
	- Clinical note: This model is intended for research purposes. It has not been validated for clinical diagnostic use and should not replace professional sleep technologist scoring in clinical settings.

	## Citation

	If you use SleepVLM in your research, please cite:

	```bibtex
	@article{deng2026sleepvlm,
	author = {Deng, Guifeng and Wang, Pan and Wang, Jiquan and Li, Tao and Jiang, Haiteng},
	title = {{SleepVLM}: Explainable and Rule-Grounded Sleep Staging
	via a Vision-Language Model},
	journal = {}, % TODO: update after publication
	year = {2026}
	}
	```

	## License

	This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).