Libraries

How to use Feng613/SleepVLM-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Feng613/SleepVLM-3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Feng613/SleepVLM-3B")
model = AutoModelForMultimodalLM.from_pretrained("Feng613/SleepVLM-3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
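The snippets above use a generic single-image prompt. Since SleepVLM-3B is described as taking three consecutive epoch images, a request for it would presumably carry three image entries in one user turn. The following is a minimal sketch of that message layout, reusing the `{"type": "image", "url": ...}` convention from the snippets above. The helper name `build_epoch_messages`, the file names, and the prompt wording are hypothetical; the model card does not specify the exact prompt used in training.

```python
def build_epoch_messages(image_urls, prompt):
    """Build one user turn holding three epoch images followed by a text prompt."""
    if len(image_urls) != 3:
        raise ValueError("expected exactly three epoch images")
    # One image entry per epoch, in temporal order.
    content = [{"type": "image", "url": url} for url in image_urls]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


# Hypothetical image references and prompt, for illustration only.
messages = build_epoch_messages(
    ["epoch_0041.png", "epoch_0042.png", "epoch_0043.png"],
    "Stage the sleep epoch shown in these PSG images and explain which rules apply.",
)
```

The resulting `messages` list can be passed to `pipe(text=messages)` or `processor.apply_chat_template(...)` in place of the candy example.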

vLLM

How to use Feng613/SleepVLM-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Feng613/SleepVLM-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Feng613/SleepVLM-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
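The same request can be issued from Python. The sketch below uses only the standard library and assumes the server follows the OpenAI-compatible chat completions format shown in the curl call. The helper names (`image_to_data_url`, `build_chat_payload`, `post_chat`) are hypothetical, and sending local images as base64 data URLs is an assumption about how locally rendered PSG images would be supplied. Depending on its configuration, the server may need to be set up to accept more than one image per request.

```python
import base64
import json
import urllib.request


def image_to_data_url(png_bytes):
    """Encode raw PNG bytes as a base64 data URL."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def build_chat_payload(model, prompt, image_urls):
    """Build the JSON body: one text part followed by one image_url part per image."""
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(base_url, payload):
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_chat_payload(
    "Feng613/SleepVLM-3B",
    "Describe this image in one sentence.",
    [image_to_data_url(b"abc")],  # placeholder bytes, not a real image
)
# With a server running: reply = post_chat("http://localhost:8000", payload)
```

In the OpenAI-compatible response format, the generated text is found at `reply["choices"][0]["message"]["content"]`.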

SGLang

How to use Feng613/SleepVLM-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Feng613/SleepVLM-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Feng613/SleepVLM-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Feng613/SleepVLM-3B" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use Feng613/SleepVLM-3B with Docker Model Runner:
```
docker model run hf.co/Feng613/SleepVLM-3B
```

SleepVLM-3B

Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model

Paper (coming soon) | GitHub | MASS-EX Dataset | Quantized Version (W4A16) | Collection

Associated Paper: Guifeng Deng, Pan Wang, Jiquan Wang, Tao Li, Haiteng Jiang. "SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model." In preparation. This repository will be made public upon release of the preprint.

Overview

SleepVLM-3B is a rule-grounded vision-language model for explainable automated sleep staging from polysomnography (PSG) recordings. Unlike conventional black-box classifiers that output only a stage label, SleepVLM generates clinician-readable natural-language rationales that cite specific American Academy of Sleep Medicine (AASM) scoring rules for every 30-second epoch, so each staging decision can be audited against the clinical standard.

The model takes rendered multi-channel PSG waveform images as input (three consecutive 30-second epochs) and produces a predicted sleep stage (W/N1/N2/N3/R), applicable AASM rule identifiers, and a structured natural-language rationale.
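To make the input framing concrete, here is a minimal sketch of how a recording could be cut into 30-second epochs and grouped into runs of three consecutive epochs. It is an illustration under stated assumptions, not the authors' preprocessing: the sampling rate is a hypothetical parameter, and the model card does not say which of the three epochs is the one being scored or how the first and last epochs of a night are handled.

```python
EPOCH_SECONDS = 30
VALID_STAGES = ("W", "N1", "N2", "N3", "R")  # stage labels listed in the model card


def epoch_bounds(epoch_index, sampling_rate_hz):
    """Return the [start, stop) sample indices of one 30-second epoch."""
    samples_per_epoch = EPOCH_SECONDS * sampling_rate_hz
    start = epoch_index * samples_per_epoch
    return start, start + samples_per_epoch


def consecutive_triplets(n_epochs):
    """Return every run of three consecutive epoch indices in a recording."""
    return [(i, i + 1, i + 2) for i in range(n_epochs - 2)]


def is_valid_stage(label):
    """Check that a predicted label is one of the five stages."""
    return label in VALID_STAGES


# Hypothetical recording: 5 epochs sampled at 256 Hz.
triplets = consecutive_triplets(5)
first_window = [epoch_bounds(i, 256) for i in triplets[0]]
```

Each triplet would then be rendered to images and passed to the model as one request.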

SleepVLM-3B is fine-tuned from Qwen2.5-VL-3B-Instruct through a two-phase training pipeline: waveform-perceptual pre-training (WPT) followed by rule-grounded supervised fine-tuning (SFT) using expert annotations from MASS-EX.

Model Details

| Property | Value |
|---|---|
| Base model | Qwen2.5-VL-3B-Instruct |
| Parameters | ~3.1B |
| Model size | 7.1 GB (BF16) |
| Fine-tuning method | LoRA (r=16, alpha=32, dropout=0.05) |
| Training hardware | 8x NVIDIA A100 80GB |
| Precision | bfloat16 |
| Input | Three consecutive 30-s PSG epoch images (448 x 224 px) |
| PSG channels | F4-M1, C4-M1, O2-M1, LOC, ROC, Chin EMG |
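As a rough guide to what the LoRA settings imply, the sketch below works through the arithmetic under the standard LoRA formulation, where the low-rank update is scaled by alpha / r and each adapted weight matrix gains an `r x d_in` and a `d_out x r` factor. The layer dimensions are hypothetical; the model card does not list which modules were adapted.

```python
def lora_scaling(r, alpha):
    """Scaling applied to the low-rank update in standard LoRA: alpha / r."""
    return alpha / r


def lora_extra_params(d_in, d_out, r):
    """Trainable parameters added to one weight matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


scaling = lora_scaling(r=16, alpha=32)       # 32 / 16 = 2.0
extra = lora_extra_params(2048, 2048, r=16)  # 16 * (2048 + 2048) = 65536
```

For a hypothetical 2048 x 2048 projection, the adapter trains 65,536 parameters instead of the roughly 4.2 million in the full matrix.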

Intended Use

Primary use: Research on explainable automated sleep staging from PSG recordings.
Intended users: Sleep medicine researchers, clinical informatics researchers, and AI/ML researchers working on interpretable medical AI.
Clinical note: This model is intended for research purposes. It has not been validated for clinical diagnostic use and should not replace professional sleep technologist scoring in clinical settings.

Citation

If you use SleepVLM in your research, please cite:

@misc{deng2026sleepvlm,
  author    = {Deng, Guifeng and Wang, Pan and Wang, Jiquan and Li, Tao and Jiang, Haiteng},
  title     = {{SleepVLM}: Explainable and Rule-Grounded Sleep Staging
               via a Vision-Language Model},
  note      = {Manuscript in preparation},
  year      = {2026}
}