joncarter/hypnos

---
tags:
  - sleep
  - eeg
  - ecg
  - eog
  - emg
  - respiratory
  - physiological-signals
  - foundation-model
  - multimodal
  - time-series
pipeline_tag: feature-extraction
---

# Hypnos (multimodal)

Hypnos is an autoregressive RQ-Transformer trained with multimodal next-token prediction on
tokenized streams of physiological sensor data. It produces general-purpose **1 Hz embeddings**
of sleep physiology for downstream tasks such as sleep staging and arousal/event detection.

This repo holds the released **multimodal** model as a single weights-only `safetensors` bundle.
The bundle contains the RQ-Transformer **and** all 5 tokenizers, and the file metadata carries
the config (model and tokenizer construction kwargs, modality layout), so loading is fully
self-contained.

## Usage

Use the `hypnos` inference library:

```python
from hypnos.embedding import embed_edf

emb = embed_edf("recording.edf")   # defaults to this repo
# emb: dict {modality_name: np.ndarray [n_seconds, embed_dim] float16}
#   e.g. emb["eeg_c3"], emb["ecg"], ...; one vector per second, per present modality
```

Embeddings are returned **per modality** at the model's native **1 Hz** cadence (one vector
per second). Only modalities present in the recording appear in the dict. The notch filter
defaults to 50 Hz; for recordings made on 60 Hz mains, such as US recordings, pass
`notch_freq=60.0`.

### Pooling

The 1 Hz per-modality output is unpooled; pool it however your task needs. For example, to get
a single embedding per 30-second sleep epoch, average over modalities and then over time:

```python
import numpy as np
from hypnos.embedding import embed_edf

emb = embed_edf("recording.edf")
# Average over modalities (assumes every array has the same [T, D] shape) -> [T, D]
fused = np.mean(list(emb.values()), axis=0)
n = fused.shape[0] // 30                                  # number of complete 30-s epochs
epochs = fused[: n * 30].reshape(n, 30, -1).mean(axis=1)  # over 30-s epochs -> [T//30, D]
```

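Averaging across modalities is only one option. As an alternative sketch, the snippet below pools each modality into 30-second epochs separately and concatenates the results along the feature axis, so a downstream model can weight modalities differently. It runs on synthetic arrays shaped like the `embed_edf` output described above; `pool_epochs` and `fake_emb` are illustrative names, not part of the `hypnos` library.

```python
import numpy as np

def pool_epochs(x: np.ndarray, epoch_s: int = 30) -> np.ndarray:
    """Mean-pool a [T, D] array of 1 Hz embeddings into [T // epoch_s, D]."""
    n = x.shape[0] // epoch_s                 # number of complete epochs
    x = x[: n * epoch_s].astype(np.float32)   # drop the trailing partial epoch
    return x.reshape(n, epoch_s, x.shape[1]).mean(axis=1)

# Synthetic stand-in for embed_edf output: 95 seconds, 768 dims, float16.
rng = np.random.default_rng(0)
fake_emb = {
    name: rng.standard_normal((95, 768)).astype(np.float16)
    for name in ("eeg_c3", "ecg", "resp_abd")
}

# Pool each modality separately, then concatenate in a fixed (sorted) order.
names = sorted(fake_emb)
per_epoch = np.concatenate([pool_epochs(fake_emb[m]) for m in names], axis=1)
print(per_epoch.shape)  # (3, 2304)
```

With concatenation, the feature width depends on which modalities are present, so recordings with different channel sets need padding or a fixed modality list before they can be stacked.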
## Modalities

The model covers 8 modalities that share 5 tokenizers. K is the number of residual vector
quantization (RVQ) levels; every codebook has 2048 entries, and each modality yields
1 token per second.

| modality | channel | signal | tokenizer | K | sample rate |
|---|---|---|---|---|---|
| `eeg_c3` | C3 | EEG | eeg-q8 | 8 | 128 Hz |
| `eeg_c4` | C4 | EEG | eeg-q8 | 8 | 128 Hz |
| `eog_e1` | E1 | EOG | eog-q8 | 8 | 128 Hz |
| `eog_e2` | E2 | EOG | eog-q8 | 8 | 128 Hz |
| `emg_chin` | Chin | EMG | emg-q8 | 8 | 128 Hz |
| `ecg` | ECG | ECG | ecg-q4 | 4 | 128 Hz |
| `resp_abd` | ABD | respiratory | resp-q4 | 4 | 32 Hz |
| `resp_thx` | THX | respiratory | resp-q4 | 4 | 32 Hz |

EEG/EOG channels are contralaterally referenced (e.g. C3-M2), chin EMG is a bipolar
derivation, and ECG and respiratory signals are used directly. The embedding dimension is 768.

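The table supports a quick back-of-the-envelope calculation of the token stream size. The sketch below assumes that each 1-second token consists of K codebook indices, one per RVQ level; that is how residual quantization usually works, but it is not stated explicitly here. `MODALITIES` and the other names are illustrative, not part of the `hypnos` library.

```python
import math

# modality -> (tokenizer, RVQ levels K, sample rate in Hz), copied from the table.
MODALITIES = {
    "eeg_c3": ("eeg-q8", 8, 128),
    "eeg_c4": ("eeg-q8", 8, 128),
    "eog_e1": ("eog-q8", 8, 128),
    "eog_e2": ("eog-q8", 8, 128),
    "emg_chin": ("emg-q8", 8, 128),
    "ecg": ("ecg-q4", 4, 128),
    "resp_abd": ("resp-q4", 4, 32),
    "resp_thx": ("resp-q4", 4, 32),
}
CODEBOOK_SIZE = 2048

tokenizers = sorted({tok for tok, _, _ in MODALITIES.values()})
# At 1 token per second, each token summarizes one second of samples.
samples_per_token = {name: rate for name, (_, _, rate) in MODALITIES.items()}
# Assumption: one codebook index per RVQ level per token.
codes_per_second = sum(k for _, k, _ in MODALITIES.values())
bits_per_code = math.log2(CODEBOOK_SIZE)

print(len(tokenizers), "tokenizers")           # 5 tokenizers
print(codes_per_second, "codes per second")    # 52 codes per second
print(codes_per_second * bits_per_code, "bits per second")  # 572.0 bits per second
```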
## Devices

CUDA and CPU are supported. **Apple Silicon (MPS) is not**, because PyTorch's `flex_attention`
has no Metal kernel. On a Mac, use `device="cpu"`: a 2-minute clip embeds in a few seconds, and
a full night takes about 1 minute.

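A minimal sketch of a device picker that follows the rule above: prefer CUDA when PyTorch reports it, otherwise fall back to CPU, and never select MPS. `pick_device` is a hypothetical helper, not part of the `hypnos` library, and the commented call assumes `embed_edf` accepts the `device` keyword mentioned above.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu" (never "mps")."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query; CPU is the safe default.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(device)

# Hypothetical usage, assuming embed_edf takes a device keyword:
# emb = embed_edf("recording.edf", device=device)
```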
## Format

`hypnos_multimodal.safetensors` stores the weights under namespaced keys (`model/…`,
`tok/<name>/…`) and the config as a JSON string in the file metadata. It is loaded with
`safetensors`, which avoids unpickling arbitrary code.
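To inspect such a bundle without loading any weights, it is enough to read the safetensors header: an 8-byte little-endian length followed by a JSON object whose `__metadata__` entry holds the string-to-string file metadata. The sketch below uses only the standard library and demonstrates the idea on a tiny hand-built file. The metadata key `example_key` is made up for the demo; the actual key name used by the Hypnos bundle is not given here, so inspect `metadata.keys()` on the real file.

```python
import json
import os
import struct
import tempfile
from collections import Counter

def read_safetensors_header(path):
    """Return (metadata, tensor_names) from a safetensors file without loading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        header = json.loads(f.read(header_len))
    metadata = header.pop("__metadata__", {})  # string-to-string map
    return metadata, sorted(header)

def namespace(key):
    """Map 'model/...' to 'model' and 'tok/<name>/...' to 'tok/<name>'."""
    parts = key.split("/")
    return "/".join(parts[:2]) if parts[0] == "tok" else parts[0]

# Demo on a tiny hand-built file, so the sketch runs without the real bundle.
tensors = {
    "model/w": {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]},
    "tok/eeg-q8/w": {"dtype": "F32", "shape": [1], "data_offsets": [4, 8]},
}
header = {"__metadata__": {"example_key": json.dumps({"embed_dim": 768})}, **tensors}
blob = json.dumps(header).encode("utf-8")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.safetensors")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)) + blob + bytes(8))  # header + 8 data bytes
    metadata, names = read_safetensors_header(path)

counts = Counter(namespace(n) for n in names)   # tensors per namespace
config = json.loads(metadata["example_key"])    # config is stored as a JSON string
print(dict(counts), config)
```

Pointing `read_safetensors_header` at `hypnos_multimodal.safetensors` should list the `model` and `tok/<name>` namespaces in the same way.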