---
license: mit
tags:
  - sleep
  - eeg
  - ecg
  - eog
  - emg
  - respiratory
  - physiological-signals
  - foundation-model
  - time-series
pipeline_tag: feature-extraction
---

# Hypnos

Hypnos is an autoregressive RQ-Transformer trained via next-token prediction on tokenized
streams of physiological sensor data. It produces general-purpose **1 Hz embeddings** of sleep
physiology for downstream tasks (sleep staging, arousal/event detection, etc.).

This repo holds the released model as a single weight-only `safetensors` bundle: the
RQ-Transformer **and** all 5 tokenizers, plus the config (model + tokenizer construction
kwargs, modality layout) in the file metadata — so loading is fully self-contained.

## Usage

Use the `hypnos` inference library:

```python
from hypnos.embedding import embed_edf

emb = embed_edf("recording.edf")   # model weights default to this repo
# emb: dict {modality_name: np.ndarray [n_seconds, embed_dim] float16}
#   e.g. emb["eeg_c3"], emb["ecg"], ... — one vector per second, per present modality
```
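The output contract in the comments above (a dict keyed by modality name, with `float16` arrays of shape `[n_seconds, embed_dim]`) can be checked before embeddings reach downstream code. This is a minimal sketch: `check_embeddings` is a hypothetical helper, not part of the `hypnos` API. The modality names and the 768-dimensional width come from the Modalities section of this card. The sketch also assumes that all present modalities cover the same number of seconds, which the pooling example's average over modalities implies.

```python
import numpy as np

# Modality names and embedding width as listed on this model card.
KNOWN_MODALITIES = {
    "eeg_c3", "eeg_c4", "eog_e1", "eog_e2",
    "emg_chin", "ecg", "resp_abd", "resp_thx",
}
EMBED_DIM = 768


def check_embeddings(emb: dict) -> int:
    """Validate an embedding dict and return its length in seconds.

    Hypothetical helper, not part of the hypnos API. Assumes every present
    modality covers the same number of seconds.
    """
    if not emb:
        raise ValueError("no modalities present in the recording")
    unknown = set(emb) - KNOWN_MODALITIES
    if unknown:
        raise ValueError(f"unexpected modalities: {sorted(unknown)}")
    lengths = set()
    for name, arr in emb.items():
        if arr.ndim != 2 or arr.shape[1] != EMBED_DIM:
            raise ValueError(f"{name}: expected [n_seconds, {EMBED_DIM}], got {arr.shape}")
        if arr.dtype != np.float16:
            raise ValueError(f"{name}: expected float16, got {arr.dtype}")
        lengths.add(arr.shape[0])
    if len(lengths) != 1:
        raise ValueError(f"modalities disagree on length: {sorted(lengths)}")
    return lengths.pop()


if __name__ == "__main__":
    # Synthetic stand-in for embed_edf output: 2 minutes, 2 modalities.
    fake = {m: np.zeros((120, EMBED_DIM), dtype=np.float16) for m in ("eeg_c3", "ecg")}
    print(check_embeddings(fake))  # 120
```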

Embeddings are returned **per modality** at the model's native **1 Hz** cadence (one vector
per second). Only modalities present in the recording appear in the dict. For recordings made
on 60 Hz mains power (e.g. in the US), pass `notch_freq=60.0`; the default is 50 Hz.
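To illustrate why the notch frequency should match the local mains supply, the sketch below builds a generic second-order IIR notch (coefficients from Robert Bristow-Johnson's "Audio EQ Cookbook") and runs it forward and backward over a synthetic 128 Hz trace containing a 10 Hz component plus 60 Hz interference. This illustrates notch filtering in general and is not the filter `hypnos` applies internally, which this card does not describe. `notch_coeffs`, `biquad`, `filtfilt_biquad`, `tone_amplitude`, and the quality factor `q=30` are all assumptions. In practice, `scipy.signal.iirnotch` and `scipy.signal.filtfilt` do the same job; the sketch avoids that dependency.

```python
import numpy as np


def notch_coeffs(f0: float, fs: float, q: float = 30.0):
    """Biquad notch coefficients (RBJ cookbook), normalised so a[0] == 1."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    cos_w0 = np.cos(w0)
    b = np.array([1.0, -2 * cos_w0, 1.0])
    a = np.array([1 + alpha, -2 * cos_w0, 1 - alpha])
    return b / a[0], a / a[0]


def biquad(b, a, x):
    """Direct-form I second-order IIR filter."""
    y = np.zeros_like(x, dtype=float)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def filtfilt_biquad(b, a, x):
    """Zero-phase filtering: run the biquad forward, then backward."""
    return biquad(b, a, biquad(b, a, x)[::-1])[::-1]


def tone_amplitude(x, f, fs):
    """Amplitude of the f-Hz component (x must span whole cycles of f)."""
    t = np.arange(len(x)) / fs
    return 2 * abs(np.mean(x * np.exp(-2j * np.pi * f * t)))


if __name__ == "__main__":
    fs = 128.0                         # EEG sample rate from the Modalities table
    t = np.arange(int(30 * fs)) / fs   # 30 s of signal
    # 10 Hz "physiology" plus 60 Hz mains interference of equal amplitude.
    x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t)
    mid = slice(int(10 * fs), int(20 * fs))  # skip filter start-up transients
    for f0 in (50.0, 60.0):
        y = filtfilt_biquad(*notch_coeffs(f0, fs), x)
        print(f0, round(tone_amplitude(y[mid], 10, fs), 3),
              round(tone_amplitude(y[mid], 60, fs), 3))
```

With a 60 Hz notch the 60 Hz tone is almost entirely removed while the 10 Hz tone passes unchanged; with the 50 Hz default the narrow notch leaves 60 Hz interference essentially untouched.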

### Pooling

The 1 Hz per-modality output is unpooled; pool it however your task needs. For example, to get
a single embedding per 30-second sleep epoch, average over modalities and then over time:

```python
import numpy as np
from hypnos.embedding import embed_edf

emb = embed_edf("recording.edf")
fused = np.mean(list(emb.values()), axis=0)              # over modalities -> [T, D]
n = fused.shape[0] // 30                                  # whole epochs; a trailing partial epoch is dropped
epochs = fused[: n * 30].reshape(n, 30, -1).mean(axis=1)  # over time within each epoch -> [T//30, D]
```
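Averaging over modalities is only one option. The sketch below wraps the same 30-second pooling in a function and adds an alternative that concatenates modalities in a fixed order, zero-filling any modality missing from the recording, so every epoch vector has the same layout whichever channels were recorded. `pool_epochs`, its `fuse` parameter, and the zero-filling are hypothetical choices, not part of the `hypnos` API; a masked or learned fusion may suit some tasks better.

```python
import numpy as np

# Fixed modality order (from the Modalities table) so concatenated vectors align.
MODALITY_ORDER = ["eeg_c3", "eeg_c4", "eog_e1", "eog_e2",
                  "emg_chin", "ecg", "resp_abd", "resp_thx"]


def pool_epochs(emb: dict, epoch_s: int = 30, fuse: str = "mean") -> np.ndarray:
    """Pool 1 Hz per-modality embeddings into fixed-length epochs.

    fuse="mean":   average present modalities -> [n_epochs, D]
    fuse="concat": concatenate all 8 modalities in MODALITY_ORDER,
                   zero-filling absent ones   -> [n_epochs, 8 * D]
    A trailing partial epoch is dropped.
    """
    first = next(iter(emb.values()))
    n_sec, dim = first.shape
    n = n_sec // epoch_s

    def epochs(arr: np.ndarray) -> np.ndarray:
        # [T, D] -> [n, epoch_s, D] -> mean over time -> [n, D], computed in float32
        return arr[: n * epoch_s].astype(np.float32).reshape(n, epoch_s, dim).mean(axis=1)

    if fuse == "mean":
        return np.mean([epochs(a) for a in emb.values()], axis=0)
    if fuse == "concat":
        blocks = [epochs(emb[m]) if m in emb else np.zeros((n, dim), np.float32)
                  for m in MODALITY_ORDER]
        return np.concatenate(blocks, axis=1)
    raise ValueError(f"unknown fuse mode: {fuse!r}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake = {m: rng.standard_normal((95, 768)).astype(np.float16) for m in ("eeg_c3", "ecg")}
    print(pool_epochs(fake).shape)                 # (3, 768)
    print(pool_epochs(fake, fuse="concat").shape)  # (3, 6144)
```

Because both averages are linear, `fuse="mean"` matches the snippet above up to floating-point precision (the sketch casts to `float32` before averaging). Concatenation keeps modality identity at the cost of an 8x wider vector.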

## Modalities

8 modalities share 5 tokenizers (K = number of residual vector quantization (RVQ) levels; every codebook has 2048 entries; 1 token per second):

| modality | channel | signal | tokenizer | K | sample rate |
|---|---|---|---|---|---|
| `eeg_c3` | C3 | EEG | eeg-q8 | 8 | 128 Hz |
| `eeg_c4` | C4 | EEG | eeg-q8 | 8 | 128 Hz |
| `eog_e1` | E1 | EOG | eog-q8 | 8 | 128 Hz |
| `eog_e2` | E2 | EOG | eog-q8 | 8 | 128 Hz |
| `emg_chin` | Chin | EMG | emg-q8 | 8 | 128 Hz |
| `ecg` | ECG | ECG | ecg-q4 | 4 | 128 Hz |
| `resp_abd` | ABD | respiratory | resp-q4 | 4 | 32 Hz |
| `resp_thx` | THX | respiratory | resp-q4 | 4 | 32 Hz |

EEG/EOG channels are contralaterally referenced (e.g. C3-M2); Chin EMG is a bipolar
derivation; ECG/respiratory are used directly. Embedding dimension is 768.
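The table implies a simple per-second token budget. Assuming, as is standard for residual vector quantization, that each 1-second token is a stack of K codebook indices, a modality contributes K indices per second, and each index into a 2048-entry codebook carries log2(2048) = 11 bits. The sketch below transcribes the table into a Python dict (an illustrative layout, not the library's config format) and totals codes and bits per second for a full eight-channel montage, alongside the raw samples per second the tokenizers consume.

```python
import math

# (tokenizer, K RVQ levels, input sample rate in Hz), transcribed from the table.
MODALITIES = {
    "eeg_c3":   ("eeg-q8", 8, 128),
    "eeg_c4":   ("eeg-q8", 8, 128),
    "eog_e1":   ("eog-q8", 8, 128),
    "eog_e2":   ("eog-q8", 8, 128),
    "emg_chin": ("emg-q8", 8, 128),
    "ecg":      ("ecg-q4", 4, 128),
    "resp_abd": ("resp-q4", 4, 32),
    "resp_thx": ("resp-q4", 4, 32),
}
CODEBOOK_SIZE = 2048
TOKENS_PER_SEC = 1

bits_per_code = math.log2(CODEBOOK_SIZE)  # 11.0
codes_per_sec = sum(k for _, k, _ in MODALITIES.values()) * TOKENS_PER_SEC
raw_samples_per_sec = sum(fs for _, _, fs in MODALITIES.values())
n_tokenizers = len({tok for tok, _, _ in MODALITIES.values()})

print(n_tokenizers)                   # 5
print(codes_per_sec)                  # 52
print(codes_per_sec * bits_per_code)  # 572.0
print(raw_samples_per_sec)            # 832
```

So a full montage is summarised as 52 discrete codes (572 bits) per second of recording, drawn from 832 raw samples per second; each 128 Hz token covers 128 samples and each 32 Hz respiratory token covers 32.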

## Devices

CUDA and CPU are supported. **Apple Silicon (MPS) is not**: PyTorch's `flex_attention` has no
Metal kernel, so on a Mac use `device="cpu"`, where a 2-minute clip embeds in a few seconds and
a full night takes about 1 minute.
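Since MPS is unsupported, device selection reduces to CUDA when it is available and CPU otherwise. A minimal sketch, assuming the returned string is what you then pass as the `device=` argument mentioned above; `pick_device` is a hypothetical helper that deliberately never returns `"mps"`, even on Apple Silicon.

```python
from typing import Optional


def pick_device(cuda_available: Optional[bool] = None) -> str:
    """Return "cuda" if a CUDA GPU is usable, else "cpu" (never "mps").

    flex_attention has no Metal kernel, so Apple Silicon falls back to CPU.
    Pass cuda_available explicitly to skip the torch import (e.g. in tests).
    """
    if cuda_available is None:
        try:
            import torch
            cuda_available = torch.cuda.is_available()
        except ImportError:  # torch not installed: CPU is the only option
            cuda_available = False
    return "cuda" if cuda_available else "cpu"


if __name__ == "__main__":
    print(pick_device())
```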

## Format

`hypnos.safetensors` stores the weights under namespaced keys (`model/…`, `tok/<name>/…`)
and the config as a JSON string in the file metadata. It is loaded with `safetensors`, so no
arbitrary code is unpickled.

## License

Released under the [MIT License](LICENSE).

## Citation

```bibtex
@online{carterNextTokenPredictionLearns2026,
  title       = {Next-Token Prediction Learns Generalisable Representations of Sleep Physiology},
  author      = {Carter, Jonathan F. and Tarassenko, Lionel},
  date        = {2026-06-08},
  eprint      = {2606.09605},
  eprinttype  = {arXiv},
  eprintclass = {cs.AI},
  doi         = {10.48550/arXiv.2606.09605},
  url         = {http://arxiv.org/abs/2606.09605},
}
```