joncarter committed on
Commit f9d3bd8 · verified · 1 Parent(s): aa2a877

Add model card

Files changed (1)
  1. README.md +84 -0
README.md ADDED
@@ -0,0 +1,84 @@
+ ---
+ tags:
+ - sleep
+ - eeg
+ - ecg
+ - eog
+ - emg
+ - respiratory
+ - physiological-signals
+ - foundation-model
+ - multimodal
+ - time-series
+ pipeline_tag: feature-extraction
+ ---
+
+ # Hypnos (multimodal)
+
+ Hypnos is an autoregressive RQ-Transformer trained via multimodal next-token prediction on
+ tokenized streams of physiological sensor data. It produces general-purpose **1 Hz embeddings**
+ of sleep physiology for downstream tasks (sleep staging, arousal/event detection, etc.).
+
+ This repo holds the released **multimodal** model as a single weight-only `safetensors` bundle:
+ the RQ-Transformer **and** all 5 tokenizers, plus the config (model + tokenizer construction
+ kwargs, modality layout) in the file metadata, so loading is fully self-contained.
+
+ ## Usage
+
+ Use the `hypnos` inference library:
+
+ ```python
+ from hypnos.embedding import embed_edf
+
+ emb = embed_edf("recording.edf")  # defaults to this repo
+ # emb: dict {modality_name: np.ndarray [n_seconds, embed_dim] float16}
+ # e.g. emb["eeg_c3"], emb["ecg"], ...: one vector per second, per present modality
+ ```
+
+ Embeddings are returned **per modality** at the model's native **1 Hz** cadence (one vector
+ per second). Only modalities present in the recording appear in the dict. For recordings made
+ on 60 Hz mains power (e.g. in the US), pass `notch_freq=60.0`; the default is 50 Hz.
+
+ ### Pooling
+
+ The 1 Hz per-modality output is left unpooled; aggregate it however your task needs, e.g. into
+ a single embedding per 30-second sleep epoch, averaged over modalities and time:
+
+ ```python
+ import numpy as np
+ from hypnos.embedding import embed_edf
+
+ emb = embed_edf("recording.edf")
+ # Stacking assumes every present modality has the same [T, D] shape.
+ fused = np.mean(list(emb.values()), axis=0)  # average over modalities -> [T, D]
+ n = fused.shape[0] // 30  # number of complete 30-s epochs
+ epochs = fused[: n * 30].reshape(n, 30, -1).mean(axis=1)  # average per epoch -> [T // 30, D]
+ ```
+
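As a complement to the averaging example above, here is a minimal sketch of a different pooling choice: concatenating modalities instead of averaging them, so each modality keeps its own slice of the feature vector. It runs on synthetic arrays with the documented `[n_seconds, embed_dim]` layout rather than on a real recording. The function name `pool_epochs`, the zero-fill for absent modalities, and the float32 accumulation are illustrative assumptions, not part of the `hypnos` library.

```python
import numpy as np

def pool_epochs(emb, modalities, epoch_s=30, dim=768):
    """Concatenate per-modality embeddings after averaging within fixed-length epochs.

    emb: dict {modality_name: array [n_seconds, dim]}. Modalities missing from the
    dict are zero-filled so the output width is always len(modalities) * dim.
    """
    n_seconds = next(iter(emb.values())).shape[0]
    n_epochs = n_seconds // epoch_s  # a trailing partial epoch is dropped
    parts = []
    for name in modalities:
        x = emb.get(name)
        if x is None:
            x = np.zeros((n_seconds, dim), dtype=np.float32)
        # Work in float32, since the documented output dtype is float16.
        x = np.asarray(x, dtype=np.float32)[: n_epochs * epoch_s]
        parts.append(x.reshape(n_epochs, epoch_s, dim).mean(axis=1))
    return np.concatenate(parts, axis=1)  # [n_epochs, len(modalities) * dim]

# Synthetic stand-in for embed_edf output: 95 seconds, two modalities present.
rng = np.random.default_rng(0)
fake = {m: rng.standard_normal((95, 768)).astype(np.float16) for m in ("eeg_c3", "ecg")}
pooled = pool_epochs(fake, ["eeg_c3", "ecg", "resp_abd"])
print(pooled.shape)  # (3, 2304)
```

Concatenation keeps the feature width fixed across recordings with different channel sets, at the cost of a wider vector than the averaged variant.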
+ ## Modalities
+
+ 8 modalities share 5 tokenizers (K = number of RVQ levels; every codebook has 2048 entries; 1 token/sec):
+
+ | modality | channel | signal | tokenizer | K | sample rate |
+ |---|---|---|---|---|---|
+ | `eeg_c3` | C3 | EEG | eeg-q8 | 8 | 128 Hz |
+ | `eeg_c4` | C4 | EEG | eeg-q8 | 8 | 128 Hz |
+ | `eog_e1` | E1 | EOG | eog-q8 | 8 | 128 Hz |
+ | `eog_e2` | E2 | EOG | eog-q8 | 8 | 128 Hz |
+ | `emg_chin` | Chin | EMG | emg-q8 | 8 | 128 Hz |
+ | `ecg` | ECG | ECG | ecg-q4 | 4 | 128 Hz |
+ | `resp_abd` | ABD | respiratory | resp-q4 | 4 | 32 Hz |
+ | `resp_thx` | THX | respiratory | resp-q4 | 4 | 32 Hz |
+
+ EEG/EOG channels are contralaterally referenced (e.g. C3-M2); chin EMG is a bipolar
+ derivation; ECG/respiratory are used directly. The embedding dimension is 768.
+
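To make the table's numbers concrete, the sketch below works out the per-second budget when all 8 modalities are present. It assumes that one token consists of K codebook indices (one per RVQ level) and that each token covers exactly one second of signal; both follow from the "K = RVQ levels, 1 token/sec" note but are not spelled out in the card. The dictionary simply restates the table.

```python
import math

# (K RVQ levels, sample rate in Hz) per modality, copied from the table above.
MODALITIES = {
    "eeg_c3": (8, 128), "eeg_c4": (8, 128),
    "eog_e1": (8, 128), "eog_e2": (8, 128),
    "emg_chin": (8, 128), "ecg": (4, 128),
    "resp_abd": (4, 32), "resp_thx": (4, 32),
}
CODEBOOK_SIZE = 2048

bits_per_code = math.log2(CODEBOOK_SIZE)  # bits needed to index one codebook entry
codes_per_sec = sum(k for k, _ in MODALITIES.values())  # codebook indices per second
samples_per_sec = sum(sr for _, sr in MODALITIES.values())  # raw samples per second

print(codes_per_sec)                  # 52
print(samples_per_sec)                # 832
print(codes_per_sec * bits_per_code)  # 572.0
```

Under these assumptions, one second of a full recording (832 raw samples) is represented by 52 codebook indices of 11 bits each.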
+ ## Devices
+
+ CUDA and CPU are supported. **Apple Silicon (MPS) is not**: PyTorch's `flex_attention` has no
+ Metal kernel, so on a Mac use `device="cpu"` (a 2-minute clip embeds in a few seconds; a full
+ night takes ~1 minute).
+
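One way to apply the device guidance is a small helper that prefers CUDA and otherwise falls back to CPU, never selecting MPS. This is a sketch, not part of `hypnos`: the helper name `pick_device` is made up here, and passing its result as a `device` keyword to `embed_edf` is an assumption based on the `device="cpu"` hint above.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu" (never "mps")."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(device)

# Hypothetical call, assuming embed_edf accepts a `device` keyword:
# emb = embed_edf("recording.edf", device=device)
```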
+ ## Format
+
+ `hypnos_multimodal.safetensors` stores the weights under namespaced keys (`model/…`,
+ `tok/<name>/…`), with the config as a JSON string in the file metadata. It is loaded with
+ `safetensors`, so no arbitrary-code unpickling is involved.
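Because a safetensors file starts with an 8-byte little-endian header length followed by a JSON header, the metadata and tensor names can be inspected with the standard library alone, without loading any weights. The sketch below does that on a tiny stand-in file it builds itself. The metadata key `config` and the demo tensor names are hypothetical; the card does not say under which key the config JSON is stored.

```python
import json
import os
import struct
import tempfile

def read_safetensors_header(path):
    """Return (metadata, tensor_names) from a safetensors file without loading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian unsigned 64-bit
        header = json.loads(f.read(header_len).decode("utf-8"))
    metadata = header.pop("__metadata__", {})  # optional str -> str map
    return metadata, sorted(header)

def group_by_namespace(tensor_names):
    """Count tensors per top-level key prefix, e.g. "model" or "tok"."""
    counts = {}
    for name in tensor_names:
        prefix = name.split("/", 1)[0]
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts

# Build a tiny stand-in file so the sketch runs without the real bundle.
demo_header = {
    "__metadata__": {"config": json.dumps({"embed_dim": 768})},  # key name is hypothetical
    "model/w": {"dtype": "F16", "shape": [2], "data_offsets": [0, 4]},
    "tok/eeg-q8/w": {"dtype": "F16", "shape": [2], "data_offsets": [4, 8]},
}
blob = json.dumps(demo_header).encode("utf-8")
path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob + bytes(8))  # header + 8 bytes of tensor data

metadata, names = read_safetensors_header(path)
config = json.loads(metadata["config"])
print(config["embed_dim"], group_by_namespace(names))  # 768 {'model': 1, 'tok': 1}
```

For the real file, printing `metadata.keys()` first would show which key actually holds the config string.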