Model Overview

Physio Transformer HRV is a self‑supervised transformer encoder trained on long‑duration ECG‑derived heart rate (HR) and heart rate variability (HRV) signals from the MIT‑BIH Atrial Fibrillation Database (AFDB). The model learns general‑purpose physiological representations using a masked heart‑rate reconstruction objective, similar to BERT‑style masked modeling for time series.

This encoder is designed as a foundation model for wearable‑style biometrics, capturing long‑range temporal patterns in HR/HRV dynamics such as circadian rhythms, autonomic balance, recovery, and arrhythmia‑like irregularity.

Intended Use

This model is intended for feature extraction and downstream fine‑tuning on physiological or wearable‑related tasks, including:

  • Stress detection
  • Sleep staging
  • Activity recognition
  • HR forecasting
  • Health anomaly detection
  • Arrhythmia or irregularity detection
  • Personalized biometrics modeling

The encoder outputs a sequence of 64‑dimensional embeddings for each timestep.
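For downstream tasks, the per‑timestep embeddings can be pooled into one feature vector per window. A minimal NumPy sketch (mean pooling is an illustrative choice, not prescribed by the model card):

```python
import numpy as np

def pool_embeddings(embeddings):
    """Mean-pool per-timestep encoder embeddings into one vector per window."""
    # embeddings: (batch, time, d_model) as produced by the encoder
    return embeddings.mean(axis=1)

emb = np.zeros((8, 128, 64))      # e.g. 8 windows of 128 timesteps, d_model = 64
features = pool_embeddings(emb)   # -> (8, 64), ready for a downstream head
```

Other pooling strategies (max, last timestep, attention pooling) are equally valid depending on the task.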

Training Objective

The model is trained using masked HR reconstruction:

  • 15% of HR values are randomly masked
  • HRV, activity, and the unmasked HR values are provided as input
  • The model predicts the masked HR values from the surrounding context
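The masking step can be sketched in NumPy as follows (the mask value of 0.0 and the seeded RNG are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_hr(hr, mask_ratio=0.15, mask_value=0.0):
    """Randomly hide a fraction of HR timesteps for reconstruction training."""
    hr = np.asarray(hr, dtype=float)
    mask = rng.random(hr.shape) < mask_ratio  # True where HR is hidden
    masked = hr.copy()
    masked[mask] = mask_value                 # model must reconstruct these
    return masked, mask

hr = rng.normal(70.0, 5.0, size=128)          # one window of HR values (BPM)
masked_hr, mask = mask_hr(hr)
```

During training, the reconstruction loss is computed only at the masked positions.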

This forces the encoder to learn:

  • HR + HRV relationships
  • Temporal dynamics
  • Autonomic patterns
  • Long‑range dependencies
  • Physiological variability

This is analogous to BERT pretraining, but for biometrics.

Architecture

Transformer encoder (Pre‑LayerNorm)

  • 3 layers, 4 heads, 64‑dimensional model
  • Learned positional embeddings

Input channels:

  • HR (1)
  • HRV (1)
  • Activity (3 placeholder channels)

Output: (batch, time, d_model) embeddings

The model uses dropout, residual connections, and a per‑head attention dimension of key_dim = d_model // num_heads (16 dimensions per head), so the concatenated heads match d_model.
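A minimal NumPy sketch of one pre‑LayerNorm encoder block with this head layout (the ReLU feed‑forward, weight shapes, and initialization are illustrative assumptions; the actual model also applies dropout and learned positional embeddings):

```python
import numpy as np

d_model, num_heads = 64, 4
key_dim = d_model // num_heads  # 16 dims per head; heads concatenate to d_model
rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-6):
    mu, sd = x.mean(-1, keepdims=True), x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo):
    B, T, _ = x.shape
    def split(h):  # (B, T, d_model) -> (B, heads, T, key_dim)
        return h.reshape(B, T, num_heads, key_dim).transpose(0, 2, 1, 3)
    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    att = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(key_dim))
    out = (att @ v).transpose(0, 2, 1, 3).reshape(B, T, d_model)
    return out @ Wo

def pre_ln_block(x, weights):
    Wq, Wk, Wv, Wo, W1, W2 = weights
    x = x + multi_head_self_attention(layer_norm(x), Wq, Wk, Wv, Wo)  # residual 1
    h = np.maximum(layer_norm(x) @ W1, 0.0)                           # FFN (ReLU)
    return x + h @ W2                                                 # residual 2

shapes = [(d_model, d_model)] * 4 + [(d_model, 4 * d_model), (4 * d_model, d_model)]
weights = [rng.normal(0.0, 0.02, s) for s in shapes]
x = rng.normal(size=(2, 128, d_model))  # (batch, time, d_model)
y = pre_ln_block(x, weights)            # same shape: (2, 128, 64)
```

Stacking three such blocks over the projected HR/HRV/activity input reproduces the overall layout described above.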

Data

Training data is derived from the MIT‑BIH Atrial Fibrillation Database (AFDB):

  • 25 long‑term (~10‑hour) ECG recordings (23 with signal data)
  • R‑peaks extracted using a fast Pan‑Tompkins‑style detector
  • HR + HRV computed in sliding windows
  • Time‑series sliced into fixed‑length windows (T=128, step=32)
  • NaNs imputed per‑window

This produces thousands of training samples from a small number of long recordings.
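The windowing and imputation steps can be sketched as follows (window‑mean imputation is an assumed strategy; the card does not specify the method):

```python
import numpy as np

def make_windows(series, T=128, step=32):
    """Slice a long HR/HRV series into overlapping fixed-length windows."""
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - T + 1, step)
    return np.stack([series[i:i + T] for i in starts])

def impute_nans(window):
    """Replace NaNs in a single window with the window mean."""
    w = window.copy()
    missing = np.isnan(w)
    if missing.any():
        w[missing] = np.nanmean(w)
    return w

hr = np.random.default_rng(0).normal(70.0, 5.0, size=1000)
hr[10] = np.nan                                    # simulate a dropped beat
windows = make_windows(hr)                         # shape (28, 128)
clean = np.stack([impute_nans(w) for w in windows])
```

With T=128 and step=32, a 1000‑sample series yields 28 overlapping windows, which is how a few long recordings expand into thousands of training samples.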

Performance

The model is evaluated using masked MAE on held‑out validation windows.

Typical results:

  • Masked MAE: ~5 BPM
  • Low median absolute error
  • Stable reconstruction across a wide HR range

These metrics indicate that the encoder successfully learns physiological structure.
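Masked MAE scores only the timesteps that were hidden from the model. A sketch (the example values are illustrative):

```python
import numpy as np

def masked_mae(true_hr, pred_hr, mask):
    """Mean absolute error over the masked (hidden) timesteps only."""
    return float(np.abs(true_hr[mask] - pred_hr[mask]).mean())

true_hr = np.array([70.0, 72.0, 75.0, 80.0])
pred_hr = np.array([70.0, 68.0, 75.0, 85.0])
mask = np.array([False, True, False, True])   # only these timesteps were masked
err = masked_mae(true_hr, pred_hr, mask)      # (|72-68| + |80-85|) / 2 = 4.5
```

Unmasked timesteps are excluded because the model sees them directly, so reconstructing them is trivial.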
