---
library_name: pytorch
pipeline_tag: time-series-forecasting
language:
  - zh
  - en
tags:
  - anomaly-detection
  - time-series
  - wearable
  - health
  - lstm
  - transformer
  - physiological-monitoring
  - hrv
  - heart-rate
  - real-time
  - multi-user
  - personalized
  - sensor-fusion
  - healthcare
  - continuous-monitoring
license: apache-2.0
pretty_name: Wearable TimeSeries Health Monitor
---

<div align="center">

**Language / 语言**: [中文](#中文版本) | [English](#english-version)

</div>

---

<a id="中文版本"></a>
# Wearable_TimeSeries_Health_Monitor

面向可穿戴设备的多用户健康监控方案：一份模型、一个配置，就能为不同用户构建个性化异常检测。模型基于 **Phased LSTM + Temporal Fusion Transformer (TFT)**，并整合自适应基线、因子特征以及单位秒级的数据滑窗能力，适合当作 HuggingFace 模型或企业内部服务快速接入。

---

## 🌟 模型应用亮点

| 能力 | 说明 |
| --- | --- |
| **即插即用** | 内置 `WearableAnomalyDetector` 封装，加载模型即可预测，一次初始化后可持续监控多个用户 |
| **配置驱动特征** | `configs/features_config.json` 描述所有特征、缺省值、类别映射，新增/删减血氧、呼吸率等只需改配置 |
| **多用户实时服务** | `FeatureCalculator` + 轻量级 `data_storage` 缓存，实现用户历史管理、基线演化、批量推理 |
| **多场景 Demo** | `test_wearable_service.py` 内置 3 个真实"客户"案例：完整传感器、缺少字段、匿名设备，即使没有原始数据也能立即体验 |
| **自适应基线支持** | 可扩展 `UserDataManager` 将个人/分组基线接入推理流程，持续改善个体敏感度 |

---

## ⚡ 核心特点与技术优势

### 🎯 自适应基线：个人与群体智能融合

模型采用**自适应基线策略**，根据用户历史数据量动态选择最优基线：

- **个人基线优先**：当用户有足够历史数据（如 ≥7 天）时，使用个人 HRV 均值/标准差作为基线，捕捉个体生理节律差异
- **群体基线兜底**：新用户或数据稀疏时，自动切换到群体统计基线，确保冷启动也能稳定检测
- **平滑过渡机制**：通过加权混合（如 `final_mean = α × personal_mean + (1-α) × group_mean`）实现从群体到个人的渐进式适应
- **实时基线更新**：推理过程中持续累积用户数据，基线随用户状态演化而动态调整，提升长期监控精度

**优势**：相比固定阈值或纯群体基线，自适应基线能同时兼顾**个性化敏感度**（减少误报）和**冷启动鲁棒性**（新用户可用），特别适合多用户、长周期监控场景。

### ⏱️ 灵活的时间窗口与周期

- **5 分钟级粒度**：每条数据点代表 5 分钟聚合，支持秒级到小时级的灵活时间尺度
- **可配置窗口大小**：默认 12 点（1 小时），可根据业务需求调整为 6 点（30 分钟）或 24 点（2 小时）
- **不等间隔容错**：Phased LSTM 架构天然处理缺失数据点，即使数据稀疏（如夜间传感器断开）也能稳定推理
- **多时间尺度特征**：同时提取短期波动（RMSSD）、中期趋势（滑动均值）和长期模式（日/周周期），捕捉不同时间尺度的异常信号

**优势**：适应不同设备采样频率、用户佩戴习惯，无需强制对齐时间戳，降低数据预处理复杂度。

### 🔄 多通道数据协同作用

模型整合**4 大类特征通道**，通过因子特征与注意力机制实现跨通道信息融合：

1. **生理通道**（HR、HRV 系列、呼吸率、血氧）
   - 直接反映心血管与呼吸系统状态
   - 因子特征：`physiological_mean`, `physiological_std`, `physiological_max`, `physiological_min`

2. **活动通道**（步数、距离、能量消耗、加速度、陀螺仪）
   - 捕捉运动强度与身体负荷
   - 因子特征：`activity_mean`, `activity_std` 等

3. **环境通道**（光线、时间周期、数据质量）
   - 提供上下文信息，区分运动性心率升高 vs 静息异常
   - 类别特征：`time_period_primary`（morning/day/evening/night）

4. **基线通道**（自适应基线均值/标准差、偏差特征）
   - 提供个性化参考基准，计算 `hrv_deviation_abs`, `hrv_z_score` 等相对异常指标

**协同机制**：
- **因子特征聚合**：将同类通道的统计量（均值/标准差/最值）作为高层特征，让模型学习通道间的关联模式
- **TFT 注意力**：Temporal Fusion Transformer 的变量选择网络自动识别哪些通道在特定时间点最重要
- **已知未来特征**：时间特征（小时、星期、是否周末）帮助模型理解周期性，区分正常波动与异常

**优势**：多通道协同能显著降低**单一指标误报**（如运动导致心率升高），提升**异常检测的上下文感知能力**，特别适合可穿戴设备的多传感器融合场景。

---

## 📊 核心指标（短期窗口）

- **F1**: 0.2819
- **Precision**: 0.1769
- **Recall**: 0.6941
- **最佳阈值**: 0.53
- **窗口定义**: 12 条 5 分钟数据（1小时时间窗，预测未来 0.5 小时）

> 模型偏向召回，适合“异常先提醒、人机协同复核”的场景。可通过阈值/采样策略调节精度与召回。

---

## 🚀 快速体验

### 1. 克隆或下载模型仓库

```bash
git clone https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor
cd Wearable_TimeSeries_Health_Monitor
pip install -r requirements.txt
```

### 2. 运行内置 Demo（无需额外数据）

```bash
# 默认跑 ab60 案例
python test_wearable_service.py

# 批量跑全部预置客户
python test_wearable_service.py --case all

# 想从原始 stage1 CSV 抽样测试
python test_wearable_service.py --from-raw
```

`test_wearable_service.py` 将自动：
- 加载 `WearableAnomalyDetector`
- 读取配置驱动特征
- 构建窗口并执行预测
- 输出每位“客户”的异常分数、阈值、预测详情

### 3. 在业务代码中调用

```python
from wearable_anomaly_detector import WearableAnomalyDetector

detector = WearableAnomalyDetector(
    model_dir="checkpoints/phase2/exp_factor_balanced",
    threshold=0.53,
)

result = detector.predict(data_points, return_score=True, return_details=True)
print(result)
```

> `data_points` 为 12 条最新的 5 分钟记录；若缺静态特征/设备信息，系统会自动从配置/缓存补齐。

---

## 🔧 输入与输出

### 输入（单个数据点）

```python
{
  "timestamp": "2024-01-01T08:00:00",
  "deviceId": "ab60",            # 可选，缺失时会自动创建匿名 ID
  "features": {
    "hr": 72.0,
    "hrv_rmssd": 30.0,
    "time_period_primary": "morning",
    "data_quality": "high",
    ...
  }
}
```

- 每个窗口需 12 条数据（默认 1 小时）
- 特征是否必填由 `configs/features_config.json` 控制
- 缺失值会自动回落到 default 或 category_mapping 定义值

### 输出

```python
{
  "is_anomaly": True,
  "anomaly_score": 0.5760,
  "threshold": 0.5300,
  "details": {
     "window_size": 12,
     "model_output": 0.5760,
     "prediction_confidence": 0.0460
  }
}
```

---

## 🧱 模型架构与训练

- **模型骨干**：Phased LSTM 处理不等间隔序列 + Temporal Fusion Transformer 聚合时间上下文
- **异常检测头**：增强注意力、多层 MLP、可选对比学习/类型辅助头
- **特征体系**：
  - 生理：HR、HRV（RMSSD/SDNN/PNN50…）
  - 活动：步数、距离、能量消耗、加速度、陀螺仪
  - 环境：光线、昼夜标签、数据质量
  - 基线：自适应基线均值/标准差 + 偏差特征
- **标签来源**：问卷高置信度标签 + 自适应基线低置信度标签
- **训练流程**：Stage1/2/3 数据加工 ➜ Phase1 自监督预训练 ➜ Phase2 监督微调 ➜ 阈值/案例校正

---

## 📦 仓库结构（部分）

```
├─ configs/
│   └─ features_config.json     # 特征定义 & 归一化策略
├─ wearable_anomaly_detector.py # 核心封装：加载、预测、批处理
├─ feature_calculator.py        # 配置驱动的特征构建 + 用户历史缓存
├─ test_wearable_service.py     # HuggingFace Demo脚本（内含预置案例）
└─ checkpoints/phase2/...       # 模型权重 & summary
```

---

## 📚 数据来源与许可证

- 训练数据基于 **“A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries”**（Baigutanova *et al.*, Scientific Data, 2025）以及其 Figshare 数据集 [doi:10.1038/s41597-025-05801-3](https://www.nature.com/articles/s41597-025-05801-3) / [dataset link](https://springernature.figshare.com/articles/dataset/In-situ_wearable-based_dataset_of_continuous_heart_rate_variability_monitoring_accompanied_by_sleep_diaries/28509740)。
- 该数据集以 **Creative Commons Attribution 4.0 (CC BY 4.0)** 许可发布，可自由使用、修改、分发，但必须保留署名并附上许可证链接。
- 本仓库沿用 CC BY 4.0 对原始数据的要求；若你在此基础上再加工或发布，请继续保留上述署名与许可证说明。
- 代码/模型可根据需要使用 MIT/Apache 等许可证，但凡涉及数据的部分，仍需遵循 CC BY 4.0。

---

## 🤝 贡献与扩展

欢迎：
1. 新增特征或数据源 ⇒ 更新 `features_config.json` + 提交 PR
2. 接入新的用户数据管理/基线策略 ⇒ 扩展 `FeatureCalculator` 或贡献 `UserDataManager`
3. 反馈案例或真实部署经验 ⇒ 提 Issue 或 Discussion

---

## 📄 许可证

- **模型与代码**：Apache-2.0。可在保留版权与许可证声明的前提下任意使用/修改/分发。
- **训练数据**：原始可穿戴 HRV 数据集使用 CC BY 4.0，复用时请继续保留作者署名与许可信息。

---

## 🔖 引用

```bibtex
@software{Wearable_TimeSeries_Health_Monitor,
  title  = {Wearable\_TimeSeries\_Health\_Monitor},
  author = {oscarzhang},
  year   = {2025},
  url    = {https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor}
}
```

---

<a id="english-version"></a>
# Wearable_TimeSeries_Health_Monitor

A multi-user health monitoring solution for wearable devices: a single model and a single configuration provide personalized anomaly detection for different users. The model combines **Phased LSTM + Temporal Fusion Transformer (TFT)** with adaptive baselines, factor features, and sliding-window processing of data at second-level resolution. It can be used as a Hugging Face model or integrated quickly into internal enterprise services.

---

## 🌟 Model Highlights

| Capability | Description |
| --- | --- |
| **Plug-and-Play** | The built-in `WearableAnomalyDetector` wrapper loads the model and predicts directly; after a single initialization it can monitor multiple users continuously |
| **Configuration-Driven Features** | `configs/features_config.json` defines all features, default values, and category mappings; adding or removing features such as blood oxygen or respiratory rate requires only a configuration change |
| **Multi-User Real-Time Service** | `FeatureCalculator` plus a lightweight `data_storage` cache provide user history management, baseline evolution, and batch inference |
| **Multi-Scenario Demo** | `test_wearable_service.py` ships with 3 real "client" cases (complete sensors, missing fields, anonymous device), so the model can be tried immediately even without raw data |
| **Adaptive Baseline Support** | An extensible `UserDataManager` can feed personal/group baselines into the inference pipeline, continuously improving per-user sensitivity |

---

## ⚡ Core Features & Technical Advantages

### 🎯 Adaptive Baseline: Combining Personal and Group Baselines

The model employs an **adaptive baseline strategy** that selects the baseline dynamically according to how much historical data a user has:

- **Personal Baseline Priority**: When a user has sufficient historical data (e.g., ≥7 days), the personal HRV mean/std serves as the baseline, capturing individual differences in physiological rhythm
- **Group Baseline Fallback**: For new users or sparse data, the model automatically switches to a group statistical baseline, so detection remains stable during cold start
- **Smooth Transition Mechanism**: Weighted mixing (e.g., `final_mean = α × personal_mean + (1-α) × group_mean`) provides a gradual shift from the group baseline to the personal one
- **Real-Time Baseline Updates**: User data accumulates during inference and the baseline adjusts as the user's state evolves, improving long-term monitoring accuracy

**Advantage**: Compared with fixed thresholds or pure group baselines, adaptive baselines balance **personalized sensitivity** (fewer false positives) with **cold-start robustness** (usable for new users), which suits multi-user, long-term monitoring scenarios.
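
The weighted mixing described in this section can be sketched as follows. This is a minimal illustration rather than the repository's implementation: the `Baseline` dataclass, the `blend_baseline` function, and the linear ramp of α over a 7-day horizon are assumptions chosen to match the "≥7 days" example, and blending the standard deviations linearly is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """HRV baseline statistics (hypothetical container, not from the repo)."""
    mean: float
    std: float


def blend_baseline(personal: Baseline, group: Baseline,
                   days_of_history: float,
                   full_personal_days: float = 7.0) -> tuple[Baseline, float]:
    """Blend personal and group baselines.

    Assumption: alpha ramps linearly from 0 (no history) to 1
    (at least `full_personal_days` of history).
    """
    alpha = min(max(days_of_history / full_personal_days, 0.0), 1.0)
    blended = Baseline(
        mean=alpha * personal.mean + (1 - alpha) * group.mean,
        std=alpha * personal.std + (1 - alpha) * group.std,
    )
    return blended, alpha


# Made-up numbers for illustration only.
personal = Baseline(mean=40.0, std=8.0)
group = Baseline(mean=30.0, std=12.0)
for days in (0, 3.5, 7, 30):
    blended, alpha = blend_baseline(personal, group, days)
    print(f"days={days}: alpha={alpha:.2f}, "
          f"mean={blended.mean:.1f}, std={blended.std:.1f}")
```

With no history the result equals the group baseline, and from 7 days onward it equals the personal baseline; in between, the baseline moves gradually from one to the other.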

### ⏱️ Flexible Time Windows & Periods

- **5-Minute Granularity**: Each data point represents a 5-minute aggregate, and time scales ranging from seconds to hours are supported
- **Configurable Window Size**: The default is 12 points (1 hour), adjustable to 6 points (30 minutes) or 24 points (2 hours) depending on business needs
- **Uneven Interval Tolerance**: The Phased LSTM architecture handles missing data points natively, so inference stays stable even with sparse data (e.g., the sensor disconnecting at night)
- **Multi-Time-Scale Features**: Short-term fluctuations (RMSSD), medium-term trends (rolling mean), and long-term patterns (daily/weekly cycles) are extracted together, capturing anomaly signals at different time scales

**Advantage**: The model adapts to different device sampling frequencies and user wearing habits without forced timestamp alignment, which reduces data preprocessing complexity.
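
To make the window arithmetic concrete, here is a minimal sketch of slicing 5-minute records into 12-point windows and measuring the gaps between consecutive timestamps. The helper names `sliding_windows` and `gaps_in_minutes` are hypothetical and not part of the repository; the sketch only assumes records carry an ISO 8601 `timestamp` field, as in the input format documented later.

```python
from datetime import datetime, timedelta


def sliding_windows(points: list[dict], window_size: int = 12,
                    step: int = 1) -> list[list[dict]]:
    """Return every run of `window_size` consecutive points, advancing by `step`."""
    return [points[i:i + window_size]
            for i in range(0, len(points) - window_size + 1, step)]


def gaps_in_minutes(window: list[dict]) -> list[float]:
    """Minutes between consecutive points; values above 5 indicate missing samples."""
    times = [datetime.fromisoformat(p["timestamp"]) for p in window]
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


start = datetime(2024, 1, 1, 8, 0)
# 15 nominal 5-minute slots with slot 4 missing (e.g. a sensor dropout).
points = [{"timestamp": (start + timedelta(minutes=5 * i)).isoformat()}
          for i in range(15) if i != 4]

windows = sliding_windows(points, window_size=12)
print(len(windows))                  # number of complete 12-point windows
print(gaps_in_minutes(windows[0]))   # one 10-minute gap among 5-minute gaps
```

The gaps are left as they are rather than resampled, which matches the stated goal of not forcing timestamp alignment; how the real pipeline passes time information to the model is not shown here.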

### 🔄 Multi-Channel Data Synergy

The model integrates **4 major feature channels**, achieving cross-channel information fusion through factor features and attention mechanisms:

1. **Physiological Channel** (HR, HRV series, respiratory rate, blood oxygen)
   - Directly reflects cardiovascular and respiratory system status
   - Factor features: `physiological_mean`, `physiological_std`, `physiological_max`, `physiological_min`

2. **Activity Channel** (steps, distance, energy consumption, acceleration, gyroscope)
   - Captures exercise intensity and body load
   - Factor features: `activity_mean`, `activity_std`, etc.

3. **Environmental Channel** (light, time period, data quality)
   - Provides contextual information that helps distinguish exercise-induced heart rate elevation from resting anomalies
   - Categorical features: `time_period_primary` (morning/day/evening/night)

4. **Baseline Channel** (adaptive baseline mean/std, deviation features)
   - Provides a personalized reference and is used to compute relative anomaly indicators such as `hrv_deviation_abs` and `hrv_z_score`
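
As an illustration of the baseline channel, the sketch below computes the two deviation features named above. The feature names come from this README, but their exact definitions are assumptions: the absolute deviation from the baseline mean and a standard z-score. The function name `deviation_features` is hypothetical.

```python
def deviation_features(hrv_value: float, baseline_mean: float,
                       baseline_std: float, eps: float = 1e-6) -> dict[str, float]:
    """Relative-anomaly features for one HRV reading against a baseline."""
    deviation = hrv_value - baseline_mean
    return {
        "hrv_deviation_abs": abs(deviation),
        # eps guards against division by zero when the baseline has no spread.
        "hrv_z_score": deviation / max(baseline_std, eps),
    }


# Made-up reading: 18 ms against a baseline of 30 +/- 6 ms.
print(deviation_features(hrv_value=18.0, baseline_mean=30.0, baseline_std=6.0))
```

In this example the reading sits 12 units below the baseline mean, which is two baseline standard deviations, so the z-score is -2.0.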

**Synergy Mechanism**:
- **Factor Feature Aggregation**: Statistics (mean/std/max/min) over channels of the same category serve as high-level features, enabling the model to learn association patterns between channels
- **TFT Attention**: The Temporal Fusion Transformer's variable selection network automatically identifies which channels matter most at a given time point
- **Known Future Features**: Time features (hour, day of week, is_weekend) help the model capture periodicity and separate normal fluctuations from anomalies
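
The time features listed above are "known in advance" because they follow from the timestamp alone. A minimal sketch, assuming ISO 8601 timestamps; the function name `time_features` and the hour boundaries used for `time_period_primary` are hypothetical, since the README does not specify them.

```python
from datetime import datetime


def time_features(timestamp: str) -> dict:
    """Derive calendar features from an ISO 8601 timestamp."""
    ts = datetime.fromisoformat(timestamp)
    hour = ts.hour
    # Hypothetical period boundaries; the real mapping may differ.
    if 6 <= hour < 11:
        period = "morning"
    elif 11 <= hour < 18:
        period = "day"
    elif 18 <= hour < 23:
        period = "evening"
    else:
        period = "night"
    return {
        "hour": hour,
        "day_of_week": ts.weekday(),       # Monday = 0
        "is_weekend": ts.weekday() >= 5,   # Saturday or Sunday
        "time_period_primary": period,
    }


print(time_features("2024-01-01T08:00:00"))
print(time_features("2024-01-06T23:30:00"))
```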

**Advantage**: Multi-channel synergy is designed to reduce **single-indicator false positives** (e.g., exercise-induced heart rate elevation) and to make anomaly detection **context-aware**, which suits multi-sensor fusion on wearable devices.
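
Factor feature aggregation can be sketched as below. The output names follow the `physiological_mean` / `activity_std` pattern used in this README; everything else is an assumption: the `CHANNEL_GROUPS` mapping, the input feature names other than `hr` and `hrv_rmssd`, and the premise that inputs are already normalized to comparable scales.

```python
from statistics import fmean, pstdev

# Hypothetical channel grouping; the real grouping lives in the feature config.
CHANNEL_GROUPS = {
    "physiological": ["hr", "hrv_rmssd", "respiratory_rate", "spo2"],
    "activity": ["steps", "distance", "energy"],
}


def factor_features(features: dict[str, float]) -> dict[str, float]:
    """Summarise each channel group with mean/std/max/min.

    Assumes the inputs are already normalised to comparable scales.
    Features absent from the input are skipped rather than imputed.
    """
    out = {}
    for group, names in CHANNEL_GROUPS.items():
        values = [features[n] for n in names if n in features]
        if not values:
            continue  # no sensor in this group reported data
        out[f"{group}_mean"] = fmean(values)
        out[f"{group}_std"] = pstdev(values)
        out[f"{group}_max"] = max(values)
        out[f"{group}_min"] = min(values)
    return out


# Made-up normalised values for illustration only.
normalised = {"hr": 0.5, "hrv_rmssd": -1.0, "spo2": 0.0,
              "steps": 2.0, "distance": 1.0}
print(factor_features(normalised))
```

Skipping missing channels instead of failing mirrors the "missing fields" demo case; the repository may handle this differently, for example by falling back to configured defaults.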

---

## 📊 Core Metrics (Short-Term Window)

- **F1**: 0.2819
- **Precision**: 0.1769
- **Recall**: 0.6941
- **Optimal Threshold**: 0.53
- **Window Definition**: 12 data points at 5-minute intervals (a 1-hour window, predicting 0.5 hours ahead)

> The model favors recall, which suits "alert first, then human-machine collaborative review" workflows. The precision/recall balance can be adjusted through the threshold and the sampling strategy.
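
The reported F1 is the harmonic mean of the reported precision and recall, which the sketch below verifies. It also shows on toy data how raising the threshold trades recall for precision. The helper names and the toy scores and labels are made up for illustration, and flagging scores at or above the threshold is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(scores: list[float], labels: list[int],
                     threshold: float) -> tuple[float, float]:
    """Precision and recall when scores >= threshold are flagged as anomalies."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))
    fn = sum((not f) and y == 1 for f, y in zip(flagged, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# The reported precision and recall reproduce the reported F1 of 0.2819.
print(round(f1_score(0.1769, 0.6941), 4))

# Toy data: a higher threshold raises precision and lowers recall.
scores = [0.20, 0.45, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 1, 1]
for threshold in (0.5, 0.7):
    p, r = precision_recall(scores, labels, threshold)
    print(threshold, round(p, 2), round(r, 2), round(f1_score(p, r), 2))
```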

---

## 🚀 Quick Start

### 1. Clone or Download the Model Repository

```bash
git clone https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor
cd Wearable_TimeSeries_Health_Monitor
pip install -r requirements.txt
```

### 2. Run Built-in Demo (No Additional Data Required)

```bash
# Run ab60 case by default
python test_wearable_service.py

# Run all predefined clients
python test_wearable_service.py --case all

# Sample from raw stage1 CSV for testing
python test_wearable_service.py --from-raw
```

`test_wearable_service.py` will automatically:
- Load `WearableAnomalyDetector`
- Read configuration-driven features
- Build windows and execute predictions
- Output anomaly scores, thresholds, and prediction details for each "client"

### 3. Call from Application Code

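```python
from wearable_anomaly_detector import WearableAnomalyDetector

# Load the model once; the same detector instance can then serve multiple users.
detector = WearableAnomalyDetector(
    model_dir="checkpoints/phase2/exp_factor_balanced",
    threshold=0.53,  # optimal threshold reported under Core Metrics
)

# data_points: the 12 most recent 5-minute records (format under "Input & Output").
result = detector.predict(data_points, return_score=True, return_details=True)
print(result)
```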

> `data_points` should contain the 12 most recent 5-minute records; if static features or device information are missing, the system fills them in automatically from the configuration or cache.
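
One way to prepare `data_points` from a longer record history is sketched below. `latest_window` is a hypothetical helper, not part of the repository, and the synthetic records are made up; only the record layout (`timestamp`, `deviceId`, `features`) follows this README.

```python
from datetime import datetime, timedelta

WINDOW_SIZE = 12        # records per window, as documented above
INTERVAL_MINUTES = 5    # each record is a 5-minute aggregate


def latest_window(records: list[dict], window_size: int = WINDOW_SIZE) -> list[dict]:
    """Return the most recent `window_size` records, ordered by timestamp."""
    if len(records) < window_size:
        raise ValueError(f"need {window_size} records, got {len(records)}")
    ordered = sorted(records, key=lambda r: datetime.fromisoformat(r["timestamp"]))
    return ordered[-window_size:]


# Synthetic records for illustration only; the values are made up.
start = datetime(2024, 1, 1, 8, 0)
records = [
    {
        "timestamp": (start + timedelta(minutes=INTERVAL_MINUTES * i)).isoformat(),
        "deviceId": "ab60",
        "features": {"hr": 70.0 + i, "hrv_rmssd": 30.0},
    }
    for i in range(20)
]

data_points = latest_window(records)
print(len(data_points), data_points[0]["timestamp"], data_points[-1]["timestamp"])
# data_points can now be passed to detector.predict(...).
```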

---

## 🔧 Input & Output

### Input (Single Data Point)

```python
{
  "timestamp": "2024-01-01T08:00:00",
  "deviceId": "ab60",            # Optional, anonymous ID will be created if missing
  "features": {
    "hr": 72.0,
    "hrv_rmssd": 30.0,
    "time_period_primary": "morning",
    "data_quality": "high",
    # ... (further features as defined in configs/features_config.json)
  }
}
```

- Each window requires 12 data points (1 hour by default)
- Whether a feature is required is controlled by `configs/features_config.json`
- Missing values automatically fall back to the `default` or `category_mapping` values defined in the configuration
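
The fallback behavior can be pictured with the sketch below. The configuration excerpt is hypothetical: the keys `default` and `category_mapping` are mentioned in this README, but the surrounding schema, the `type` field, the concrete default values, and the `resolve_features` function are assumptions and may not match the real `features_config.json`.

```python
import json

# Hypothetical excerpt; the real features_config.json schema may differ.
CONFIG_JSON = """
{
  "hr":           {"type": "numeric",     "default": 70.0},
  "hrv_rmssd":    {"type": "numeric",     "default": 30.0},
  "data_quality": {"type": "categorical", "default": "medium",
                   "category_mapping": {"low": 0, "medium": 1, "high": 2}}
}
"""


def resolve_features(raw: dict, config: dict) -> dict:
    """Fill missing features from defaults and encode categorical values."""
    resolved = {}
    for name, spec in config.items():
        value = raw.get(name)
        if value is None:
            value = spec["default"]
        if spec["type"] == "categorical":
            mapping = spec["category_mapping"]
            # Unknown categories fall back to the default category's code.
            value = mapping.get(value, mapping[spec["default"]])
        resolved[name] = value
    return resolved


config = json.loads(CONFIG_JSON)
# hrv_rmssd is missing and is filled from its default.
print(resolve_features({"hr": 72.0, "data_quality": "high"}, config))
```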

### Output

```python
{
  "is_anomaly": True,
  "anomaly_score": 0.5760,
  "threshold": 0.5300,
  "details": {
     "window_size": 12,
     "model_output": 0.5760,
     "prediction_confidence": 0.0460
  }
}
```
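
The fields in the example output are mutually consistent under two assumptions, neither confirmed by the repository: a score at or above the threshold is flagged as an anomaly, and `prediction_confidence` is the absolute margin between score and threshold (0.5760 − 0.5300 = 0.0460). The sketch below rebuilds the example under those assumptions; `build_result` is a hypothetical function.

```python
def build_result(anomaly_score: float, threshold: float = 0.53,
                 window_size: int = 12) -> dict:
    """Assemble a result dict in the documented shape.

    Assumptions: score >= threshold counts as an anomaly, and
    prediction_confidence is the absolute score-to-threshold margin.
    """
    return {
        "is_anomaly": anomaly_score >= threshold,
        "anomaly_score": round(anomaly_score, 4),
        "threshold": round(threshold, 4),
        "details": {
            "window_size": window_size,
            "model_output": round(anomaly_score, 4),
            "prediction_confidence": round(abs(anomaly_score - threshold), 4),
        },
    }


print(build_result(0.5760))
```

Under this reading, a small `prediction_confidence` means the score sits close to the decision boundary, which is a natural cue for routing the case to human review.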

---

## 🧱 Model Architecture & Training

- **Model Backbone**: Phased LSTM handles unevenly-spaced sequences + Temporal Fusion Transformer aggregates temporal context
- **Anomaly Detection Head**: Enhanced attention, a multi-layer MLP, and optional contrastive-learning / anomaly-type auxiliary heads
- **Feature System**:
  - Physiological: HR, HRV (RMSSD/SDNN/PNN50…)
  - Activity: Steps, distance, energy consumption, acceleration, gyroscope
  - Environmental: Light, day/night labels, data quality
  - Baseline: Adaptive baseline mean/std + deviation features
- **Label Source**: High-confidence questionnaire labels + low-confidence adaptive baseline labels
- **Training Pipeline**: Stage1/2/3 data processing ➜ Phase1 self-supervised pre-training ➜ Phase2 supervised fine-tuning ➜ Threshold/case calibration
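
To show why a Phased LSTM copes with unevenly spaced samples, the sketch below implements the time gate from the original Phased LSTM formulation (Neil et al., 2016) in plain Python. It is a standalone illustration and not this repository's layer: in a real model the period, shift, and open ratio are learned per unit, and the gate value `k` blends the proposed cell state with the previous one (`c_t = k * c_proposed + (1 - k) * c_prev`).

```python
def time_gate_openness(t: float, period: float, shift: float = 0.0,
                       r_on: float = 0.05, leak: float = 0.001) -> float:
    """Openness k_t of a Phased LSTM time gate at timestamp t.

    phase = ((t - shift) mod period) / period
    k_t rises linearly from 0 to 1 over the first half of the open window,
    falls back to 0 over the second half, and leaks slightly while closed.
    """
    phase = ((t - shift) % period) / period
    if phase < 0.5 * r_on:
        return 2.0 * phase / r_on
    if phase < r_on:
        return 2.0 - 2.0 * phase / r_on
    return leak * phase


# Irregular timestamps in minutes (note the gap between 15 and 40).
timestamps = [0.0, 5.0, 10.0, 15.0, 40.0, 45.0]
for t in timestamps:
    print(t, round(time_gate_openness(t, period=60.0, r_on=0.5), 3))
```

Because the gate is evaluated at each sample's actual timestamp, a missing stretch of data simply means no update happens during that stretch; nothing has to be resampled or padded.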

---

## 📦 Repository Structure (Partial)

```
├─ configs/
│   └─ features_config.json     # Feature definitions & normalization strategies
├─ wearable_anomaly_detector.py # Core wrapper: loading, prediction, batch processing
├─ feature_calculator.py        # Configuration-driven feature construction + user history cache
├─ test_wearable_service.py     # HuggingFace Demo script (includes predefined cases)
└─ checkpoints/phase2/...       # Model weights & summary
```

---

## 📚 Data Source & License

- Training data is based on **"A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries"** (Baigutanova *et al.*, Scientific Data, 2025) and its Figshare dataset [doi:10.1038/s41597-025-05801-3](https://www.nature.com/articles/s41597-025-05801-3) / [dataset link](https://springernature.figshare.com/articles/dataset/In-situ_wearable-based_dataset_of_continuous_heart_rate_variability_monitoring_accompanied_by_sleep_diaries/28509740).
- The dataset is released under the **Creative Commons Attribution 4.0 (CC BY 4.0)** license, which allows free use, modification, and distribution provided that attribution and a link to the license are retained.
- This repository follows the CC BY 4.0 requirements for the original data; if you process the data further or publish work based on it, please keep the attribution and license information above.
- The code and model may be licensed separately (for example under MIT or Apache), but any part involving the data must still comply with CC BY 4.0.

---

## 🤝 Contributions & Extensions

Contributions are welcome:
1. Add new features or data sources ⇒ Update `features_config.json` + submit PR
2. Integrate new user data management/baseline strategies ⇒ Extend `FeatureCalculator` or contribute `UserDataManager`
3. Provide feedback on cases or real deployment experiences ⇒ Open Issues or Discussions

---

## 📄 License

- **Model & Code**: Apache-2.0. They may be used, modified, and distributed freely as long as the copyright and license notices are retained.
- **Training Data**: The original wearable HRV dataset is licensed under CC BY 4.0; please retain the author attribution and license information when reusing it.

---

## 🔖 Citation

```bibtex
@software{Wearable_TimeSeries_Health_Monitor,
  title  = {Wearable\_TimeSeries\_Health\_Monitor},
  author = {oscarzhang},
  year   = {2025},
  url    = {https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor}
}
```