---
library_name: pytorch
pipeline_tag: time-series-forecasting
language:
- zh
- en
tags:
- anomaly-detection
- time-series
- wearable
- health
- lstm
- transformer
- physiological-monitoring
- hrv
- heart-rate
- real-time
- multi-user
- personalized
- sensor-fusion
- healthcare
- continuous-monitoring
license: apache-2.0
pretty_name: Wearable TimeSeries Health Monitor
---
<div align="center">
**Language / 语言**: [中文](#中文版本) | [English](#english-version)
</div>
---
<a id="中文版本"></a>
# Wearable_TimeSeries_Health_Monitor
面向可穿戴设备的多用户健康监控方案:一份模型、一个配置,就能为不同用户构建个性化异常检测。模型基于 **Phased LSTM + Temporal Fusion Transformer (TFT)**,并整合自适应基线、因子特征以及单位秒级的数据滑窗能力,适合当作 HuggingFace 模型或企业内部服务快速接入。
---
## 🌟 模型应用亮点
| 能力 | 说明 |
| --- | --- |
| **即插即用** | 内置 `WearableAnomalyDetector` 封装,加载模型即可预测,一次初始化后可持续监控多个用户 |
| **配置驱动特征** | `configs/features_config.json` 描述所有特征、缺省值、类别映射,新增/删减血氧、呼吸率等只需改配置 |
| **多用户实时服务** | `FeatureCalculator` + 轻量级 `data_storage` 缓存,实现用户历史管理、基线演化、批量推理 |
| **多场景 Demo** | `test_wearable_service.py` 内置 3 个真实"客户"案例:完整传感器、缺少字段、匿名设备,即使没有原始数据也能立即体验 |
| **自适应基线支持** | 可扩展 `UserDataManager` 将个人/分组基线接入推理流程,持续改善个体敏感度 |
---
## ⚡ 核心特点与技术优势
### 🎯 自适应基线:个人与群体智能融合
模型采用**自适应基线策略**,根据用户历史数据量动态选择最优基线:
- **个人基线优先**:当用户有足够历史数据(如 ≥7 天)时,使用个人 HRV 均值/标准差作为基线,捕捉个体生理节律差异
- **群体基线兜底**:新用户或数据稀疏时,自动切换到群体统计基线,确保冷启动也能稳定检测
- **平滑过渡机制**:通过加权混合(如 `final_mean = α × personal_mean + (1-α) × group_mean`)实现从群体到个人的渐进式适应
- **实时基线更新**:推理过程中持续累积用户数据,基线随用户状态演化而动态调整,提升长期监控精度
**优势**:相比固定阈值或纯群体基线,自适应基线能同时兼顾**个性化敏感度**(减少误报)和**冷启动鲁棒性**(新用户可用),特别适合多用户、长周期监控场景。
### ⏱️ 灵活的时间窗口与周期
- **5 分钟级粒度**:每条数据点代表 5 分钟聚合,支持秒级到小时级的灵活时间尺度
- **可配置窗口大小**:默认 12 点(1 小时),可根据业务需求调整为 6 点(30 分钟)或 24 点(2 小时)
- **不等间隔容错**:Phased LSTM 架构天然处理缺失数据点,即使数据稀疏(如夜间传感器断开)也能稳定推理
- **多时间尺度特征**:同时提取短期波动(RMSSD)、中期趋势(滑动均值)和长期模式(日/周周期),捕捉不同时间尺度的异常信号
**优势**:适应不同设备采样频率、用户佩戴习惯,无需强制对齐时间戳,降低数据预处理复杂度。
### 🔄 多通道数据协同作用
模型整合**4 大类特征通道**,通过因子特征与注意力机制实现跨通道信息融合:
1. **生理通道**(HR、HRV 系列、呼吸率、血氧)
- 直接反映心血管与呼吸系统状态
- 因子特征:`physiological_mean`, `physiological_std`, `physiological_max`, `physiological_min`
2. **活动通道**(步数、距离、能量消耗、加速度、陀螺仪)
- 捕捉运动强度与身体负荷
- 因子特征:`activity_mean`, `activity_std` 等
3. **环境通道**(光线、时间周期、数据质量)
- 提供上下文信息,区分运动性心率升高 vs 静息异常
- 类别特征:`time_period_primary`(morning/day/evening/night)
4. **基线通道**(自适应基线均值/标准差、偏差特征)
- 提供个性化参考基准,计算 `hrv_deviation_abs`, `hrv_z_score` 等相对异常指标
**协同机制**:
- **因子特征聚合**:将同类通道的统计量(均值/标准差/最值)作为高层特征,让模型学习通道间的关联模式
- **TFT 注意力**:Temporal Fusion Transformer 的变量选择网络自动识别哪些通道在特定时间点最重要
- **已知未来特征**:时间特征(小时、星期、是否周末)帮助模型理解周期性,区分正常波动与异常
**优势**:多通道协同能显著降低**单一指标误报**(如运动导致心率升高),提升**异常检测的上下文感知能力**,特别适合可穿戴设备的多传感器融合场景。
---
## 📊 核心指标(短期窗口)
- **F1**: 0.2819
- **Precision**: 0.1769
- **Recall**: 0.6941
- **最佳阈值**: 0.53
- **窗口定义**: 12 条 5 分钟数据(1小时时间窗,预测未来 0.5 小时)
> 模型偏向召回,适合“异常先提醒、人机协同复核”的场景。可通过阈值/采样策略调节精度与召回。
---
## 🚀 快速体验
### 1. 克隆或下载模型仓库
```bash
git clone https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor
cd Wearable_TimeSeries_Health_Monitor
pip install -r requirements.txt
```
### 2. 运行内置 Demo(无需额外数据)
```bash
# 默认跑 ab60 案例
python test_wearable_service.py
# 批量跑全部预置客户
python test_wearable_service.py --case all
# 想从原始 stage1 CSV 抽样测试
python test_wearable_service.py --from-raw
```
`test_wearable_service.py` 将自动:
- 加载 `WearableAnomalyDetector`
- 读取配置驱动特征
- 构建窗口并执行预测
- 输出每位“客户”的异常分数、阈值、预测详情
### 3. 在业务代码中调用
```python
from wearable_anomaly_detector import WearableAnomalyDetector
detector = WearableAnomalyDetector(
    model_dir="checkpoints/phase2/exp_factor_balanced",
    threshold=0.53,
)
result = detector.predict(data_points, return_score=True, return_details=True)
print(result)
```
> `data_points` 为 12 条最新的 5 分钟记录;若缺静态特征/设备信息,系统会自动从配置/缓存补齐。
---
## 🔧 输入与输出
### 输入(单个数据点)
```python
{
    "timestamp": "2024-01-01T08:00:00",
    "deviceId": "ab60",  # 可选,缺失时会自动创建匿名 ID
    "features": {
        "hr": 72.0,
        "hrv_rmssd": 30.0,
        "time_period_primary": "morning",
        "data_quality": "high",
        ...
    }
}
```
- 每个窗口需 12 条数据(默认 1 小时)
- 特征是否必填由 `configs/features_config.json` 控制
- 缺失值会自动回落到 default 或 category_mapping 定义值
### 输出
```python
{
    "is_anomaly": True,
    "anomaly_score": 0.5760,
    "threshold": 0.5300,
    "details": {
        "window_size": 12,
        "model_output": 0.5760,
        "prediction_confidence": 0.0460
    }
}
```
---
## 🧱 模型架构与训练
- **模型骨干**:Phased LSTM 处理不等间隔序列 + Temporal Fusion Transformer 聚合时间上下文
- **异常检测头**:增强注意力、多层 MLP、可选对比学习/类型辅助头
- **特征体系**:
- 生理:HR、HRV(RMSSD/SDNN/PNN50…)
- 活动:步数、距离、能量消耗、加速度、陀螺仪
- 环境:光线、昼夜标签、数据质量
- 基线:自适应基线均值/标准差 + 偏差特征
- **标签来源**:问卷高置信度标签 + 自适应基线低置信度标签
- **训练流程**:Stage1/2/3 数据加工 ➜ Phase1 自监督预训练 ➜ Phase2 监督微调 ➜ 阈值/案例校正
---
## 📦 仓库结构(部分)
```
├─ configs/
│ └─ features_config.json # 特征定义 & 归一化策略
├─ wearable_anomaly_detector.py # 核心封装:加载、预测、批处理
├─ feature_calculator.py # 配置驱动的特征构建 + 用户历史缓存
├─ test_wearable_service.py # HuggingFace Demo脚本(内含预置案例)
└─ checkpoints/phase2/... # 模型权重 & summary
```
---
## 📚 数据来源与许可证
- 训练数据基于 **“A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries”**(Baigutanova *et al.*, Scientific Data, 2025)以及其 Figshare 数据集 [doi:10.1038/s41597-025-05801-3](https://www.nature.com/articles/s41597-025-05801-3) / [dataset link](https://springernature.figshare.com/articles/dataset/In-situ_wearable-based_dataset_of_continuous_heart_rate_variability_monitoring_accompanied_by_sleep_diaries/28509740)。
- 该数据集以 **Creative Commons Attribution 4.0 (CC BY 4.0)** 许可发布,可自由使用、修改、分发,但必须保留署名并附上许可证链接。
- 本仓库沿用 CC BY 4.0 对原始数据的要求;若你在此基础上再加工或发布,请继续保留上述署名与许可证说明。
- 代码/模型可根据需要使用 MIT/Apache 等许可证,但凡涉及数据的部分,仍需遵循 CC BY 4.0。
---
## 🤝 贡献与扩展
欢迎:
1. 新增特征或数据源 ⇒ 更新 `features_config.json` + 提交 PR
2. 接入新的用户数据管理/基线策略 ⇒ 扩展 `FeatureCalculator` 或贡献 `UserDataManager`
3. 反馈案例或真实部署经验 ⇒ 提 Issue 或 Discussion
---
## 📄 许可证
- **模型与代码**:Apache-2.0。可在保留版权与许可证声明的前提下任意使用/修改/分发。
- **训练数据**:原始可穿戴 HRV 数据集使用 CC BY 4.0,复用时请继续保留作者署名与许可信息。
---
## 🔖 引用
```bibtex
@software{Wearable_TimeSeries_Health_Monitor,
title = {Wearable\_TimeSeries\_Health\_Monitor},
author = {oscarzhang},
year = {2025},
url = {https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor}
}
```
---
<a id="english-version"></a>
# Wearable_TimeSeries_Health_Monitor
A multi-user health monitoring solution for wearable devices: one model and one configuration provide personalized anomaly detection for different users. The model combines **Phased LSTM + Temporal Fusion Transformer (TFT)** with adaptive baselines, factor features, and sliding windows over second-resolution data. It can be used as a HuggingFace model or integrated quickly into internal enterprise services.
---
## 🌟 Model Highlights
| Capability | Description |
| --- | --- |
| **Plug-and-Play** | The built-in `WearableAnomalyDetector` wrapper loads the model and predicts directly; after a single initialization it can monitor multiple users continuously |
| **Configuration-Driven Features** | `configs/features_config.json` defines all features, default values, and category mappings; adding/removing features like blood oxygen or respiratory rate only requires configuration changes |
| **Multi-User Real-Time Service** | `FeatureCalculator` + lightweight `data_storage` cache enables user history management, baseline evolution, and batch inference |
| **Multi-Scenario Demo** | `test_wearable_service.py` includes 3 real "client" cases (complete sensors, missing fields, anonymous device), so you can try the model immediately even without raw data |
| **Adaptive Baseline Support** | Extensible `UserDataManager` integrates personal/group baselines into the inference pipeline, continuously improving individual sensitivity |
---
## ⚡ Core Features & Technical Advantages
### 🎯 Adaptive Baseline: Intelligent Fusion of Personal and Group
The model employs an **adaptive baseline strategy** that dynamically selects the optimal baseline based on user historical data volume:
- **Personal Baseline Priority**: When a user has sufficient history (e.g., ≥7 days), the personal HRV mean/std serves as the baseline, capturing individual differences in physiological rhythm
- **Group Baseline Fallback**: For new users or sparse data, the model automatically switches to a group statistical baseline so that detection stays stable during cold start
- **Smooth Transition Mechanism**: Weighted mixing (e.g., `final_mean = α × personal_mean + (1-α) × group_mean`) moves the baseline gradually from group to personal
- **Real-Time Baseline Updates**: User data accumulates during inference and the baseline adjusts as the user's state evolves, improving long-term monitoring accuracy
**Advantage**: Compared to fixed thresholds or pure group baselines, adaptive baselines balance **personalized sensitivity** (fewer false positives) with **cold-start robustness** (usable for new users), which makes them especially suitable for multi-user, long-term monitoring.
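The weighted mixing described above can be sketched in a few lines. This is only an illustration, not the repository's implementation: `Baseline`, `blend_weight`, and `adaptive_baseline` are hypothetical names, the linear ramp of α over 7 days is an assumption based on the "≥7 days" example, and applying the same blend to the standard deviation is also an assumption (the text only gives the formula for the mean).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Baseline:
    """HRV baseline statistics (hypothetical container)."""
    mean: float
    std: float


def blend_weight(days_of_history: float, full_personal_days: float = 7.0) -> float:
    """Return alpha in [0, 1]: 0 = group only, 1 = personal only.

    Assumption: alpha ramps linearly with the amount of history.
    """
    if full_personal_days <= 0:
        raise ValueError("full_personal_days must be positive")
    return max(0.0, min(1.0, days_of_history / full_personal_days))


def adaptive_baseline(
    personal: Optional[Baseline], group: Baseline, days_of_history: float
) -> Baseline:
    """Blend as alpha * personal + (1 - alpha) * group."""
    if personal is None:
        # Cold start: no personal statistics yet, fall back to the group baseline.
        return group
    alpha = blend_weight(days_of_history)
    return Baseline(
        mean=alpha * personal.mean + (1 - alpha) * group.mean,
        # Assumption: the std is blended the same way as the mean.
        std=alpha * personal.std + (1 - alpha) * group.std,
    )
```

With 3.5 days of history, α is 0.5 and the result sits halfway between the two baselines; from 7 days onward the personal baseline is used unchanged.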
### ⏱️ Flexible Time Windows & Periods
- **5-Minute Granularity**: Each data point represents a 5-minute aggregate; time scales from seconds to hours are supported
- **Configurable Window Size**: The default is 12 points (1 hour), adjustable to 6 points (30 minutes) or 24 points (2 hours) based on business needs
- **Uneven Interval Tolerance**: The Phased LSTM architecture naturally handles missing data points, so inference stays stable even with sparse data (e.g., sensor disconnection at night)
- **Multi-Time-Scale Features**: Short-term fluctuations (RMSSD), medium-term trends (rolling mean), and long-term patterns (daily/weekly cycles) are extracted together, capturing anomaly signals at different time scales
**Advantage**: The model adapts to different device sampling frequencies and user wearing habits without forced timestamp alignment, which reduces data preprocessing complexity.
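As a rough illustration of the windowing described above, the following sketch keeps the most recent 12 records for one user and reports how much wall-clock time they cover. `SlidingWindow` is a hypothetical helper, not a class from this repository; it assumes records carry an ISO 8601 `timestamp` field as in the input format documented below.

```python
from collections import deque
from datetime import datetime


class SlidingWindow:
    """Keep the most recent `size` records for one user (hypothetical helper)."""

    def __init__(self, size: int = 12):
        self.size = size
        # A bounded deque drops the oldest record automatically.
        self._points = deque(maxlen=size)

    def add(self, record: dict) -> None:
        self._points.append(record)

    def ready(self) -> bool:
        """True once the window holds `size` records."""
        return len(self._points) == self.size

    def span_minutes(self) -> float:
        """Minutes between the oldest and newest record.

        With gaps in the data this exceeds (size - 1) * 5 minutes.
        """
        if len(self._points) < 2:
            return 0.0
        first = datetime.fromisoformat(self._points[0]["timestamp"])
        last = datetime.fromisoformat(self._points[-1]["timestamp"])
        return (last - first).total_seconds() / 60.0

    def snapshot(self) -> list:
        """Return the current window as a list, oldest record first."""
        return list(self._points)
```

A span well above 55 minutes for a 12-point window signals missing points, which is the case the uneven-interval tolerance is meant to cover.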
### 🔄 Multi-Channel Data Synergy
The model integrates **4 major feature channels**, achieving cross-channel information fusion through factor features and attention mechanisms:
1. **Physiological Channel** (HR, HRV series, respiratory rate, blood oxygen)
- Directly reflects cardiovascular and respiratory system status
- Factor features: `physiological_mean`, `physiological_std`, `physiological_max`, `physiological_min`
2. **Activity Channel** (steps, distance, energy consumption, acceleration, gyroscope)
- Captures exercise intensity and body load
- Factor features: `activity_mean`, `activity_std`, etc.
3. **Environmental Channel** (light, time period, data quality)
- Provides contextual information that helps distinguish exercise-induced heart rate elevation from resting anomalies
- Categorical features: `time_period_primary` (morning/day/evening/night)
4. **Baseline Channel** (adaptive baseline mean/std, deviation features)
- Provides personalized reference baseline, calculating relative anomaly indicators like `hrv_deviation_abs`, `hrv_z_score`
**Synergy Mechanism**:
- **Factor Feature Aggregation**: Use statistical measures (mean/std/max/min) of similar channels as high-level features, enabling the model to learn association patterns between channels
- **TFT Attention**: Temporal Fusion Transformer's variable selection network automatically identifies which channels are most important at specific time points
- **Known Future Features**: Time features (hour, day of week, is_weekend) help the model understand periodicity, distinguishing normal fluctuations from anomalies
**Advantage**: Multi-channel synergy significantly reduces **single-indicator false positives** (e.g., exercise-induced heart rate elevation) and improves **context-aware anomaly detection**, especially suitable for multi-sensor fusion scenarios in wearable devices.
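The factor features and the relative deviation indicator can be illustrated with a small sketch. The channel grouping in `CHANNELS`, the helper names, and the assumption that inputs are already normalized to comparable scales are all hypothetical; the real feature groups are defined in `configs/features_config.json`, and the repository's exact formulas may differ.

```python
import statistics

# Hypothetical grouping; the real groups live in configs/features_config.json.
CHANNELS = {
    "physiological": ["hr", "hrv_rmssd"],
    "activity": ["steps", "distance"],
}


def factor_features(features: dict, channels: dict = CHANNELS) -> dict:
    """Aggregate each channel into <channel>_mean/_std/_max/_min.

    Assumes the values are already normalized so that aggregation
    across different sensors is meaningful.
    """
    out = {}
    for name, keys in channels.items():
        values = [float(features[k]) for k in keys if features.get(k) is not None]
        if not values:
            # Channel absent for this device: emit nothing for it.
            continue
        out[f"{name}_mean"] = statistics.fmean(values)
        out[f"{name}_std"] = statistics.pstdev(values)
        out[f"{name}_max"] = max(values)
        out[f"{name}_min"] = min(values)
    return out


def hrv_z_score(hrv: float, baseline_mean: float, baseline_std: float,
                eps: float = 1e-6) -> float:
    """Deviation from the baseline in units of baseline std (guarded against std = 0)."""
    return (hrv - baseline_mean) / max(baseline_std, eps)
```

For example, an HRV reading of 20 against a baseline of mean 30 and std 5 gives a z-score of -2.0, a deviation that is comparable across users with different resting HRV levels.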
---
## 📊 Core Metrics (Short-Term Window)
- **F1**: 0.2819
- **Precision**: 0.1769
- **Recall**: 0.6941
- **Optimal Threshold**: 0.53
- **Window Definition**: 12 data points at 5-minute intervals (1-hour time window, predicting 0.5 hours ahead)
> The model favors recall, which suits "alert first, then human-machine collaborative review" scenarios. Precision and recall can be traded off by adjusting the threshold or the sampling strategy.
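The reported F1 follows from the precision and recall above through the standard harmonic-mean formula, which the short check below reproduces. The function name `f1_score` is just a local helper for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the metrics listed above.
print(round(f1_score(0.1769, 0.6941), 4))  # 0.2819
```

A precision of 0.1769 means that fewer than one in five alerts corresponds to a labeled anomaly at this threshold, which is why a review step after the alert is recommended.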
---
## 🚀 Quick Start
### 1. Clone or Download the Model Repository
```bash
git clone https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor
cd Wearable_TimeSeries_Health_Monitor
pip install -r requirements.txt
```
### 2. Run Built-in Demo (No Additional Data Required)
```bash
# Run ab60 case by default
python test_wearable_service.py
# Run all predefined clients
python test_wearable_service.py --case all
# Sample from raw stage1 CSV for testing
python test_wearable_service.py --from-raw
```
`test_wearable_service.py` will automatically:
- Load `WearableAnomalyDetector`
- Read configuration-driven features
- Build windows and execute predictions
- Output anomaly scores, thresholds, and prediction details for each "client"
### 3. Call in Business Code
```python
from wearable_anomaly_detector import WearableAnomalyDetector
detector = WearableAnomalyDetector(
    model_dir="checkpoints/phase2/exp_factor_balanced",
    threshold=0.53,
)
result = detector.predict(data_points, return_score=True, return_details=True)
print(result)
```
> `data_points` should be the 12 most recent 5-minute records; if static features or device information are missing, the system fills them in automatically from the configuration or cache.
---
## 🔧 Input & Output
### Input (Single Data Point)
```python
{
    "timestamp": "2024-01-01T08:00:00",
    "deviceId": "ab60",  # Optional, anonymous ID will be created if missing
    "features": {
        "hr": 72.0,
        "hrv_rmssd": 30.0,
        "time_period_primary": "morning",
        "data_quality": "high",
        ...
    }
}
```
- Each window requires 12 data points (default 1 hour)
- Whether features are required is controlled by `configs/features_config.json`
- Missing values automatically fall back to the `default` or `category_mapping` values defined in the configuration
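To make the window format concrete, the sketch below assembles 12 records in the shape shown above from a list of per-interval feature dictionaries. `build_window` is a hypothetical helper, not part of the repository; it only arranges data and does not call the model.

```python
from datetime import datetime, timedelta
from typing import Optional


def build_window(start: str, readings: list, device_id: Optional[str] = None,
                 step_minutes: int = 5) -> list:
    """Turn per-interval feature dicts into timestamped data points."""
    t0 = datetime.fromisoformat(start)
    window = []
    for i, features in enumerate(readings):
        point = {
            "timestamp": (t0 + timedelta(minutes=step_minutes * i)).isoformat(),
            "features": dict(features),  # copy so the caller's dict is not shared
        }
        if device_id is not None:
            # deviceId is optional in the documented input format.
            point["deviceId"] = device_id
        window.append(point)
    return window


readings = [{"hr": 70.0 + i, "hrv_rmssd": 30.0} for i in range(12)]
data_points = build_window("2024-01-01T08:00:00", readings, device_id="ab60")
```

The resulting `data_points` list covers 08:00 through 08:55 and has the shape expected by `detector.predict(...)` in the Quick Start example.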
### Output
```python
{
    "is_anomaly": True,
    "anomaly_score": 0.5760,
    "threshold": 0.5300,
    "details": {
        "window_size": 12,
        "model_output": 0.5760,
        "prediction_confidence": 0.0460
    }
}
```
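In the sample output, `prediction_confidence` (0.0460) equals the distance between `anomaly_score` (0.5760) and `threshold` (0.5300). The sketch below reproduces that relationship as a reading of the example only; whether the detector computes the field this way, and whether the comparison is `>=` or `>`, are assumptions. `interpret` and the `margin` key are hypothetical names.

```python
def interpret(score: float, threshold: float = 0.53) -> dict:
    """Apply a threshold to a score and report the margin to the threshold."""
    return {
        # Assumption: scores at or above the threshold count as anomalies.
        "is_anomaly": score >= threshold,
        "anomaly_score": round(score, 4),
        "threshold": round(threshold, 4),
        # Distance to the decision boundary; small values are borderline cases.
        "margin": round(abs(score - threshold), 4),
    }


print(interpret(0.5760))
```

A small margin such as 0.046 marks a borderline decision, which is a natural candidate for the human review step mentioned under the core metrics.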
---
## 🧱 Model Architecture & Training
- **Model Backbone**: Phased LSTM handles unevenly-spaced sequences + Temporal Fusion Transformer aggregates temporal context
- **Anomaly Detection Head**: Enhanced attention, multi-layer MLP, optional contrastive learning/type auxiliary head
- **Feature System**:
- Physiological: HR, HRV (RMSSD/SDNN/PNN50…)
- Activity: Steps, distance, energy consumption, acceleration, gyroscope
- Environmental: Light, day/night labels, data quality
- Baseline: Adaptive baseline mean/std + deviation features
- **Label Source**: High-confidence questionnaire labels + low-confidence adaptive baseline labels
- **Training Pipeline**: Stage1/2/3 data processing ➜ Phase1 self-supervised pre-training ➜ Phase2 supervised fine-tuning ➜ Threshold/case calibration
---
## 📦 Repository Structure (Partial)
```
├─ configs/
│ └─ features_config.json # Feature definitions & normalization strategies
├─ wearable_anomaly_detector.py # Core wrapper: loading, prediction, batch processing
├─ feature_calculator.py # Configuration-driven feature construction + user history cache
├─ test_wearable_service.py # HuggingFace Demo script (includes predefined cases)
└─ checkpoints/phase2/... # Model weights & summary
```
---
## 📚 Data Source & License
- Training data is based on **"A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries"** (Baigutanova *et al.*, Scientific Data, 2025) and its Figshare dataset [doi:10.1038/s41597-025-05801-3](https://www.nature.com/articles/s41597-025-05801-3) / [dataset link](https://springernature.figshare.com/articles/dataset/In-situ_wearable-based_dataset_of_continuous_heart_rate_variability_monitoring_accompanied_by_sleep_diaries/28509740).
- This dataset is released under the **Creative Commons Attribution 4.0 (CC BY 4.0)** license, which allows free use, modification, and distribution provided that attribution and a link to the license are retained.
- This repository follows CC BY 4.0 requirements for original data; if you further process or publish based on this, please continue to retain the above attribution and license information.
- Code/models can use MIT/Apache or other licenses as needed, but any parts involving data must still follow CC BY 4.0.
---
## 🤝 Contributions & Extensions
Contributions are welcome:
1. Add new features or data sources ⇒ Update `features_config.json` + submit PR
2. Integrate new user data management/baseline strategies ⇒ Extend `FeatureCalculator` or contribute `UserDataManager`
3. Provide feedback on cases or real deployment experiences ⇒ Open Issues or Discussions
---
## 📄 License
- **Model & Code**: Apache-2.0. Can be used/modified/distributed freely while retaining copyright and license notices.
- **Training Data**: Original wearable HRV dataset uses CC BY 4.0; please continue to retain author attribution and license information when reusing.
---
## 🔖 Citation
```bibtex
@software{Wearable_TimeSeries_Health_Monitor,
title = {Wearable\_TimeSeries\_Health\_Monitor},
author = {oscarzhang},
year = {2025},
url = {https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor}
}
```