---
library_name: pytorch
pipeline_tag: time-series-forecasting
language:
- zh
- en
tags:
- anomaly-detection
- time-series
- wearable
- health
- lstm
- transformer
- physiological-monitoring
- hrv
- heart-rate
- real-time
- multi-user
- personalized
- sensor-fusion
- healthcare
- continuous-monitoring
license: apache-2.0
pretty_name: Wearable TimeSeries Health Monitor
---
# Wearable_TimeSeries_Health_Monitor

A multi-user health monitoring solution for wearable devices: one model and one configuration provide personalized anomaly detection for different users. The model is based on **Phased LSTM + Temporal Fusion Transformer (TFT)** and integrates adaptive baselines, factor features, and second-level sliding-window capability. It can be deployed as a HuggingFace model or integrated quickly into enterprise services.

---

## 🌟 Model Highlights

| Capability | Description |
| --- | --- |
| **Plug-and-Play** | The built-in `WearableAnomalyDetector` wrapper loads the model and predicts directly; after a single initialization it can continuously monitor multiple users |
| **Configuration-Driven Features** | `configs/features_config.json` defines all features, default values, and category mappings; adding or removing features such as blood oxygen or respiratory rate only requires a configuration change |
| **Multi-User Real-Time Service** | `FeatureCalculator` plus a lightweight `data_storage` cache provide user history management, baseline evolution, and batch inference |
| **Multi-Scenario Demo** | `test_wearable_service.py` includes 3 real "client" cases (complete sensors, missing fields, anonymous device), so the model can be tried immediately even without raw data |
| **Adaptive Baseline Support** | An extensible `UserDataManager` feeds personal/group baselines into the inference pipeline, continuously improving per-user sensitivity |

---

## ⚡ Core Features & Technical Advantages

### 🎯 Adaptive Baseline: Intelligent Fusion of Personal and Group Baselines

The model
employs an **adaptive baseline strategy** that dynamically selects the most suitable baseline based on how much historical data a user has:

- **Personal Baseline Priority**: When a user has sufficient historical data (e.g., ≥7 days), the personal HRV mean/std is used as the baseline, capturing individual differences in physiological rhythm
- **Group Baseline Fallback**: For new users or sparse data, the model automatically switches to a group statistical baseline, so detection remains stable during cold start
- **Smooth Transition Mechanism**: Weighted mixing (e.g., `final_mean = α × personal_mean + (1-α) × group_mean`) gives a gradual shift from the group baseline to the personal one
- **Real-Time Baseline Updates**: User data accumulates during inference, so the baseline adjusts as the user's state evolves, improving long-term monitoring accuracy

**Advantage**: Compared with fixed thresholds or a pure group baseline, the adaptive baseline balances **personalized sensitivity** (fewer false positives) with **cold-start robustness** (usable for new users), which suits multi-user, long-term monitoring.

### ⏱️ Flexible Time Windows & Periods

- **5-Minute Granularity**: Each data point is a 5-minute aggregate, supporting flexible time scales from seconds to hours
- **Configurable Window Size**: The default is 12 points (1 hour), adjustable to 6 points (30 minutes) or 24 points (2 hours) depending on business needs
- **Uneven Interval Tolerance**: The Phased LSTM architecture naturally handles missing data points, so inference stays stable even with sparse data (e.g., the sensor disconnecting at night)
- **Multi-Time-Scale Features**: Short-term fluctuations (RMSSD), medium-term trends (rolling mean), and long-term patterns (daily/weekly cycles) are extracted together to capture anomaly signals at different time scales

**Advantage**: The model adapts to different device sampling frequencies and user wearing habits without forced timestamp alignment, reducing data preprocessing complexity.
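The weighted mixing from the Adaptive Baseline section above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the names `Baseline`, `blend_weight`, and `blend_baseline` are hypothetical, the linear ramp for α and its 7-day saturation point (borrowed from the "≥7 days" example) are assumptions, and applying the same mix to the standard deviation is also an assumption, since the README only gives the formula for the mean.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Mean and standard deviation of a reference HRV distribution."""
    mean: float
    std: float


def blend_weight(days_of_history: float, full_trust_days: float = 7.0) -> float:
    """Hypothetical linear ramp: alpha is 0 with no history, 1 at full_trust_days or more."""
    if full_trust_days <= 0:
        raise ValueError("full_trust_days must be positive")
    return min(max(days_of_history / full_trust_days, 0.0), 1.0)


def blend_baseline(personal: Baseline, group: Baseline, alpha: float) -> Baseline:
    """final = alpha * personal + (1 - alpha) * group, applied to mean and std alike."""
    return Baseline(
        mean=alpha * personal.mean + (1.0 - alpha) * group.mean,
        std=alpha * personal.std + (1.0 - alpha) * group.std,
    )


# Illustrative numbers only; they are not taken from the training data.
group = Baseline(mean=40.0, std=12.0)
personal = Baseline(mean=30.0, std=8.0)
for days in (0, 3.5, 7, 30):
    alpha = blend_weight(days)
    print(days, alpha, blend_baseline(personal, group, alpha))
```

With this ramp a brand-new user is scored purely against the group baseline, and the personal baseline takes over completely once a week of data has accumulated. Any other monotone schedule for α would serve the same purpose.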
### 🔄 Multi-Channel Data Synergy

The model integrates **4 major feature channels** and fuses information across them through factor features and attention mechanisms:

1. **Physiological Channel** (HR, HRV series, respiratory rate, blood oxygen)
   - Directly reflects cardiovascular and respiratory system status
   - Factor features: `physiological_mean`, `physiological_std`, `physiological_max`, `physiological_min`
2. **Activity Channel** (steps, distance, energy consumption, acceleration, gyroscope)
   - Captures exercise intensity and body load
   - Factor features: `activity_mean`, `activity_std`, etc.
3. **Environmental Channel** (light, time period, data quality)
   - Provides context, distinguishing exercise-induced heart rate elevation from resting anomalies
   - Categorical features: `time_period_primary` (morning/day/evening/night)
4. **Baseline Channel** (adaptive baseline mean/std, deviation features)
   - Provides a personalized reference, from which relative anomaly indicators such as `hrv_deviation_abs` and `hrv_z_score` are calculated

**Synergy Mechanism**:

- **Factor Feature Aggregation**: Summary statistics (mean/std/max/min) of the features within a channel serve as high-level features, letting the model learn association patterns between channels
- **TFT Attention**: The Temporal Fusion Transformer's variable selection network automatically identifies which channels matter most at a given time point
- **Known Future Features**: Time features (hour, day of week, is_weekend) help the model understand periodicity and separate normal fluctuations from anomalies

**Advantage**: Multi-channel synergy helps reduce **single-indicator false positives** (e.g., heart rate elevated by exercise) and improves **context-aware anomaly detection**, which suits multi-sensor fusion on wearable devices.
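As a rough illustration of the factor features and baseline-relative indicators named above, the sketch below aggregates one channel and scores an HRV reading against a baseline. It rests on several assumptions: the channel membership in `PHYSIOLOGICAL` is hypothetical (the real grouping is defined in `configs/features_config.json`), inputs are assumed to be already normalized to comparable scales, and the definitions of `hrv_deviation_abs` (absolute difference) and `hrv_z_score` (standard score) are plausible readings of the feature names rather than confirmed formulas.

```python
from statistics import mean, pstdev

# Hypothetical channel membership, for illustration only.
PHYSIOLOGICAL = ("hr", "hrv_rmssd")


def factor_features(features: dict, channel: tuple, prefix: str) -> dict:
    """Aggregate the available values of one channel into mean/std/max/min factors."""
    values = [float(features[name]) for name in channel if name in features]
    if not values:
        return {}  # nothing to aggregate when every feature of the channel is missing
    return {
        f"{prefix}_mean": mean(values),
        f"{prefix}_std": pstdev(values),  # population std, defined even for one value
        f"{prefix}_max": max(values),
        f"{prefix}_min": min(values),
    }


def baseline_deviation(hrv: float, baseline_mean: float, baseline_std: float) -> dict:
    """Express an HRV reading relative to a baseline mean and standard deviation."""
    eps = 1e-6  # guards against division by a zero standard deviation
    return {
        "hrv_deviation_abs": abs(hrv - baseline_mean),
        "hrv_z_score": (hrv - baseline_mean) / max(baseline_std, eps),
    }


# Normalized example values; they are not taken from the dataset.
print(factor_features({"hr": 0.2, "hrv_rmssd": 0.6}, PHYSIOLOGICAL, "physiological"))
print(baseline_deviation(hrv=20.0, baseline_mean=30.0, baseline_std=5.0))
```

A reading two standard deviations below the baseline yields a z-score of -2 regardless of whether the baseline is personal, group, or a blend, which is what lets a single threshold serve users with very different resting HRV.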
---

## 📊 Core Metrics (Short-Term Window)

- **F1**: 0.2819
- **Precision**: 0.1769
- **Recall**: 0.6941
- **Optimal Threshold**: 0.53
- **Window Definition**: 12 data points at 5-minute intervals (1-hour window, predicting 0.5 hours ahead)

> The model favors recall, which suits "alert on anomalies first, then review with a human in the loop" scenarios. Precision and recall can be traded off through the threshold and sampling strategy.

---

## 🚀 Quick Start

### 1. Clone or Download the Model Repository

```bash
git clone https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor
cd Wearable_TimeSeries_Health_Monitor
pip install -r requirements.txt
```

### 2. Run Built-in Demo (No Additional Data Required)

```bash
# Run ab60 case by default
python test_wearable_service.py

# Run all predefined clients
python test_wearable_service.py --case all

# Sample from raw stage1 CSV for testing
python test_wearable_service.py --from-raw
```

`test_wearable_service.py` will automatically:

- Load `WearableAnomalyDetector`
- Read the configuration-driven features
- Build windows and run predictions
- Output anomaly scores, thresholds, and prediction details for each "client"

### 3. Call in Business Code

```python
from wearable_anomaly_detector import WearableAnomalyDetector

# Load the fine-tuned checkpoint once; the detector can then be reused across users.
detector = WearableAnomalyDetector(
    model_dir="checkpoints/phase2/exp_factor_balanced",
    threshold=0.53,
)

# data_points: the 12 most recent 5-minute records (format under "Input & Output").
result = detector.predict(data_points, return_score=True, return_details=True)
print(result)
```

> `data_points` should be the 12 latest 5-minute records; if static features or device information are missing, the system fills them in automatically from the configuration or cache.

---

## 🔧 Input & Output

### Input (Single Data Point)

```python
{
    "timestamp": "2024-01-01T08:00:00",
    "deviceId": "ab60",  # Optional, anonymous ID will be created if missing
    "features": {
        "hr": 72.0,
        "hrv_rmssd": 30.0,
        "time_period_primary": "morning",
        "data_quality": "high",
        ...
    }
}
```

- Each window requires 12 data points (default 1 hour)
- Whether a feature is required is controlled by `configs/features_config.json`
- Missing values automatically fall back to the values defined under `default` or `category_mapping`

### Output

```python
{
    "is_anomaly": True,
    "anomaly_score": 0.5760,
    "threshold": 0.5300,
    "details": {
        "window_size": 12,
        "model_output": 0.5760,
        "prediction_confidence": 0.0460
    }
}
```

---

## 🧱 Model Architecture & Training

- **Model Backbone**: Phased LSTM handles unevenly spaced sequences; Temporal Fusion Transformer aggregates temporal context
- **Anomaly Detection Head**: Enhanced attention, multi-layer MLP, optional contrastive learning/type auxiliary head
- **Feature System**:
  - Physiological: HR, HRV (RMSSD/SDNN/PNN50…)
  - Activity: Steps, distance, energy consumption, acceleration, gyroscope
  - Environmental: Light, day/night labels, data quality
  - Baseline: Adaptive baseline mean/std + deviation features
- **Label Source**: High-confidence questionnaire labels + low-confidence adaptive baseline labels
- **Training Pipeline**: Stage1/2/3 data processing ➜ Phase1 self-supervised pre-training ➜ Phase2 supervised fine-tuning ➜ Threshold/case calibration

---

## 📦 Repository Structure (Partial)

```
├─ configs/
│  └─ features_config.json        # Feature definitions & normalization strategies
├─ wearable_anomaly_detector.py   # Core wrapper: loading, prediction, batch processing
├─ feature_calculator.py          # Configuration-driven feature construction + user history cache
├─ test_wearable_service.py       # HuggingFace Demo script (includes predefined cases)
└─ checkpoints/phase2/...         # Model weights & summary
```

---

## 📚 Data Source & License

- Training data is based on **"A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries"** (Baigutanova *et al.*, Scientific Data, 2025) and its Figshare dataset [doi:10.1038/s41597-025-05801-3](https://www.nature.com/articles/s41597-025-05801-3) / [dataset link](https://springernature.figshare.com/articles/dataset/In-situ_wearable-based_dataset_of_continuous_heart_rate_variability_monitoring_accompanied_by_sleep_diaries/28509740).
- The dataset is released under the **Creative Commons Attribution 4.0 (CC BY 4.0)** license, which allows free use, modification, and distribution provided that attribution and a link to the license are retained.
- This repository follows the CC BY 4.0 requirements for the original data; if you further process or publish work based on it, please keep the attribution and license information above.
- Code and models may use MIT, Apache, or other licenses as needed, but any part involving the data must still follow CC BY 4.0.

---

## 🤝 Contributions & Extensions

Contributions are welcome:

1. Add new features or data sources ⇒ Update `features_config.json` + submit a PR
2. Integrate new user data management/baseline strategies ⇒ Extend `FeatureCalculator` or contribute `UserDataManager`
3. Share feedback on cases or real deployment experience ⇒ Open an Issue or Discussion

---

## 📄 License

- **Model & Code**: Apache-2.0. They may be used, modified, and distributed freely as long as the copyright and license notices are retained.
- **Training Data**: The original wearable HRV dataset uses CC BY 4.0; please retain author attribution and license information when reusing it.

---

## 🔖 Citation

```bibtex
@software{Wearable_TimeSeries_Health_Monitor,
  title  = {Wearable\_TimeSeries\_Health\_Monitor},
  author = {oscarzhang},
  year   = {2025},
  url    = {https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor}
}
```