Multi-Task Learning for Emotional and Cognitive State Detection
Simultaneously predict emotional state (stress) and cognitive state (mental effort) from physiological signals using a unified deep learning architecture.
🎯 Key Innovation: Multi-task framework achieving 70.7% average LOSO accuracy, with multimodal fusion adding 8.3 percentage points over IBI-only and principled data preparation adding 3.7 percentage points over naive training.
Why Multi-Task Learning?
Traditional approaches train separate models for stress and effort detection. Our unified multi-task framework offers:
- Joint representation learning: Shared encoders extract features relevant to both emotional and cognitive states
- Efficient architecture: Single model (~87k parameters) predicts both dimensions simultaneously
- Task-specific training: Principled data preparation (masking ambiguous samples) improves generalization
- Real-time inference: One forward pass → both stress and effort predictions
Quick Start
# Install
git clone https://github.com/yisakSyn/SynheartFocus_MultitaskModel.git
cd SynheartFocus_MultitaskModel
pip install -r requirements.txt
# Prepare data (SWELL-KW dataset)
python prepare_multimodal_data_v5.py
# Train with LOSO cross-validation
python trainer_v8_masked_effortMultimodal.py --loso --data_dir ./prepared_data_v5_multimodal
# Single holdout evaluation
python trainer_v8_masked_effortMultimodal.py --holdout pp09 pp17 pp25
Architecture
Physiological Inputs (4 streams)
├─ IBI timeseries (120 samples @ 2Hz = 60s) ──┐
├─ HRV features (14 features)               ──┴─ IBI Encoder (CNN-LSTM) → Dense(32)
├─ EDA timeseries (120 samples @ 2Hz = 60s) ──┐
└─ EDA features (12 features)               ──┴─ EDA Encoder (CNN-LSTM) → Dense(32)
                          ↓
          Fusion: Concat(32+16+32+16) = 96d
                          ↓
                Dense(64) → Dense(32)
                          ↓
          ┌───────────────┴───────────────┐
     Stress Head                     Effort Head
   Dense(2, softmax)               Dense(2, softmax)
 (trained on all samples)       (trained on valid only)
Model Details
IBI Encoder (CNN-LSTM):
- Conv1D blocks: 24→32→48 filters (kernels 7→5→3)
- LSTM(32) with attention pooling
- HRV feature branch: Dense(24→16)
- Output: 48 dimensions (32+16)
EDA Encoder (CNN-LSTM):
- Conv1D blocks: 24→32→48 filters (kernels 9→7→5, larger for EDA)
- LSTM(32) with attention pooling
- EDA feature branch: Dense(24→16)
- Output: 48 dimensions (32+16)
Fusion & Classification:
- Concatenate: [IBI(32) + HRV(16) + EDA(32) + EDA_feat(16)] = 96 dims
- Dense layers: 64 → 32 with dropout (0.5, 0.25)
- Two task-specific output heads (softmax)
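The fusion arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch, assuming plain fully connected layers with biases and no extra normalization layers in the fusion stack; `dense_params` is a helper introduced here for illustration, not a function from the repository.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Branch widths as listed above: IBI(32) + HRV(16) + EDA(32) + EDA_feat(16).
branch_dims = {"ibi": 32, "hrv": 16, "eda": 32, "eda_feat": 16}
fusion_dim = sum(branch_dims.values())

# Dense(64) -> Dense(32) on top of the concatenated vector.
fusion_params = dense_params(fusion_dim, 64) + dense_params(64, 32)

# Two Dense(2, softmax) heads: stress and effort.
head_params = 2 * dense_params(32, 2)

print(fusion_dim, fusion_params, head_params)  # 96 8288 132
```

Under these assumptions the fusion layers and heads account for roughly 8.4k of the ~87k parameters, so most of the model's capacity would sit in the two CNN-LSTM encoders.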
Joint Optimization:
# Loss computation
loss_stress = focal_loss(y_true_stress, y_pred_stress) # All samples
loss_effort = masked_focal_loss(y_true_effort, y_pred_effort, mask) # Valid only
total_loss = loss_stress + loss_effort
Parameters: ~87,000 total
Results
LOSO Cross-Validation (18 subjects)
| Metric | Accuracy | Std Dev |
|---|---|---|
| Stress Detection | 68.9% | ±12.9% |
| Effort Detection | 72.5% | ±14.1% |
| Average | 70.7% | ±13.2% |
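For readers unfamiliar with leave-one-subject-out (LOSO) evaluation, the splitting logic can be sketched as follows. This is an illustrative NumPy version, not the code in `trainer_v8_masked_effortMultimodal.py`; the subject IDs and per-fold accuracies are made up.

```python
import numpy as np


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for each unique subject."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Toy example: 6 windows from 3 hypothetical subjects.
subjects = ["pp01", "pp01", "pp02", "pp02", "pp03", "pp03"]
folds = list(loso_splits(subjects))

# The table reports the mean and standard deviation across folds.
fold_accuracies = np.array([0.70, 0.65, 0.75])  # made-up per-fold accuracies
mean_acc, std_acc = fold_accuracies.mean(), fold_accuracies.std()
```

Because every window of the held-out subject is excluded from training, the reported spread reflects between-subject variability rather than within-subject noise.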
Ablation Studies
1. Multimodal Fusion:
| Configuration | Stress | Effort | Average |
|---|---|---|---|
| IBI only (Run 5) | 60.2% | 64.6% | 62.4% |
| IBI + EDA (ours, Run 3) | 68.9% | 72.5% | 70.7% |
| Improvement | +8.7% | +7.9% | +8.3% |
Key Finding: Multimodal fusion (IBI + EDA) provides a substantial improvement over the single-modality IBI-only approach.
2. Task-Specific Data Preparation:
| Training Strategy | Stress | Effort | Average |
|---|---|---|---|
| All conditions (c1, c2, c3) | 68.7% | 65.2% | 67.0% |
| Task-specific masking (ours) | 68.9% | 72.5% | 70.7% |
| Improvement | +0.2% | +7.3%* | +3.7% |
*p < 0.01 (McNemar's test)
Key Finding: Masking ambiguous effort labels (the c2 interruption condition) improves effort detection by 7.3 percentage points while leaving stress accuracy essentially unchanged (+0.2).
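The significance marker above refers to McNemar's test, which compares two classifiers on the same samples using only the cases where they disagree. A minimal sketch of the continuity-corrected version is shown below; the counts `b` and `c` are hypothetical and are not taken from the experiments.

```python
import math


def mcnemar_test(b: int, c: int):
    """Continuity-corrected McNemar test.

    b: samples model A classified correctly and model B incorrectly
    c: samples model A classified incorrectly and model B correctly
    Returns (chi-square statistic, p-value).
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p = mcnemar_test(b=40, c=15)  # hypothetical discordant counts
```

With few discordant pairs an exact binomial test is usually preferred over this chi-square approximation.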
Key Contributions Validated
Our experimental results support two independent contributions and their combination:
- Multimodal Fusion (+8.3%): Combining IBI and EDA signals outperforms the IBI-only approach
- Task-Specific Data Preparation (+3.7%): Masking ambiguous samples improves effort detection
- Combined Framework (70.7%): Multi-task architecture with principled data curation
Methodology
Multi-Task Learning Framework
Core Design: Unified architecture with shared encoders and task-specific output heads enables joint optimization of both emotional and cognitive state prediction.
Loss Function:
L_total = L_stress + L_effort
where:
L_stress = FocalLoss(y_stress, pred_stress) # All samples
L_effort = MaskedFocalLoss(y_effort, pred_effort, mask) # Valid samples only
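The loss definition above can be illustrated with a small NumPy sketch. It assumes one-hot targets, softmax outputs, and the focusing parameter γ=1.5 listed in the training configuration; label smoothing is left out for brevity, and the function names are illustrative rather than the repository's actual implementation.

```python
import numpy as np


def focal_loss(y_true, y_pred, gamma=1.5, eps=1e-7):
    """Per-sample focal loss for one-hot targets and softmax outputs."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # (1 - p)^gamma down-weights samples that are already predicted well.
    per_class = -y_true * (1.0 - y_pred) ** gamma * np.log(y_pred)
    return per_class.sum(axis=-1)


def masked_focal_loss(y_true, y_pred, mask, gamma=1.5):
    """Mean focal loss over samples with mask == 1; 0.0 if none are valid."""
    per_sample = focal_loss(y_true, y_pred, gamma) * mask
    n_valid = mask.sum()
    return per_sample.sum() / n_valid if n_valid > 0 else 0.0


# Toy batch; the same predictions are reused for both heads to keep it short.
y_pred = np.array([[0.9, 0.1], [0.3, 0.7], [0.5, 0.5]])
y_stress = np.array([[1, 0], [0, 1], [0, 1]], dtype=float)
y_effort = np.array([[1, 0], [0, 0], [0, 1]], dtype=float)  # row 1 is masked
effort_mask = np.array([1.0, 0.0, 1.0])

loss_stress = focal_loss(y_stress, y_pred).mean()                # all samples
loss_effort = masked_focal_loss(y_effort, y_pred, effort_mask)   # valid only
total_loss = loss_stress + loss_effort
```

Dividing by the number of valid samples rather than the batch size keeps the effort loss on the same scale regardless of how many c2 windows a batch happens to contain.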
Task-Specific Data Preparation
| Condition | Stress Label | Effort Label | Training Strategy |
|---|---|---|---|
| c1 (Neutral) | 0 (Low) | 0 (Low) | Both tasks trained |
| c2 (Interruption) | 1 (High) | MASKED | Stress only |
| c3 (Time Pressure) | 1 (High) | 1 (High) | Both tasks trained |
Rationale: Interruption-based tasks (c2) produce ambiguous effort labels due to individual differences in multitasking ability. NASA-TLX analysis shows 2.2× higher relative variability in effort ratings for c2 (coefficient of variation, CV=0.44) than for c3 (CV=0.20).
Implementation:
- Stress head: Trained on all 11,332 samples
- Effort head: Trained on 7,998 valid samples (c1 + c3 only)
- Masking applied during loss computation, not data filtering
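The labelling scheme in the table can be written down as a small encoder. This is a sketch of the mapping only, assuming the one-hot and all-zero-when-masked conventions described for `y_effort.npy` and `effort_mask.npy`; `encode_labels` is a hypothetical helper, not a function from `prepare_multimodal_data_v5.py`.

```python
import numpy as np

# Condition -> (stress class, effort class or None when masked), per the table above.
CONDITION_LABELS = {"c1": (0, 0), "c2": (1, None), "c3": (1, 1)}


def encode_labels(conditions):
    """Return one-hot stress labels, one-hot effort labels, and the effort mask."""
    n = len(conditions)
    y_stress = np.zeros((n, 2), dtype=np.float32)
    y_effort = np.zeros((n, 2), dtype=np.float32)
    effort_mask = np.zeros(n, dtype=np.float32)
    for i, cond in enumerate(conditions):
        stress, effort = CONDITION_LABELS[cond]
        y_stress[i, stress] = 1.0
        if effort is not None:
            y_effort[i, effort] = 1.0
            effort_mask[i] = 1.0  # 1 = valid, 0 = masked
    return y_stress, y_effort, effort_mask


y_stress, y_effort, effort_mask = encode_labels(["c1", "c2", "c3"])
```

Keeping the c2 rows in the arrays, with a zero mask, is what lets the stress head still learn from them while the effort loss ignores them.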
Signal Processing
IBI Extraction:
- R-peak detection from ECG (2048 Hz)
- Artifact correction (physiological constraints: 300-2000 ms)
- Cubic spline interpolation to 2 Hz
- 120-sample windows (60s at 2 Hz) with 75% overlap
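The windowing step can be sketched as below. With 120-sample windows and 75% overlap, consecutive windows start 30 samples (15 s at 2 Hz) apart. The function name and the handling of short signals are assumptions made for this example.

```python
import numpy as np


def sliding_windows(signal, window=120, overlap=0.75):
    """Cut a 1-D signal into overlapping windows of shape (N, window, 1)."""
    signal = np.asarray(signal, dtype=np.float32)
    step = int(round(window * (1.0 - overlap)))  # 30 samples = 15 s at 2 Hz
    if len(signal) < window:
        return np.empty((0, window, 1), dtype=np.float32)
    starts = np.arange(0, len(signal) - window + 1, step)
    windows = np.stack([signal[s:s + window] for s in starts])
    return windows[..., np.newaxis]  # trailing channel axis for Conv1D input


five_minutes = np.arange(600)  # stand-in for 300 s of signal at 2 Hz
X = sliding_windows(five_minutes)
print(X.shape)  # (17, 120, 1)
```

Note that heavily overlapping windows from one recording are strongly correlated, which is one reason subject-wise splits such as LOSO are needed for honest evaluation.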
HRV Features (14):
- Time-domain (7): Mean IBI, SDNN, RMSSD, pNN50, CV, Mean HR, Std HR
- Frequency-domain (4): LF, HF, LF/HF, Total power
- Nonlinear (3): SD1, SD2, SD1/SD2
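The time-domain and nonlinear features listed above follow standard HRV definitions, sketched here with NumPy. The example assumes beat-to-beat intervals in milliseconds as input and omits the four frequency-domain features; the exact variants used in `prepare_multimodal_data_v5.py` (for example the choice of `ddof`) may differ.

```python
import numpy as np


def hrv_time_and_nonlinear(ibi_ms):
    """Time-domain and Poincare features from inter-beat intervals in ms."""
    ibi = np.asarray(ibi_ms, dtype=float)
    diff = np.diff(ibi)                      # successive differences
    sdnn = ibi.std(ddof=1)
    hr = 60000.0 / ibi                       # instantaneous heart rate in bpm
    sd1 = np.sqrt(0.5) * diff.std(ddof=1)    # short-term Poincare axis
    sd2 = np.sqrt(max(2.0 * sdnn ** 2 - 0.5 * diff.var(ddof=1), 0.0))
    return {
        "mean_ibi": ibi.mean(),
        "sdnn": sdnn,
        "rmssd": np.sqrt(np.mean(diff ** 2)),
        "pnn50": 100.0 * np.mean(np.abs(diff) > 50.0),
        "cv": sdnn / ibi.mean(),
        "mean_hr": hr.mean(),
        "std_hr": hr.std(ddof=1),
        "sd1": sd1,
        "sd2": sd2,
        "sd1_sd2": sd1 / sd2 if sd2 > 0 else 0.0,
    }


feats = hrv_time_and_nonlinear([800, 810, 790, 860, 800])
```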
EDA Processing:
- Detrending and lowpass filter (1 Hz cutoff)
- Downsample to 2 Hz
- 120-sample windows with 75% overlap
EDA Features (12):
- Raw statistics (4): Mean, SD, Min, Max
- Tonic component (4): Mean, SD, slope, range
- Phasic component (4): Mean amplitude, SD, SCR count, Mean SCR height
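A rough sketch of how the 12 EDA features could be computed for one window is given below. The tonic/phasic decomposition method is not specified above, so this example makes loud assumptions: the tonic level is approximated by a 10 s moving average, the phasic component is the residual, and SCRs are counted as local maxima of the residual above a hypothetical 0.01 threshold. Treat it as a stand-in, not as the repository's method.

```python
import numpy as np


def eda_window_features(eda, fs=2.0, tonic_seconds=10.0, scr_threshold=0.01):
    """12 illustrative EDA features: raw (4), tonic (4), phasic (4)."""
    eda = np.asarray(eda, dtype=float)
    k = max(int(tonic_seconds * fs), 1)
    # Assumption: moving-average baseline as the tonic component (edge-padded).
    padded = np.pad(eda, (k // 2, k - 1 - k // 2), mode="edge")
    tonic = np.convolve(padded, np.ones(k) / k, mode="valid")
    phasic = eda - tonic
    t = np.arange(len(eda)) / fs
    # Assumption: an SCR is a local maximum of the phasic residual above threshold.
    mid = phasic[1:-1]
    peaks = (mid > phasic[:-2]) & (mid >= phasic[2:]) & (mid > scr_threshold)
    return {
        "raw_mean": eda.mean(), "raw_sd": eda.std(),
        "raw_min": eda.min(), "raw_max": eda.max(),
        "tonic_mean": tonic.mean(), "tonic_sd": tonic.std(),
        "tonic_slope": np.polyfit(t, tonic, 1)[0],
        "tonic_range": tonic.max() - tonic.min(),
        "phasic_mean_amp": np.abs(phasic).mean(), "phasic_sd": phasic.std(),
        "scr_count": int(peaks.sum()),
        "scr_mean_height": float(mid[peaks].mean()) if peaks.any() else 0.0,
    }


flat = eda_window_features(np.full(120, 2.0))      # constant skin conductance
ramp = eda_window_features(np.linspace(1.0, 2.0, 120))  # slowly rising level
```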
Project Structure
multitask-stress-effort/
├── prepare_multimodal_data_v5.py            # Raw signals → ML features
├── trainer_v8_masked_effortMultimodal.py    # Multi-task training
├── Questionnaire.xlsx                       # Labels from SWELL-KW
├── Processed_IBI_HR_EDA/                    # MATLAB-processed signals
│   └── ppXX_date_cY_IBI_HR_EDA.mat
├── prepared_data_v5_multimodal/             # ML-ready dataset
│   ├── X_ibi.npy                            # (N, 120, 1)
│   ├── X_hrv.npy                            # (N, 14)
│   ├── X_eda.npy                            # (N, 120, 1)
│   ├── X_eda_features.npy                   # (N, 12)
│   ├── y_stress.npy                         # (N, 2) one-hot
│   ├── y_effort.npy                         # (N, 2) one-hot, masked=[0,0]
│   ├── effort_mask.npy                      # (N,) 1=valid, 0=masked
│   └── ...
└── runs/                                    # Training results
    └── synheart_v8_multimodal_*/
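Before training, it can be useful to confirm that the prepared arrays match the shapes annotated in the tree above. The sketch below does this with a hypothetical `load_prepared` helper and demonstrates it on tiny random stand-in arrays written to a temporary directory; point it at `./prepared_data_v5_multimodal` to check real data.

```python
import os
import tempfile

import numpy as np

# Expected per-sample shapes, taken from the comments in the project tree.
TRAILING_SHAPES = {
    "X_ibi": (120, 1), "X_hrv": (14,), "X_eda": (120, 1),
    "X_eda_features": (12,), "y_stress": (2,), "y_effort": (2,),
    "effort_mask": (),
}


def load_prepared(data_dir):
    """Load the .npy files and verify shapes and the masking convention."""
    arrays = {name: np.load(os.path.join(data_dir, f"{name}.npy"))
              for name in TRAILING_SHAPES}
    n = arrays["X_ibi"].shape[0]
    for name, trailing in TRAILING_SHAPES.items():
        if arrays[name].shape != (n,) + trailing:
            raise ValueError(f"{name}: expected {(n,) + trailing}, "
                             f"got {arrays[name].shape}")
    masked = arrays["effort_mask"] == 0
    if np.any(arrays["y_effort"][masked] != 0):
        raise ValueError("masked effort rows should be [0, 0]")
    return arrays


# Demo on stand-in data (random values, 4 samples, sample 1 masked).
with tempfile.TemporaryDirectory() as tmp:
    rng = np.random.default_rng(0)
    mask = np.array([1, 0, 1, 1], dtype=np.float32)
    fake = {
        "X_ibi": rng.normal(size=(4, 120, 1)),
        "X_hrv": rng.normal(size=(4, 14)),
        "X_eda": rng.normal(size=(4, 120, 1)),
        "X_eda_features": rng.normal(size=(4, 12)),
        "y_stress": np.eye(2, dtype=np.float32)[[0, 1, 1, 1]],
        "y_effort": np.eye(2, dtype=np.float32)[[0, 0, 1, 1]] * mask[:, None],
        "effort_mask": mask,
    }
    for name, arr in fake.items():
        np.save(os.path.join(tmp, f"{name}.npy"), arr)
    data = load_prepared(tmp)
```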
Training Configuration
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 2×10⁻⁴ |
| Weight Decay | 1×10⁻³ |
| Batch Size | 64 |
| Max Epochs | 200 |
| Early Stopping | 25 epochs patience |
| Loss Function | Focal Loss (γ=1.5) + Label Smoothing (ε=0.05) |
| Dropout | 0.35 (encoders), 0.50 (fusion) |
| Augmentation | Time Masking (p=0.5, width=15) |
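The time-masking augmentation in the last row can be sketched as follows. The table gives only the probability and width, so this example assumes that the masked span is set to zero and that its start position is drawn uniformly; the trainer may fill the span differently.

```python
import numpy as np


def time_mask(x, rng, p=0.5, width=15):
    """With probability p, zero out `width` consecutive time steps of x."""
    x = np.array(x, dtype=np.float32, copy=True)  # never modify the input
    if rng.random() < p:
        start = rng.integers(0, x.shape[0] - width + 1)
        x[start:start + width] = 0.0  # assumption: masked values become zero
    return x


rng = np.random.default_rng(42)
window = np.ones((120, 1), dtype=np.float32)  # stand-in for one IBI window
augmented = time_mask(window, rng)
```

A width of 15 samples corresponds to 7.5 s at 2 Hz, so each masked window loses at most one eighth of its signal.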
Citation
@article{author2025multitask,
title={Simultaneous Detection of Emotional and Cognitive States:
A Multi-Task Deep Learning Approach},
author={[Your Name]},
journal={IEEE Transactions on Affective Computing},
year={2025},
note={Under review}
}
Requirements
tensorflow>=2.10.0
numpy>=1.21.0
scipy>=1.7.0
scikit-learn>=1.0.0
matplotlib>=3.5.0
pandas>=1.3.0
openpyxl>=3.0.0 # For Excel label files
Full list: requirements.txt
Key References
Multi-Task Learning:
- Caruana (1997) - "Multitask Learning"
- Ruder (2017) - "Multi-Task Learning in Deep Neural Networks"
Dataset & Cognitive Load:
- Koldijk et al. (2014) - "SWELL Knowledge Work Dataset"
- Monsell (2003) - "Task Switching"
- Hockey (1997) - "Compensatory Control Theory"
License
MIT License - see LICENSE for details.
Contact
[Yisak T] - [yisak@synheart.ai] PhD
Multi-task framework for joint emotional and cognitive state detection from physiological signals
70.7% LOSO accuracy | Multimodal IBI+EDA fusion | Task-specific training
