Spaces: mahboobalam0/hmt_ecgnet
Runtime error


mahboobalam0 committed on Feb 20

Commit

db3dfbd


1 Parent(s): a5b6180

Upload 8 files


Files changed (8)

Dockerfile +22 -0
README.md +277 -10
app.py +257 -0
binary_threshold.json +3 -0
mi_best.pth +3 -0
multilabel_best.pth +3 -0
multilabel_thresholds.json +7 -0
requirements.txt +0 -0

Dockerfile ADDED

	@@ -0,0 +1,22 @@

+FROM python:3.10-slim
+WORKDIR /app
+# Install system dependencies (curl is needed by the HEALTHCHECK below)
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+# Copy requirements first for better caching
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy project files
+COPY . .
+# Expose port 7860, which HF Spaces expects (Streamlit's own default is 8501)
+EXPOSE 7860
+HEALTHCHECK CMD curl --fail http://localhost:7860/_stcore/health
+ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=7860", "--server.address=0.0.0.0", "--server.headless=true"]
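
Note on the health check: the `HEALTHCHECK` instruction relies on `curl` being available inside the image. One alternative that depends only on the Python standard library is a small probe script. The sketch below is an illustration under that assumption; `health_ok` is a hypothetical helper and is not part of this commit. The URL is the Streamlit health endpoint already used in the Dockerfile.

```python
import urllib.error
import urllib.request


def health_ok(
    url: str = "http://localhost:7860/_stcore/health",
    timeout: float = 3.0,
) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False
```

A health-check command could then call this function and exit with status 0 when it returns `True` and 1 otherwise.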

README.md CHANGED

@@ -1,10 +1,277 @@
----
-title: Hmt Ecgnet
-emoji: 🦀
-colorFrom: green
-colorTo: blue
-sdk: docker
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: HMT-ECGNet
+colorFrom: red
+colorTo: blue
+sdk: docker
+app_port: 7860
+pinned: false
+license: mit
+---
+# HMT-ECGNet
+**Lightweight Hierarchical Multi-Lead ECG Classification on PTB-XL**
+---
+## Overview
+**HMT-ECGNet** is a **lightweight, hierarchical deep learning system** for automatic ECG interpretation, designed and evaluated on the **PTB-XL** dataset under **strict, leakage-free conditions**.
+The project demonstrates that **carefully designed, parameter-efficient neural architectures** can achieve **competitive diagnostic performance** compared to large CNNs (e.g., ResNet) while remaining **deployable in real-world clinical and edge environments**.
+This repository represents an **end-to-end ML system** — from data preprocessing and training to evaluation, inference API, and interactive visualization.
+---
+## Key Contributions
+- **Hierarchical multi-lead ECG modeling** (lead-wise → global aggregation)
+- **Sub-million parameter architecture** (~0.34M params)
+- **Strict PTB-XL official splits** (no patient leakage)
+- **Honest evaluation** (no test-set threshold tuning)
+- **End-to-end deployment demo** (FastAPI + Streamlit)
+- **Baseline comparison with ResNet**
+---
+## Problem Statement
+ECG classification is typically addressed using:
+- very large CNNs (10–60M parameters), or
+- Transformer-based architectures with heavy compute requirements.
+However, such models:
+- are difficult to deploy on **edge / wearable devices**,
+- often over-report performance due to **data leakage**,
+- ignore **realistic performance ceilings** caused by label ambiguity.
+> **Goal:**
+> Can a **lightweight, hierarchical neural network** achieve strong diagnostic performance on PTB-XL when evaluated correctly?
+---
+## Architecture: HMT-ECGNet
+### High-Level Design
+``` markdown
+├─ 12-Lead ECG (10s)
+│
+├─ Shared per-lead temporal encoder
+│
+├─ Lead-wise feature tokens
+│
+├─ Hierarchical cross-lead aggregation
+│
+├─ Global temporal pooling
+│
+└─ Classification head
+```
+![HMT-ECGNet Architecture](artifacts/hmt_ecgnet_architecture.jpg)
+### Design Principles
+- **Per-lead temporal modeling** with shared weights
+- **Hierarchical aggregation** instead of heavy attention
+- **Explicit separation of temporal and spatial modeling**
+- **Parameter efficiency first**, accuracy second
+**Total parameters:** ~**338K**
+---
+## Training Protocol
+- Optimizer: **AdamW**
+- Learning rate schedule: **Cosine Annealing**
+- Loss:
+  - Multi-label: `AsymmetricFocalLoss` with class balancing
+  - Binary: `BCEWithLogitsLoss`
+- Regularization:
+  - Signal preprocessing
+  - Early stopping
+- Reproducibility:
+  - Fixed random seeds
+  - Deterministic splits
+---
+## Results
+### Multi-Label Classification (Test Set)
+| Metric | HMT-ECGNet |
+|------|-----------|
+| AUROC (macro) | **≈ 0.92** |
+| AUPRC (macro) | ≈ 0.78 |
+| F1 (macro) | ≈ **0.73** |
+| Parameters | **0.34M** |
+---
+### Binary Classification — MI vs Normal (Test Set)
+| Metric | HMT-ECGNet |
+|------|-----------|
+| AUROC | **≈ 0.98** |
+| Accuracy | ≈ 0.92–0.93 |
+| F1 | ≈ **0.89** |
+**Observation:**
+Accuracy saturates due to ambiguous ECGs, while AUROC remains high — indicating strong class separability under realistic conditions.
+---
+## Baseline Comparison
+| Model | Params | AUROC (Multi) | F1 (Multi) |
+|------|--------|--------------|------------|
+| **ResNet-1D** | ~8.7M | ≈ 0.90 | ≈ 0.70 |
+| **HMT-ECGNet (ours)** | **0.34M** | **≈ 0.92** | **≈ 0.73** |
+**HMT-ECGNet outperforms the ResNet-1D baseline on both multi-label metrics while using ~25× fewer parameters (0.34M vs. ~8.7M).**
+---
+## Deployment Demo
+Due to dataset licensing and size constraints, this project is not deployed as a public live demo.
+However, the **full inference and visualization pipeline is implemented and reproducible locally**.
+To launch the interactive ECG visualization and AI diagnosis interface:
+```bash
+streamlit run app.py
+```
+This repository includes a **production-style demo**:
+- **FastAPI** inference server
+- **Streamlit** UI
+  - Live ECG visualization
+  - Real-time predictions
+  - MI risk screening
+- Uses **unseen PTB-XL test ECGs**
+[![ECG Demo](artifacts/demo.png)](https://github.com/MahboobAlam0/hmt_ecg_healthmonitoringsystem/issues/1#issue-3938528989)
+---
+## Dataset
+### PTB-XL (PhysioNet, 2020)
+- ~21,800 ECG recordings
+- 12 leads
+- 10 seconds per ECG
+- Original sampling: 500 Hz (downsampled during preprocessing)
+- Official **patient-level splits**:
+  - Train: folds 1–8
+  - Validation: fold 9
+  - Test: fold 10
+### Tasks
+- **Multi-label classification (5 diagnostic superclasses)**
+  - NORM, MI, STTC, CD, HYP
+- **Binary classification**
+  - MI vs Normal
+  - Normal vs Abnormal
+⚠️ **Important:**
+All experiments strictly follow official PTB-XL splits.
+There is **no patient leakage**, **no test-set tuning**, and **no post-hoc threshold optimization**.
+---
+## Error Analysis & Insights
+- Ensemble models improve **stability**, not accuracy
+- Remaining errors are **systematic**, not variance-driven
+- Confirms a **performance ceiling** on PTB-XL due to:
+  - label ambiguity,
+  - inter-observer disagreement,
+  - borderline ECG patterns
+---
+## Project Structure
+```markdown
+├── hmt_ecgnet/
+├── artifacts/
+│   ├── mi_best.pth
+│   ├── multilabel_best.pth
+│   ├── multilabel_thresholds.json
+│   └── resnet_baseline.pth
+│
+├── .gitignore
+├── models/
+│   ├── hmt_ecgnet.py
+│   └── resnet1d.py
+│
+├── api.py
+├── app.py
+├── dataset.py
+├── train_multilabel.py
+├── train_binary.py
+├── eval_multilabel.py
+├── eval_binary.py
+├── threshold_search.py
+├── threshold_search_multilabel.py
+├── config.py
+└── README.md
+```
+---
+## References
+1. Wagner et al.
+   **PTB-XL: A Large Publicly Available Electrocardiography Dataset**
+   *PhysioNet, 2020*
+2. Ribeiro et al.
+   **Automatic diagnosis of the 12-lead ECG using a deep neural network**
+   *Nature Communications, 2020*
+3. Hannun et al.
+   **Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network**
+   *Nature Medicine, 2019*
+4. Rajpurkar et al.
+   **Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks**
+   *arXiv:1707.01836*
+5. Tan & Le
+   **EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks**
+   *ICML, 2019*
+---
+## Disclaimer
+This system is **for research and demonstration purposes only**
+and **not intended for clinical diagnosis or treatment**.
+---
+## Author Note
+This project emphasizes:
+- **engineering discipline**
+- **honest evaluation**
+- **deployment realism**
+- and **model efficiency**
+rather than leaderboard chasing.
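
Note on the README's split protocol: the fold assignment described in the Dataset section (folds 1–8 for training, 9 for validation, 10 for test) can be sketched in a few lines, along with a check that no patient appears in two splits. This is a minimal illustration on toy rows, not the repository's `dataset.py`. It assumes each metadata row exposes the `strat_fold` and `patient_id` columns of `ptbxl_database.csv`; `split_by_fold` and `patient_overlap` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set

TRAIN_FOLDS = set(range(1, 9))  # folds 1-8
VAL_FOLD = 9
TEST_FOLD = 10


def split_by_fold(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Assign each metadata row to train/val/test using its `strat_fold`."""
    splits: Dict[str, List[dict]] = {"train": [], "val": [], "test": []}
    for row in rows:
        fold = int(row["strat_fold"])
        if fold in TRAIN_FOLDS:
            splits["train"].append(row)
        elif fold == VAL_FOLD:
            splits["val"].append(row)
        elif fold == TEST_FOLD:
            splits["test"].append(row)
        else:
            raise ValueError(f"unexpected strat_fold: {fold}")
    return splits


def patient_overlap(a: Iterable[dict], b: Iterable[dict]) -> Set:
    """Return the patient IDs that occur in both row collections."""
    return {r["patient_id"] for r in a} & {r["patient_id"] for r in b}
```

An empty set from `patient_overlap(splits["train"], splits["test"])` is the condition the README refers to as "no patient leakage".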

app.py ADDED

	@@ -0,0 +1,257 @@

+# app.py — Self-contained Streamlit ECG Diagnostic App (HF Spaces compatible)
+import os
+import json
+import numpy as np
+import torch
+import wfdb
+import pandas as pd
+import streamlit as st
+import matplotlib.pyplot as plt
+from scipy.signal import butter, filtfilt, resample
+# Local imports
+from models.hmt_ecgnet import HMT_ECGNet
+from transforms import preprocess_signal
+from config import N_LEADS
+# Constants
+DIAG_CLASSES = ["NORM", "MI", "STTC", "CD", "HYP"]
+MI_BINARY_THRESHOLD = 0.05
+LEAD_NAMES = [
+    "I", "II", "III", "aVR", "aVL", "aVF",
+    "V1", "V2", "V3", "V4", "V5", "V6",
+]
+FS_ORIG = 500
+FS_TARGET = 100
+DURATION_SEC = 10
+TARGET_LEN = FS_TARGET * DURATION_SEC
+DATA_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "EcgDataset")
+ARTIFACTS_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "artifacts")
+# Page Config
+st.set_page_config(
+    page_title="ECG AI Diagnostic System",
+    page_icon="",
+    layout="wide",
+)
+# MODEL LOADING (cached — runs once)
+@st.cache_resource(show_spinner="Loading AI model...")
+def load_model():
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    model = HMT_ECGNet(num_classes=5, num_leads=N_LEADS).to(device)
+    ckpt_path = os.path.join(ARTIFACTS_DIR, "multilabel_best.pth")
+    ckpt = torch.load(ckpt_path, map_location=device, weights_only=False)
+    model.load_state_dict(ckpt["model_state_dict"])
+    model.eval()
+    return model, device
+@st.cache_data(show_spinner=False)
+def load_thresholds():
+    path = os.path.join(ARTIFACTS_DIR, "multilabel_thresholds.json")
+    with open(path) as f:
+        return json.load(f)
+#  DATA DOWNLOADING & LOADING
+@st.cache_data(show_spinner="Downloading PTB-XL sample data...")
+def download_ptbxl_data():
+    """Download the PTB-XL metadata CSVs; signal records are fetched on demand."""
+    os.makedirs(DATA_DIR, exist_ok=True)
+    csv_path = os.path.join(DATA_DIR, "ptbxl_database.csv")
+    if not os.path.exists(csv_path):
+        # dl_database() fetches WFDB records only (and records="all" would pull the
+        # whole dataset), so request the metadata CSVs by name instead
+        wfdb.dl_files("ptb-xl", DATA_DIR, ["ptbxl_database.csv", "scp_statements.csv"])
+    return True
+@st.cache_data(show_spinner=False)
+def load_test_metadata():
+    csv_path = os.path.join(DATA_DIR, "ptbxl_database.csv")
+    df = pd.read_csv(csv_path)
+    df_test = df[df["strat_fold"] == 10].reset_index(drop=True)
+    return df_test
+@st.cache_data(show_spinner=False)
+def load_and_preprocess_ecg(filename):
+    """Load a single ECG record from PTB-XL and preprocess for display."""
+    filepath = os.path.join(DATA_DIR, filename)
+    # Download this specific record if not present
+    record_dir = os.path.dirname(filepath)
+    os.makedirs(record_dir, exist_ok=True)
+    if not os.path.exists(filepath + ".hea"):
+        # Download just this record
+        rel_path = filename.replace("\\", "/")
+        try:
+            wfdb.dl_database(
+                "ptb-xl",
+                dl_dir=DATA_DIR,
+                records=[rel_path],
+            )
+        except Exception:
+            st.error(f"Could not download record: {filename}")
+            return None
+    sig, _ = wfdb.rdsamp(filepath)
+    sig = sig.T  # (12, T)
+    # Bandpass filter for display
+    nyq = 0.5 * FS_ORIG
+    b, a = butter(4, [0.5 / nyq, 40.0 / nyq], btype="band")
+    for i in range(12):
+        sig[i] = filtfilt(b, a, sig[i])
+    # Resample for display
+    sig = resample(sig, TARGET_LEN, axis=1)
+    return sig.astype(np.float32)
+#  INFERENCE (runs directly — no FastAPI needed)
+def run_inference(ecg_display, model, device, thresholds):
+    """Run model inference on preprocessed ECG data."""
+    # Re-preprocess from display signal for model input
+    ecg_for_model = preprocess_signal(ecg_display.copy())
+    x = torch.tensor(ecg_for_model, dtype=torch.float32).unsqueeze(0).to(device)
+    with torch.no_grad():
+        probs = torch.sigmoid(model(x)).cpu().numpy()[0]
+    result = {}
+    predicted = []
+    for cls, p in zip(DIAG_CLASSES, probs):
+        thr = thresholds[cls]
+        result[cls] = float(p)
+        if p >= thr:
+            predicted.append(cls)
+    mi_prob = float(probs[1])
+    return {
+        "probabilities": result,
+        "predicted_classes": predicted,
+        "mi_probability": mi_prob,
+        "mi_risk": mi_prob >= MI_BINARY_THRESHOLD,
+    }
+#  ECG PLOTTING
+def plot_ecg(ecg):
+    """Plot full 12-lead ECG as a static figure."""
+    fig, axes = plt.subplots(12, 1, figsize=(24, 14), sharex=True)
+    x = np.arange(ecg.shape[1])
+    for i in range(12):
+        axes[i].plot(x, ecg[i], lw=1.1, color="#1f77b4")
+        axes[i].set_ylabel(
+            LEAD_NAMES[i],
+            rotation=0,
+            labelpad=28,
+            fontsize=10,
+        )
+        axes[i].grid(True, alpha=0.3)
+    axes[-1].set_xlabel("Time (samples)")
+    plt.tight_layout()
+    return fig
+#  MAIN APP
+st.title("12-Lead ECG AI Diagnostic System")
+st.markdown(
+    "**Live demo on unseen PTB-XL TEST ECGs**  \n"
+    "Lightweight hierarchical model • No data leakage • Realistic evaluation"
+)
+# Load model & data
+model, device = load_model()
+thresholds = load_thresholds()
+with st.spinner("Preparing PTB-XL test data..."):
+    download_ptbxl_data()
+    df_test = load_test_metadata()
+if len(df_test) == 0:
+    st.error("No test data found. Please check the PTB-XL dataset.")
+    st.stop()
+# Sidebar
+st.sidebar.header("ECG Sample Selector")
+sample_idx = st.sidebar.slider(
+    "Select ECG from TEST set",
+    0,
+    len(df_test) - 1,
+    0,
+)
+row = df_test.iloc[sample_idx]
+ecg = load_and_preprocess_ecg(row["filename_hr"])
+if ecg is None:
+    st.warning("Could not load this ECG record. Try another sample.")
+    st.stop()
+# ECG Display
+st.subheader(f"ECG Sample #{sample_idx}")
+fig = plot_ecg(ecg)
+st.pyplot(fig)
+plt.close(fig)
+# AI Inference
+st.subheader("AI Diagnosis")
+with st.spinner("Running inference..."):
+    result = run_inference(ecg, model, device, thresholds)
+# Results
+st.markdown("### Per-class Probabilities")
+cols = st.columns(5)
+for col, (cls, prob) in zip(cols, result["probabilities"].items()):
+    col.metric(cls, f"{prob:.3f}")
+st.markdown("### Final Predicted Classes")
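+pathologies = [c for c in result["predicted_classes"] if c != "NORM"]
+if pathologies:
+    st.error(", ".join(result["predicted_classes"]))
+elif "NORM" in result["predicted_classes"]:
+    st.success("Normal ECG: no pathology detected")
+else:
+    # No class crossed its threshold, which is not the same as a normal finding
+    st.warning("No class exceeded its decision threshold")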
+st.markdown("### Myocardial Infarction Screening")
+st.metric("MI Probability", f"{result['mi_probability']:.3f}")
+if result["mi_risk"]:
+    st.error("⚠️ High likelihood of Myocardial Infarction")
+else:
+    st.success("No strong MI indication")
+# Footer
+st.markdown("---")
+st.caption(
+    "⚕️ **Disclaimer:** This system is for research and demonstration only. "
+    "Not intended for clinical diagnosis or treatment."
+)
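
Note on artifact paths: in this commit's file list, the checkpoints and threshold files appear at the repository root, while `app.py` reads them from an `artifacts/` subdirectory. If both layouts need to work, a small lookup over candidate directories is one option. The sketch below is an assumption about how that could be done; `find_artifact` is a hypothetical helper and not part of the uploaded code.

```python
import os
from typing import Iterable, Optional


def find_artifact(name: str, search_dirs: Iterable[str]) -> Optional[str]:
    """Return the first existing path `<dir>/<name>` among `search_dirs`, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, `find_artifact("multilabel_best.pth", [ARTIFACTS_DIR, os.path.dirname(ARTIFACTS_DIR)])` would try `artifacts/` first and then the directory that contains `app.py`, returning `None` when the file is in neither place so the caller can report a clear error.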

binary_threshold.json ADDED

	@@ -0,0 +1,3 @@

+{
+    "mi_vs_norm": 0.05
+}
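
Note on this file: `app.py` hard-codes the same value as `MI_BINARY_THRESHOLD = 0.05`, so the two can drift apart. Reading the threshold from the JSON file with a fallback is one way to keep a single source of truth. This is a minimal sketch; `load_mi_threshold` is a hypothetical helper, and the `mi_vs_norm` key is taken from the file above.

```python
import json


def load_mi_threshold(
    path: str, key: str = "mi_vs_norm", default: float = 0.05
) -> float:
    """Read a decision threshold from a JSON file, falling back to `default`."""
    try:
        with open(path, encoding="utf-8") as f:
            value = float(json.load(f)[key])
    except (OSError, KeyError, TypeError, ValueError):
        # Missing file, missing key, or malformed content: use the default.
        return default
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"threshold must lie in [0, 1], got {value}")
    return value
```

An out-of-range value raises instead of silently falling back, since a threshold outside [0, 1] would make every prediction positive or negative.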

mi_best.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b1d554e14d3d5a81bbdd19b067f349d58c1eea5719915b44607345d2bd563ccb
+size 1368226
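
Note on the pointer file: the three lines above are a Git LFS pointer (spec version, SHA-256 object ID, size in bytes), not the checkpoint itself. A downloaded checkpoint can be checked against such a pointer before it is passed to `torch.load`. The sketch below shows one way to do that with the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if `data` has the size and SHA-256 digest named by `pointer`."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )
```

A mismatch usually means the pointer file itself was loaded instead of the LFS object, which is a common cause of checkpoint loading failures.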

multilabel_best.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:598ffa81fcba7d94e1b229c6c81b4fe4479b53d01a15eac90e50348bff7a4cc7
+size 1369958

multilabel_thresholds.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "NORM": 0.145,
+  "MI": 0.195,
+  "STTC": 0.8749999999999999,
+  "CD": 0.055,
+  "HYP": 0.31499999999999995
+}
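
Note on how these values are used: each class probability is compared with its own threshold, and a class is reported when the probability is at or above it. The sketch below mirrors that rule in plain Python and separates three outcomes: at least one pathology, NORM only, and nothing above threshold. It is an illustration, not the code in `app.py`; the threshold values are copied from the file above and rounded to three decimals, and `predict_classes` and `summarize` are hypothetical names.

```python
from typing import Dict, List

# Values from multilabel_thresholds.json, rounded to three decimals (assumption).
THRESHOLDS: Dict[str, float] = {
    "NORM": 0.145,
    "MI": 0.195,
    "STTC": 0.875,
    "CD": 0.055,
    "HYP": 0.315,
}


def predict_classes(probs: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the classes whose probability reaches the per-class threshold."""
    return [cls for cls, p in probs.items() if p >= thresholds[cls]]


def summarize(predicted: List[str]) -> str:
    """Collapse a predicted-class list into one of three outcomes."""
    pathologies = [cls for cls in predicted if cls != "NORM"]
    if pathologies:
        return "abnormal: " + ", ".join(pathologies)
    if "NORM" in predicted:
        return "normal"
    return "inconclusive"
```

Treating an empty prediction list as "inconclusive" rather than "normal" avoids reporting a normal ECG when the NORM probability itself fell below its threshold.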

requirements.txt ADDED

Binary file (534 Bytes)