Space: P2SAMAPA/P2-ETF-CNN-LSTM-ALTERNATIVE-APPROACHES


GitHub Actions committed on Feb 20

Commit 7e96b08 · 1 parent: 6d4734e

Sync from GitHub: 6ed9981d3cbd810f18c7f5d897bdfcb3420a9091


Files changed (8)

README.md +110 -14
app.py +273 -0
hf_space/.gitattributes +35 -0
hf_space/Dockerfile +20 -0
hf_space/README.md +19 -0
hf_space/requirements.txt +3 -0
hf_space/src/streamlit_app.py +40 -0
requirements.txt +29 -3

README.md CHANGED

@@ -1,19 +1,115 @@
 ---
-title: P2 ETF CNN LSTM ALTERNATIVE APPROACHES
-emoji: 🚀
-colorFrom: red
-colorTo: red
-sdk: docker
-app_port: 8501
-tags:
-- streamlit
-pinned: false
-short_description: Streamlit template space
 ---
-# Welcome to Streamlit!
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).

+# P2-ETF-CNN-LSTM-ALTERNATIVE-APPROACHES
+Macro-driven ETF rotation using three augmented CNN-LSTM variants.
+Winner selected by **highest raw annualised return** on the out-of-sample test set.
+---
+## Architecture Overview
+| Approach | Core Idea | Key Addition |
+|---|---|---|
+| **1 — Wavelet** | DWT decomposes each macro signal into frequency subbands before the CNN | Separates trend / cycle / noise |
+| **2 — Regime-Conditioned** | HMM detects macro regimes; one-hot regime label concatenated into the network | Removes non-stationarity |
+| **3 — Multi-Scale Parallel** | Three CNN towers (kernels 3, 7, 21 days) run in parallel before the LSTM | Captures momentum + cycle + trend simultaneously |
 ---
+## ETF Universe
+| Ticker | Description |
+|---|---|
+| TLT | 20+ Year Treasury Bond |
+| TBT | 20+ Year Treasury Short (2×) |
+| VNQ | Real Estate (REIT) |
+| SLV | Silver |
+| GLD | Gold |
+| CASH | 3m T-bill rate (from HF dataset) |
+Benchmarks (chart only, not traded): **SPY**, **AGG**
+---
+## Data
+All data sourced exclusively from:
+**`P2SAMAPA/fi-etf-macro-signal-master-data`** (HuggingFace Dataset)
+File: `master_data.parquet`
+No external API calls (no yfinance, no FRED).
+The app checks daily whether the prior NYSE trading day's data is present in the dataset.
 ---
+## Project Structure
+```
+├── .github/
+│   └── workflows/
+│       └── sync.yml            # Auto-sync GitHub → HF Space on push to main
+│
+├── app.py                      # Streamlit orchestrator (UI wiring only)
+│
+├── data/
+│   └── loader.py               # HF dataset load, freshness check, column validation
+│
+├── models/
+│   ├── base.py                 # Shared: sequences, splits, scaling, callbacks
+│   ├── approach1_wavelet.py    # Wavelet CNN-LSTM
+│   ├── approach2_regime.py     # Regime-Conditioned CNN-LSTM
+│   └── approach3_multiscale.py # Multi-Scale Parallel CNN-LSTM
+│
+├── strategy/
+│   └── backtest.py             # execute_strategy, metrics, winner selection
+│
+├── signals/
+│   └── conviction.py           # Z-score conviction scoring
+│
+├── ui/
+│   ├── components.py           # Banner, conviction panel, metrics, audit trail
+│   └── charts.py               # Plotly equity curve + comparison bar chart
+│
+├── utils/
+│   └── calendar.py             # NYSE calendar, next trading day, EST time
+│
+├── requirements.txt
+└── README.md
+```
+---
+## Secrets Required
+| Secret | Where | Purpose |
+|---|---|---|
+| `HF_TOKEN` | GitHub + HF Space | Read HF dataset · Sync HF Space |
+Set in:
+- GitHub: `Settings → Secrets → Actions → New repository secret`
+- HF Space: `Settings → Repository secrets`
+---
+## Deployment
+Push to `main` → GitHub Actions (`sync.yml`) automatically syncs to HF Space.
+### Local development
+```bash
+pip install -r requirements.txt
+export HF_TOKEN=your_token
+streamlit run app.py
+```
+---
+## Output UI
+1. **Data freshness warning** — alerts if prior NYSE trading day data is missing
+2. **Next Trading Day Signal** — date + ETF from the winning approach
+3. **Signal Conviction** — Z-score gauge + per-ETF probability bars
+4. **Performance Metrics** — Annualised Return, Sharpe, Hit Ratio, Max DD
+5. **Approach Comparison Table** — all three approaches side by side
+6. **Equity Curves** — all three approaches + SPY + AGG benchmarks
+7. **Audit Trail** — last 20 trading days for the winning approach
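
Reviewer note on the README's winner rule: the README says the winner is the approach with the highest raw annualised return on the out-of-sample test set, but `strategy/backtest.py` is not part of this diff, so the exact formula is not visible here. The sketch below is one plausible reading, assuming daily net returns, geometric compounding, and 252 trading days per year. `annualised_return` and `pick_winner` are hypothetical names, not the repository's `select_winner`, and plain lists of returns stand in for whatever structure `execute_strategy` actually returns.

```python
from typing import Dict, List, Optional

TRADING_DAYS = 252  # assumption: the usual US equity convention


def annualised_return(daily_returns: List[float]) -> float:
    """Geometric annualised return from a list of daily net returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r          # compound each day's return
    years = len(daily_returns) / TRADING_DAYS
    return growth ** (1.0 / years) - 1.0


def pick_winner(results: Dict[str, Optional[List[float]]]) -> Optional[str]:
    """Name of the approach with the highest annualised return.

    Approaches that failed (None) or produced no returns are skipped.
    Returns None when nothing is left to compare.
    """
    scored = {
        name: annualised_return(rets)
        for name, rets in results.items()
        if rets  # drops None and empty lists
    }
    if not scored:
        return None
    return max(scored, key=scored.get)


# Illustrative inputs only, not real backtest output.
example = {
    "Approach 1": [0.001] * 20,
    "Approach 2": [0.002] * 20,
    "Approach 3": None,  # e.g. training raised an exception
}
winner = pick_winner(example)
```

Ranking on raw return alone ignores risk, which is presumably why the UI also reports Sharpe and Max DD next to it.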

app.py ADDED

@@ -0,0 +1,273 @@

+"""
+app.py
+P2-ETF-CNN-LSTM-ALTERNATIVE-APPROACHES
+Streamlit orchestrator — UI wiring only, no business logic here.
+"""
+import os
+import streamlit as st
+import pandas as pd
+import numpy as np
+# ── Module imports ────────────────────────────────────────────────────────────
+from data.loader      import load_dataset, check_data_freshness, get_features_and_targets, dataset_summary
+from utils.calendar   import get_est_time, is_sync_window, get_next_signal_date
+from models.base      import build_sequences, train_val_test_split, scale_features, returns_to_labels
+from models.approach1_wavelet    import train_approach1, predict_approach1
+from models.approach2_regime     import train_approach2, predict_approach2
+from models.approach3_multiscale import train_approach3, predict_approach3
+from strategy.backtest  import execute_strategy, select_winner, build_comparison_table
+from signals.conviction import compute_conviction
+from ui.components import (
+    show_freshness_status, show_signal_banner, show_conviction_panel,
+    show_metrics_row, show_comparison_table, show_audit_trail,
+)
+from ui.charts import equity_curve_chart, comparison_bar_chart
+# ── Page config ───────────────────────────────────────────────────────────────
+st.set_page_config(
+    page_title="P2-ETF-CNN-LSTM",
+    page_icon="🧠",
+    layout="wide",
+)
+# ── Secrets ───────────────────────────────────────────────────────────────────
+HF_TOKEN = os.getenv("HF_TOKEN", "")
+# ── Sidebar ───────────────────────────────────────────────────────────────────
+with st.sidebar:
+    st.header("⚙️ Configuration")
+    now_est = get_est_time()
+    st.write(f"🕒 **EST:** {now_est.strftime('%H:%M:%S')}")
+    if is_sync_window():
+        st.success("✅ Sync Window Active")
+    else:
+        st.info("⏸️ Sync Window Inactive")
+    st.divider()
+    start_yr = st.slider("📅 Start Year", 2010, 2024, 2016)
+    fee_bps  = st.slider("💰 Fee (bps)", 0, 50, 10)
+    lookback = st.slider("📐 Lookback (days)", 20, 60, 30, step=5)
+    epochs   = st.number_input("🔁 Max Epochs", 20, 300, 100, step=10)
+    st.divider()
+    split_option = st.selectbox("📊 Train/Val/Test Split", ["70/15/15", "80/10/10"], index=0)
+    split_map    = {"70/15/15": (0.70, 0.15), "80/10/10": (0.80, 0.10)}
+    train_pct, val_pct = split_map[split_option]
+    include_cash = st.checkbox("💵 Include CASH class", value=True,
+                               help="Model can select CASH (earns T-bill rate) as an alternative to any ETF")
+    st.divider()
+    run_button = st.button("🚀 Run All 3 Approaches", type="primary", use_container_width=True)
+# ── Title ─────────────────────────────────────────────────────────────────────
+st.title("🧠 P2-ETF-CNN-LSTM")
+st.caption("Approach 1: Wavelet  ·  Approach 2: Regime-Conditioned  ·  Approach 3: Multi-Scale Parallel")
+st.caption("Winner selected by highest raw annualised return on out-of-sample test set.")
+# ── Load data (always, to check freshness) ────────────────────────────────────
+if not HF_TOKEN:
+    st.error("❌ HF_TOKEN secret not found. Please add it to your HF Space / GitHub secrets.")
+    st.stop()
+with st.spinner("📡 Loading dataset from HuggingFace..."):
+    df = load_dataset(HF_TOKEN)
+if df.empty:
+    st.stop()
+# ── Freshness check ───────────────────────────────────────────────────────────
+freshness = check_data_freshness(df)
+show_freshness_status(freshness)
+# ── Dataset summary in sidebar ────────────────────────────────────────────────
+with st.sidebar:
+    st.divider()
+    st.subheader("📦 Dataset Info")
+    summary = dataset_summary(df)
+    if summary:
+        st.write(f"**Rows:** {summary['rows']:,}")
+        st.write(f"**Range:** {summary['start_date']} → {summary['end_date']}")
+        st.write(f"**ETFs:** {', '.join([e.replace('_Ret','') for e in summary['etfs_found']])}")
+        st.write(f"**Benchmarks:** {', '.join([b.replace('_Ret','') for b in summary['benchmarks']])}")
+        st.write(f"**T-bill col:** {'✅' if summary['tbill_found'] else '❌'}")
+# ── Main execution ────────────────────────────────────────────────────────────
+if not run_button:
+    st.info("👈 Configure parameters in the sidebar and click **🚀 Run All 3 Approaches** to begin.")
+    st.stop()
+# ── Filter by start year ──────────────────────────────────────────────────────
+df = df[df.index.year >= start_yr].copy()
+st.write(f"📅 **Data:** {df.index[0].strftime('%Y-%m-%d')} → {df.index[-1].strftime('%Y-%m-%d')}  "
+         f"({df.index[-1].year - df.index[0].year + 1} years)")
+# ── Feature / target extraction ───────────────────────────────────────────────
+try:
+    input_features, target_etfs, tbill_rate = get_features_and_targets(df)
+except ValueError as e:
+    st.error(str(e))
+    st.stop()
+st.info(f"🎯 **Targets:** {len(target_etfs)} ETFs  ·  **Features:** {len(input_features)} signals  ·  "
+        f"**T-bill rate:** {tbill_rate*100:.2f}%")
+# ── Prepare sequences ─────────────────────────────────────────────────────────
+X_raw    = df[input_features].values.astype(np.float32)
+y_raw    = df[target_etfs].values.astype(np.float32)
+n_etfs   = len(target_etfs)
+n_classes = n_etfs + (1 if include_cash else 0)   # +1 for CASH
+# Fill NaNs with column means
+col_means = np.nanmean(X_raw, axis=0)
+for j in range(X_raw.shape[1]):
+    mask = np.isnan(X_raw[:, j])
+    X_raw[mask, j] = col_means[j]
+X_seq, y_seq = build_sequences(X_raw, y_raw, lookback)
+y_labels     = returns_to_labels(y_seq, include_cash=include_cash)
+X_train, y_train_r, X_val, y_val_r, X_test, y_test_r = train_val_test_split(X_seq, y_seq, train_pct, val_pct)
+_, y_train_l, _, y_val_l, _, y_test_l                 = train_val_test_split(X_seq, y_labels, train_pct, val_pct)
+X_train_s, X_val_s, X_test_s, _ = scale_features(X_train, X_val, X_test)
+train_size = len(X_train)
+val_size   = len(X_val)
+# Test dates (aligned with y_test)
+test_start  = lookback + train_size + val_size
+test_dates  = df.index[test_start: test_start + len(X_test)]
+test_slice  = slice(test_start, test_start + len(X_test))
+st.success(f"✅ Sequences — Train: {train_size} · Val: {val_size} · Test: {len(X_test)}")
+# ── Train all three approaches ────────────────────────────────────────────────
+results      = {}
+trained_info = {}   # store extra info needed for conviction
+progress = st.progress(0, text="Starting training...")
+# ── Approach 1: Wavelet ───────────────────────────────────────────────────────
+with st.spinner("🌊 Training Approach 1 — Wavelet CNN-LSTM..."):
+    try:
+        model1, hist1, _ = train_approach1(
+            X_train_s, y_train_l,
+            X_val_s,   y_val_l,
+            n_classes=n_classes, epochs=int(epochs),
+        )
+        preds1, proba1 = predict_approach1(model1, X_test_s)
+        results["Approach 1"] = execute_strategy(
+            preds1, proba1, y_test_r, test_dates, target_etfs, fee_bps, tbill_rate, include_cash,
+        )
+        trained_info["Approach 1"] = {"proba": proba1}
+        st.success("✅ Approach 1 complete")
+    except Exception as e:
+        st.warning(f"⚠️ Approach 1 failed: {e}")
+        results["Approach 1"] = None
+progress.progress(33, text="Approach 1 done...")
+# ── Approach 2: Regime-Conditioned ───────────────────────────────────────────
+with st.spinner("🔀 Training Approach 2 — Regime-Conditioned CNN-LSTM..."):
+    try:
+        model2, hist2, hmm2, regime_cols2 = train_approach2(
+            X_train_s, y_train_l,
+            X_val_s,   y_val_l,
+            X_flat_all=X_raw,
+            feature_names=input_features,
+            lookback=lookback,
+            train_size=train_size,
+            val_size=val_size,
+            n_classes=n_classes, epochs=int(epochs),
+        )
+        preds2, proba2 = predict_approach2(
+            model2, X_test_s, X_raw, regime_cols2, hmm2,
+            lookback, train_size, val_size,
+        )
+        results["Approach 2"] = execute_strategy(
+            preds2, proba2, y_test_r, test_dates, target_etfs, fee_bps, tbill_rate, include_cash,
+        )
+        trained_info["Approach 2"] = {"proba": proba2}
+        st.success("✅ Approach 2 complete")
+    except Exception as e:
+        st.warning(f"⚠️ Approach 2 failed: {e}")
+        results["Approach 2"] = None
+progress.progress(66, text="Approach 2 done...")
+# ── Approach 3: Multi-Scale ───────────────────────────────────────────────────
+with st.spinner("📡 Training Approach 3 — Multi-Scale CNN-LSTM..."):
+    try:
+        model3, hist3 = train_approach3(
+            X_train_s, y_train_l,
+            X_val_s,   y_val_l,
+            n_classes=n_classes, epochs=int(epochs),
+        )
+        preds3, proba3 = predict_approach3(model3, X_test_s)
+        results["Approach 3"] = execute_strategy(
+            preds3, proba3, y_test_r, test_dates, target_etfs, fee_bps, tbill_rate, include_cash,
+        )
+        trained_info["Approach 3"] = {"proba": proba3}
+        st.success("✅ Approach 3 complete")
+    except Exception as e:
+        st.warning(f"⚠️ Approach 3 failed: {e}")
+        results["Approach 3"] = None
+progress.progress(100, text="All approaches complete!")
+progress.empty()
+# ── Select winner ─────────────────────────────────────────────────────────────
+winner_name = select_winner(results)
+winner_res  = results.get(winner_name)
+if winner_res is None:
+    st.error("❌ All approaches failed. Please check your data and configuration.")
+    st.stop()
+# ── Next trading date ─────────────────────────────────────────────────────────
+next_date = get_next_signal_date()
+st.divider()
+# ── Signal banner (winner) ────────────────────────────────────────────────────
+show_signal_banner(winner_res["next_signal"], next_date, winner_name)
+# ── Conviction panel ──────────────────────────────────────────────────────────
+winner_proba = trained_info[winner_name]["proba"]
+conviction   = compute_conviction(winner_proba[-1], target_etfs, include_cash)
+show_conviction_panel(conviction)
+st.divider()
+# ── Winner metrics ────────────────────────────────────────────────────────────
+st.subheader(f"📊 {winner_name} — Performance Metrics")
+show_metrics_row(winner_res, tbill_rate)
+st.divider()
+# ── Comparison table ──────────────────────────────────────────────────────────
+st.subheader("🏆 Approach Comparison (Winner = Highest Raw Annualised Return)")
+comparison_df = build_comparison_table(results, winner_name)
+show_comparison_table(comparison_df)
+# ── Comparison bar chart ──────────────────────────────────────────────────────
+st.plotly_chart(comparison_bar_chart(results, winner_name), use_container_width=True)
+st.divider()
+# ── Equity curves ─────────────────────────────────────────────────────────────
+st.subheader("📈 Out-of-Sample Equity Curves — All Approaches vs Benchmarks")
+fig = equity_curve_chart(results, winner_name, test_dates, df, test_slice, tbill_rate)
+st.plotly_chart(fig, use_container_width=True)
+st.divider()
+# ── Audit trail (winner) ──────────────────────────────────────────────────────
+st.subheader(f"📋 Audit Trail — {winner_name} (Last 20 Trading Days)")
+show_audit_trail(winner_res["audit_trail"])
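
Reviewer note on the test-date alignment in `app.py`: `test_start = lookback + train_size + val_size` is only correct if `build_sequences` places the target of window `i` at row `i + lookback`. `models/base.py` is not in this diff, so that convention is an assumption. The sketch below uses hypothetical helpers (`build_windows`, `split_counts`) on row indices only, to show what the offset arithmetic implies under that assumption.

```python
def build_windows(n_rows, lookback):
    """Assumed convention: window i reads rows [i, i + lookback) and
    predicts row i + lookback."""
    windows = []
    targets = []
    for i in range(n_rows - lookback):
        windows.append((i, i + lookback))  # half-open range of input rows
        targets.append(i + lookback)       # row index of the predicted day
    return windows, targets


def split_counts(n_seq, train_pct, val_pct):
    """Chronological split sizes; the test set takes the remainder."""
    train = int(n_seq * train_pct)
    val = int(n_seq * val_pct)
    return train, val, n_seq - train - val


n_rows, lookback = 500, 30
windows, targets = build_windows(n_rows, lookback)
train, val, test = split_counts(len(windows), 0.70, 0.15)

# Same arithmetic as app.py: offset of the first test row in the DataFrame.
test_start = lookback + train + val
first_test_target = targets[train + val]
```

If `build_sequences` instead labels a window with its last input row, `test_dates` would be shifted by one day, so a check like `first_test_target == test_start` may be worth adding as a unit test against the real helper.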

hf_space/.gitattributes ADDED

@@ -0,0 +1,35 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

hf_space/Dockerfile ADDED

@@ -0,0 +1,20 @@

+FROM python:3.13.5-slim
+WORKDIR /app
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    curl \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+COPY requirements.txt ./
+COPY src/ ./src/
+RUN pip3 install -r requirements.txt
+EXPOSE 8501
+HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
+ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]

hf_space/README.md ADDED

@@ -0,0 +1,19 @@

+---
+title: P2 ETF CNN LSTM ALTERNATIVE APPROACHES
+emoji: 🚀
+colorFrom: red
+colorTo: red
+sdk: docker
+app_port: 8501
+tags:
+- streamlit
+pinned: false
+short_description: Streamlit template space
+---
+# Welcome to Streamlit!
+Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
+If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
+forums](https://discuss.streamlit.io).

hf_space/requirements.txt ADDED

@@ -0,0 +1,3 @@

+altair
+pandas
+streamlit

hf_space/src/streamlit_app.py ADDED

@@ -0,0 +1,40 @@

+import altair as alt
+import numpy as np
+import pandas as pd
+import streamlit as st
+"""
+# Welcome to Streamlit!
+Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
+If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
+forums](https://discuss.streamlit.io).
+In the meantime, below is an example of what you can do with just a few lines of code:
+"""
+num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
+num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
+indices = np.linspace(0, 1, num_points)
+theta = 2 * np.pi * num_turns * indices
+radius = indices
+x = radius * np.cos(theta)
+y = radius * np.sin(theta)
+df = pd.DataFrame({
+    "x": x,
+    "y": y,
+    "idx": indices,
+    "rand": np.random.randn(num_points),
+})
+st.altair_chart(alt.Chart(df, height=700, width=700)
+    .mark_point(filled=True)
+    .encode(
+        x=alt.X("x", axis=None),
+        y=alt.Y("y", axis=None),
+        color=alt.Color("idx", legend=None, scale=alt.Scale()),
+        size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
+    ))

requirements.txt CHANGED

@@ -1,3 +1,29 @@
-altair
-pandas
-streamlit

+# Core
+streamlit>=1.32.0
+pandas>=2.0.0
+numpy>=1.24.0
+# Hugging Face
+huggingface_hub>=0.21.0
+datasets>=2.18.0
+# Machine Learning
+tensorflow>=2.14.0
+scikit-learn>=1.3.0
+xgboost>=2.0.0
+# Wavelet (Approach 1)
+PyWavelets>=1.5.0
+# Regime detection (Approach 2)
+hmmlearn>=0.3.0
+# Visualisation
+plotly>=5.18.0
+# NYSE Calendar
+pandas_market_calendars>=4.3.0
+pytz>=2024.1
+# Parquet
+pyarrow>=14.0.0