umer6016 committed
Commit 068b3e6 · 1 Parent(s): af2cd84

Final Polish: Feature complete with Live Data, Discord, and tracked models
.github/workflows/deploy_to_hf.yml ADDED
@@ -0,0 +1,22 @@
+name: Sync to Hugging Face Hub
+
+on:
+  push:
+    branches: [main]
+
+  # Make it manually triggerable
+  workflow_dispatch:
+
+jobs:
+  sync-to-hub:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+          lfs: true
+      - name: Push to hub
+        env:
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+        run: |
+          git push -f https://umer6016:$HF_TOKEN@huggingface.co/spaces/umer6016/stock-prediction-system main
audit.md ADDED
@@ -0,0 +1,20 @@
+# Requirements Audit
+
+- [x] **1. Build and Deploy ML Models with FastAPI**
+  - `src/api/main.py` exists and works.
+  - Models upgraded to ensembles.
+- [x] **2. Implement CI/CD Pipeline**
+  - `.github/workflows/ci.yml` (tests)
+  - `.github/workflows/deploy_to_hf.yml` (deployment)
+- [x] **3. Orchestrate ML Workflows Using Prefect**
+  - `src/orchestration/flow.py` exists.
+- [x] **4. Implement Automated Testing**
+  - `tests/` folder + Deepchecks integration.
+- [x] **5. Containerize the Entire System**
+  - `docker/Dockerfile` updated for Streamlit + models.
+  - Hugging Face "Docker Blank" setup.
+- [x] **6. ML Experimentation & Observations**
+  - `docs/project_report.md` covers this.
+  - New `streamlit_app.py` has "Market Analysis" (Clustering/PCA).
+
+**Status: 100% COMPLETE**
docker/Dockerfile CHANGED
@@ -24,12 +24,17 @@ COPY --from=builder /usr/local/bin /usr/local/bin
 COPY src/ src/
 COPY tests/ tests/
 COPY .env.example .env
+COPY streamlit_app.py .
 
-# Create directories for data and models
-RUN mkdir -p data/processed models reports
+# Copy models (CRITICAL for Standalone Mode)
+COPY models/ models/
 
-# Expose port
-EXPOSE 8000
+# Create directories for data and reports
+RUN mkdir -p data/processed reports && \
+    chmod -R 777 data models reports
 
-# Command to run the API
-CMD ["uvicorn", "src.api.main:app", "--host", "0.0.0.0", "--port", "8000"]
+# Expose port (Hugging Face Requirement)
+EXPOSE 7860
+
+# Command to run Streamlit
+CMD ["streamlit", "run", "streamlit_app.py", "--server.port", "7860", "--server.address", "0.0.0.0"]
docs/code_explanation.md ADDED
@@ -0,0 +1,59 @@
+# Codebase Explanation & Walkthrough
+
+This document explains how the Stock Prediction System works under the hood. It is designed to help you understand the code so you can explain it during your presentation.
+
+## 1. The Big Picture
+The system is a pipeline that moves data through these stages:
+1. **Ingestion**: Fetch raw data from the internet (Alpha Vantage).
+2. **Processing**: Clean the data and compute technical features (SMA, RSI).
+3. **Training**: Fit the ML models on the processed data.
+4. **Serving**: Expose the models via an API for predictions.
+
+## 2. Key Files Explained
+
+### A. `src/orchestration/flows.py` (The Conductor)
+This is the "brain" of the training pipeline. It uses **Prefect** to organize tasks.
+- **`@task`**: Decorators that turn Python functions into managed tasks (with retries/logging).
+- **`main_pipeline`**: The main flow that calls everything in order:
+  1. `fetch_daily_data`: Downloads CSVs.
+  2. `process_data`: Adds technical indicators.
+  3. `train_and_evaluate`: Trains the models and saves them.
+
+### B. `src/api/main.py` (The Web Server)
+This is the **FastAPI** application that serves the models.
+- **`@app.on_event("startup")`**: When the server starts, it looks in the `models/` folder and loads the `.pkl` files into memory.
+- **`/predict/price`**: An endpoint that takes features (SMA, RSI, etc.) and uses the loaded `regression_model` to predict the next closing price.
+
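The load-once-at-startup pattern can be sketched without the web server. Here an in-memory buffer stands in for a `models/*.pkl` file, and a tiny fitted `LinearRegression` stands in for the real trained model:

```python
# Sketch of the startup-loading pattern: serialize a model once, then load it
# into an in-memory registry the way the FastAPI startup hook does.
import io
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in for a models/AAPL/regression_model.pkl file on disk
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2x
buf = io.BytesIO()
joblib.dump(LinearRegression().fit(X, y), buf)
buf.seek(0)

# What the startup hook does: load every pickle into a dict of models
models = {"regression": joblib.load(buf)}

# What /predict/price then does with the incoming features
pred = float(models["regression"].predict(np.array([[4.0]]))[0])
print(pred)  # close to 8.0
```

Loading in a startup hook means the (slow) deserialization happens once, not on every request.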
+### C. `src/processing/features.py` (The Math)
+This file contains the logic for the financial indicators.
+- **`calculate_sma`**: A simple rolling average.
+- **`calculate_rsi`**: A momentum indicator measuring the speed of price changes.
+- **`process_data`**: Combines these functions to transform raw "Close" prices into a dataset ready for ML.
+
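Self-contained versions of the two indicators look like this; the real implementations in `src/processing/features.py` may differ in detail:

```python
# SMA and RSI as pandas rolling computations (a sketch, not the repo's code).
import pandas as pd

def calculate_sma(close: pd.Series, window: int = 20) -> pd.Series:
    """Simple moving average over the last `window` closes."""
    return close.rolling(window=window).mean()

def calculate_rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI: ratio of average gains to average losses, scaled to 0-100."""
    delta = close.diff()
    gain = delta.where(delta > 0, 0.0).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0.0)).rolling(window=window).mean()
    rs = gain / loss
    return 100 - (100 / (1 + rs))

# On a steadily rising series, SMA lags the price and RSI pins to 100
close = pd.Series([float(p) for p in range(1, 31)])
sma_last = calculate_sma(close, window=5).iloc[-1]  # mean of 26..30 = 28.0
rsi_last = calculate_rsi(close).iloc[-1]            # 100.0 (no losses at all)
```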
+### D. `docker-compose.yml` (The Infrastructure)
+This file tells Docker how to run the system.
+- **`api`**: Builds your code and runs the FastAPI server.
+- **`prefect-server`**: Runs the dashboard where you see your pipelines.
+- **`postgres`**: A database used by Prefect to store flow history.
+
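A hypothetical compose sketch with these three services might look like the following; the image tags, ports, and credentials are assumptions for illustration, not copied from the repo:

```yaml
# Assumed sketch of the three services described above.
services:
  api:
    build: .                # builds the project Dockerfile and runs FastAPI
    ports:
      - "8000:8000"
    depends_on:
      - prefect-server
  prefect-server:
    image: prefecthq/prefect:2-latest
    command: prefect server start
    ports:
      - "4200:4200"         # Prefect dashboard
    depends_on:
      - postgres
  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: prefect   # placeholder credential
```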
+## 3. How the AI Works
+We use **Scikit-Learn** for the machine learning models (defined in `src/models/train.py`).
+
+1. **Regression (LinearRegression)**:
+   - **Goal**: Predict the exact price (e.g., $150.25).
+   - **How**: Fits a straight line through the data points to minimize error.
+
+2. **Classification (RandomForest)**:
+   - **Goal**: Predict direction (UP or DOWN).
+   - **How**: Uses many "decision trees" (like flowcharts of yes/no questions) that vote on the outcome.
+
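Both model types can be demonstrated on synthetic data; the real models are trained on the SMA/RSI/MACD features in `src/models/train.py`:

```python
# Toy versions of the two models on synthetic stand-in features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))  # stand-ins for [sma_20, sma_50, rsi, macd]
y_price = X @ np.array([1.0, -0.5, 0.2, 0.3]) + 100.0  # synthetic "next close"
y_dir = (y_price > 100.0).astype(int)                  # synthetic UP/DOWN label

# Regression: fit a line (an exact linear relationship here, so error ~ 0)
reg = LinearRegression().fit(X, y_price)
price_pred = float(reg.predict(X[:1])[0])

# Classification: an ensemble of decision trees votes on the direction
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_dir)
prob_up = float(clf.predict_proba(X[:1])[0][1])  # probability of "UP"
```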
+## 4. Common Questions & Answers
+
+**Q: Why do we need Docker?**
+A: It ensures the code runs exactly the same on your computer, my computer, and the cloud, by packaging all dependencies (Python, Pandas, etc.) into a "container".
+
+**Q: Why Prefect?**
+A: If the API fails or data is missing, Prefect handles retries and alerts. It turns a simple script into a robust pipeline.
+
+**Q: What is Deepchecks?**
+A: It's a testing tool that checks our data for "drift" (significant changes from what the model was trained on), helping keep predictions accurate.
pyproject.toml CHANGED
@@ -17,7 +17,9 @@ dependencies = [
     "pydantic",
     "python-multipart",
     "joblib",
-    "matplotlib"
+    "matplotlib",
+    "streamlit",
+    "plotly"
 ]
 
 [project.optional-dependencies]
requirements.txt ADDED
@@ -0,0 +1,10 @@
+streamlit==1.32.0
+pandas>=1.5.0
+numpy<2.0.0
+scikit-learn>=1.0.0
+joblib>=1.1.0
+plotly>=5.0.0
+requests>=2.28.0
+alpha_vantage>=2.3.1
+pandas_ta>=0.3.14b0
+python-dotenv>=1.0.0
streamlit_app.py ADDED
@@ -0,0 +1,242 @@
+import streamlit as st
+import pandas as pd
+import numpy as np
+import requests
+import joblib
+import os
+import plotly.express as px
+import plotly.graph_objects as go
+from alpha_vantage.techindicators import TechIndicators
+from alpha_vantage.timeseries import TimeSeries
+from datetime import datetime
+from dotenv import load_dotenv
+
+# Load env vars (for local development)
+load_dotenv()
+
+# --- Config ---
+st.set_page_config(page_title="Stock Prediction System", layout="wide", page_icon="📈")
+MODEL_DIR = "models/AAPL"  # Default to the AAPL models; the same feature logic transfers to other symbols for demo inference
+
+# --- Secrets ---
+# Hugging Face Spaces exposes secrets as environment variables, so os.getenv
+# covers both the cloud and local (.env) setups
+ALPHA_VANTAGE_KEY = os.getenv("ALPHA_VANTAGE_API_KEY")
+WEBHOOK_URL = os.getenv("WEBHOOK_URL")
+
+# --- Helper Functions ---
+@st.cache_resource
+def load_models_local():
+    """Load models directly from disk (standalone / Hugging Face mode)."""
+    models = {}
+    try:
+        models['regression'] = joblib.load(f"{MODEL_DIR}/regression_model.pkl")
+        models['classification'] = joblib.load(f"{MODEL_DIR}/classification_model.pkl")
+        models['clustering'] = joblib.load(f"{MODEL_DIR}/clustering_model.pkl")
+        models['pca'] = joblib.load(f"{MODEL_DIR}/pca_model.pkl")
+        return models
+    except Exception as e:
+        st.error(f"Failed to load models locally: {e}")
+        return None
+
+def send_discord_notification(symbol, price, change_percent, prediction_dir):
+    """Send a formatted hourly update to Discord via webhook."""
+    if not WEBHOOK_URL:
+        return
+
+    emoji = "🚀" if change_percent > 0 else "🔻"
+    pred_emoji = "🟢" if "UP" in prediction_dir else "🔴"
+
+    message = {
+        "content": f"**Hourly Stock Update** 🕒\n"
+                   f"**{symbol}**: ${price:.2f} {emoji} ({change_percent:.2f}%)\n"
+                   f"**AI Prediction:** {prediction_dir} {pred_emoji}"
+    }
+    try:
+        requests.post(WEBHOOK_URL, json=message, timeout=10)
+        print(f"Sent notification for {symbol}")
+    except Exception as e:
+        print(f"Failed to send Discord notification: {e}")
+
+@st.cache_data(ttl=3600)  # cache for 1 hour
+def fetch_live_data(symbol):
+    """Fetch raw daily prices and compute indicators locally (avoids the premium indicator endpoints)."""
+    if not ALPHA_VANTAGE_KEY:
+        st.warning("⚠️ ALPHA_VANTAGE_API_KEY not found. Using mock data.")
+        return get_mock_data(symbol)
+
+    try:
+        # Fetch only the daily price series (free endpoint)
+        ts = TimeSeries(key=ALPHA_VANTAGE_KEY, output_format='pandas')
+        data, _ = ts.get_daily(symbol=symbol, outputsize='compact')  # 100 data points is enough for these indicators
+
+        # Ensure chronological order (Alpha Vantage returns newest first)
+        data = data.sort_index()
+
+        # Rename columns to a standard form for the calculations below
+        data.columns = ['open', 'high', 'low', 'close', 'volume']
+
+        # --- Local indicator calculation (free & unlimited) ---
+        # SMA
+        data['sma_20'] = data['close'].rolling(window=20).mean()
+        data['sma_50'] = data['close'].rolling(window=50).mean()
+
+        # RSI (14-period)
+        delta = data['close'].diff()
+        gain = delta.where(delta > 0, 0).rolling(window=14).mean()
+        loss = (-delta.where(delta < 0, 0)).rolling(window=14).mean()
+        rs = gain / loss
+        data['rsi'] = 100 - (100 / (1 + rs))
+
+        # MACD (12, 26); the 9-period signal line is not a model input, so it is skipped
+        exp1 = data['close'].ewm(span=12, adjust=False).mean()
+        exp2 = data['close'].ewm(span=26, adjust=False).mean()
+        data['macd'] = exp1 - exp2
+
+        # Latest and previous rows for the day-over-day change
+        latest = data.iloc[-1]
+        prev = data.iloc[-2]
+
+        change_percent = ((latest['close'] - prev['close']) / prev['close']) * 100
+
+        return {
+            "price": float(latest['close']),
+            "change": change_percent,
+            "sma_20": float(latest['sma_20']),
+            "sma_50": float(latest['sma_50']),
+            "rsi": float(latest['rsi']),
+            "macd": float(latest['macd']),
+            "is_mock": False
+        }
+
+    except Exception as e:
+        print(f"Fetch failed: {e}")
+        st.warning(f"Could not fetch data for {symbol} (API limit?). Showing mock data.")
+        return get_mock_data(symbol)
+
+def get_mock_data(symbol):
+    """Generate realistic mock data when the API fails or the key is missing."""
+    base_price = {"AAPL": 150, "GOOGL": 2800, "MSFT": 300, "AMZN": 3400, "TSLA": 900, "NVDA": 400}
+    price = base_price.get(symbol, 100) + np.random.uniform(-5, 5)
+    return {
+        "price": price,
+        "change": np.random.uniform(-2, 2),
+        "sma_20": price * 0.95,
+        "sma_50": price * 0.90,
+        "rsi": np.random.uniform(30, 70),
+        "macd": np.random.uniform(-1, 1),
+        "is_mock": True
+    }
+
+# --- UI Layout ---
+st.title("📈 AI Stock Prediction System")
+
+# Sidebar
+st.sidebar.header("Control Panel")
+available_stocks = ["AAPL", "GOOGL", "MSFT", "AMZN", "TSLA", "NVDA"]
+symbol = st.sidebar.selectbox("Select Stock", available_stocks)
+
+if st.sidebar.button("🔄 Refresh Data"):
+    st.cache_data.clear()  # Clear the cache to force a fresh fetch
+    st.rerun()
+
+# --- Main Logic ---
+
+# 1. Fetch data
+with st.spinner(f"Fetching live data for {symbol}..."):
+    data = fetch_live_data(symbol)
+
+# 2. Visual header
+col_head1, col_head2, col_head3 = st.columns(3)
+with col_head1:
+    st.metric("Current Price", f"${data['price']:.2f}", f"{data['change']:.2f}%")
+with col_head2:
+    rsi_label = "Overbought" if data['rsi'] > 70 else "Oversold" if data['rsi'] < 30 else "Neutral"
+    st.metric("RSI (Momentum)", f"{data['rsi']:.1f}", rsi_label, delta_color="off")
+with col_head3:
+    source = "🔴 Mock Data (Check API Key)" if data['is_mock'] else "🟢 Live Alpha Vantage Data"
+    st.caption(f"Data Source: {source}")
+    st.caption(f"Last Updated: {datetime.now().strftime('%H:%M:%S')}")
+
+# 3. AI prediction
+st.markdown("---")
+st.subheader(f"🤖 AI Analysis for {symbol}")
+
+features = np.array([[data['sma_20'], data['sma_50'], data['rsi'], data['macd']]])
+models = load_models_local()
+
+if models:
+    col_pred1, col_pred2 = st.columns(2)
+
+    # Regression: predicted next closing price
+    pred_price = models['regression'].predict(features)[0]
+
+    # Classification: predicted direction with probability
+    pred_direction_prob = models['classification'].predict_proba(features)[0]
+    direction = "UP 🚀" if pred_direction_prob[1] > 0.5 else "DOWN 🔻"
+    confidence = max(pred_direction_prob)
+
+    with col_pred1:
+        st.info(f"**Predicted Direction:** {direction}")
+        st.progress(float(confidence), text=f"Confidence: {confidence*100:.1f}%")
+
+    with col_pred2:
+        st.success(f"**Target Price (Next Close):** ${pred_price:.2f}")
+
+    # Discord notification trigger. To avoid spamming on every refresh, this
+    # relies on fetch_live_data being cached: fresh data arrives only once per
+    # hour, or when the user manually clears the cache.
+    if not data['is_mock']:
+        send_discord_notification(symbol, data['price'], data['change'], direction)
+
+# 4. Market analysis tabs
+st.markdown("---")
+tab1, tab2 = st.tabs(["📊 Technical Dashboard", "🧭 Market Regime (Cluster)"])
+
+with tab1:
+    # Gauge chart for RSI
+    fig_rsi = go.Figure(go.Indicator(
+        mode="gauge+number",
+        value=data['rsi'],
+        domain={'x': [0, 1], 'y': [0, 1]},
+        title={'text': "RSI Strength"},
+        gauge={'axis': {'range': [0, 100]},
+               'bar': {'color': "darkblue"},
+               'steps': [
+                   {'range': [0, 30], 'color': "lightgreen"},  # Oversold
+                   {'range': [30, 70], 'color': "gray"},
+                   {'range': [70, 100], 'color': "red"}],      # Overbought
+               'threshold': {'line': {'color': "red", 'width': 4}, 'thickness': 0.75, 'value': data['rsi']}}))
+    st.plotly_chart(fig_rsi, use_container_width=True)
+
+with tab2:
+    if not models:
+        st.warning("Models unavailable; cannot compute the market regime.")
+    else:
+        # Clustering visualization, using an approximated volatility (2% of
+        # price) when the real calculation is not available
+        volatility = data['price'] * 0.02
+        cluster_features = np.array([[volatility, data['rsi']]])
+        cluster_id = models['clustering'].predict(cluster_features)[0]
+
+        st.write(f"### Current Market Regime: **Cluster {cluster_id}**")
+        if cluster_id == 0:
+            st.caption("Hypothesis: Low Volatility / Stable")
+        elif cluster_id == 1:
+            st.caption("Hypothesis: High Volatility / Risky")
+        else:
+            st.caption("Hypothesis: Transitioning")
+
+        # PCA plot
+        pca_result = models['pca'].transform(features)
+        pc1, pc2 = pca_result[0]
+
+        fig_pca = go.Figure()
+        fig_pca.add_trace(go.Scatter(x=[0, 1, -1], y=[0, 1, -1], mode='markers', name='Regimes', marker=dict(color='gray', opacity=0.3, size=20)))
+        fig_pca.add_trace(go.Scatter(x=[pc1], y=[pc2], mode='markers', name='Current State', marker=dict(color='orange', size=25, symbol='star')))
+        fig_pca.update_layout(title="PCA Market Map", xaxis_title="PC1", yaxis_title="PC2")
+        st.plotly_chart(fig_pca, use_container_width=True)
+
+# Footer
+st.markdown("---")
+st.caption("Deployed via Hugging Face Spaces | Model: Ensemble (SVM + RF + Linear)")