Upload folder using huggingface_hub
Browse files- Dockerfile +1 -1
- docs/Conference-template-A4.doc +0 -0
- docs/code_explanation.md +71 -57
- docs/project_report.md +92 -35
- docs/project_report.tex +145 -0
- src/api/main.py +65 -29
- src/ingestion/ingest.py +9 -6
- src/orchestration/flows.py +1 -1
- streamlit_app.py +16 -9
- verify_load.py +38 -0
Dockerfile
CHANGED
|
@@ -23,7 +23,7 @@ COPY --from=builder /usr/local/bin /usr/local/bin
|
|
| 23 |
# Copy application code
|
| 24 |
COPY src/ src/
|
| 25 |
COPY tests/ tests/
|
| 26 |
-
|
| 27 |
COPY streamlit_app.py .
|
| 28 |
|
| 29 |
# Copy models (CRITICAL for Standalone Mode)
|
|
|
|
| 23 |
# Copy application code
|
| 24 |
COPY src/ src/
|
| 25 |
COPY tests/ tests/
|
| 26 |
+
|
| 27 |
COPY streamlit_app.py .
|
| 28 |
|
| 29 |
# Copy models (CRITICAL for Standalone Mode)
|
docs/Conference-template-A4.doc
ADDED
|
Binary file (64 kB). View file
|
|
|
docs/code_explanation.md
CHANGED
|
@@ -1,59 +1,73 @@
|
|
| 1 |
# Codebase Explanation & Walkthrough
|
| 2 |
|
| 3 |
-
This document
|
| 4 |
-
|
| 5 |
-
## 1. The Big Picture
|
| 6 |
-
The system
|
| 7 |
-
1. **Ingestion**:
|
| 8 |
-
2. **Processing**:
|
| 9 |
-
3. **Training**:
|
| 10 |
-
4. **
|
| 11 |
-
|
| 12 |
-
## 2. Key
|
| 13 |
-
|
| 14 |
-
### A.
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
-
|
| 31 |
-
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
**
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
# Codebase Explanation & Walkthrough
|
| 2 |
|
| 3 |
+
This document provides a detailed technical explanation of the "AI Stock Prediction System" (Stockker). Use this to understand the underlying logic and architecture.
|
| 4 |
+
|
| 5 |
+
## 1. System Architecture (The Big Picture)
|
| 6 |
+
The system operates as a **Microservices-based Pipeline** with four distinct stages:
|
| 7 |
+
1. **Ingestion Layer**: Fetches raw market data (Open, High, Low, Close, Volume) from Alpha Vantage.
|
| 8 |
+
2. **Processing Layer**: Transforms raw data into technical indicators (Features).
|
| 9 |
+
3. **Training Orchestration**: A Prefect pipeline that trains, evaluates, and saves models.
|
| 10 |
+
4. **Inference API**: A FastAPI server that loads these models to provide real-time predictions.
|
| 11 |
+
|
| 12 |
+
## 2. Key Components Explained
|
| 13 |
+
|
| 14 |
+
### A. Data Ingestion & Processing
|
| 15 |
+
**File:** `src/processing/features.py`
|
| 16 |
+
Feature engineering is critical for financial ML. We treat the market data as a time-series problem.
|
| 17 |
+
- **SMA (Simple Moving Average)**: Calculates the trend over 20 and 50 days.
|
| 18 |
+
- **RSI (Relative Strength Index)**: A momentum oscillator (0-100) to identify overbought/oversold conditions.
|
| 19 |
+
- **MACD (Moving Average Convergence Divergence)**: Tracks momentum changes.
|
| 20 |
+
- **Target Variables**:
|
| 21 |
+
- `target_price`: Next day's closing price (Regression).
|
| 22 |
+
- `target_direction`: 1 (Up) or 0 (Down) for next day (Classification).
|
| 23 |
+
|
| 24 |
+
### B. Machine Learning Models (Ensemble Approach)
|
| 25 |
+
**File:** `src/models/train.py`
|
| 26 |
+
Instead of relying on a single algorithm, we use **Ensemble Learning** for robustness.
|
| 27 |
+
|
| 28 |
+
#### 1. Regression (Predicting Price)
|
| 29 |
+
We use a **Voting Regressor**, which averages predictions from three models:
|
| 30 |
+
- **Linear Regression**: Captures simple linear trends.
|
| 31 |
+
- **Random Forest Regressor**: Captures complex, non-linear patterns (100 Decision Trees).
|
| 32 |
+
- **SVR (Support Vector Regressor)**: Uses an RBF kernel to find the optimal hyperplane in high-dimensional space.
|
| 33 |
+
*Why?* Combining these reduces the variance and error of any single model.
|
| 34 |
+
|
| 35 |
+
#### 2. Classification (Predicting Direction)
|
| 36 |
+
We use a **Voting Classifier** (Soft Voting) combining:
|
| 37 |
+
- **Random Forest Classifier**: Robust against overfitting.
|
| 38 |
+
- **SVC (Support Vector Classifier)**: Good at separating classes with clear margins.
|
| 39 |
+
*Soft Voting* means we average the *probabilities* of each model class, not just their final votes, leading to more nuanced predictions.
|
| 40 |
+
|
| 41 |
+
#### 3. Unsupervised Learning (Market Analysis)
|
| 42 |
+
- **K-Means Clustering**: Groups market days into 3 clusters (e.g., Low Volatility, High Volatility) based on volatility and RSI.
|
| 43 |
+
- **PCA**: Reduces our 4 dimensions (SMA20, SMA50, RSI, MACD) into 2 principal components for 2D visualization.
|
| 44 |
+
|
| 45 |
+
### C. Orchestration (The Pipeline)
|
| 46 |
+
**File:** `src/orchestration/flows.py`
|
| 47 |
+
We use **Prefect** to manage the workflow.
|
| 48 |
+
- **`main_pipeline`**: Loops through our stock list (`AAPL`, `GOOGL`, `MSFT`, `AMZN`, `TSLA`, `NVDA`).
|
| 49 |
+
- For each stock, it sequentially runs: Fetch -> Process -> Train -> Evaluate -> Notify Discord.
|
| 50 |
+
- **Error Handling**: If one stock fails, the pipeline logs the error (via Discord) and continues to the next, ensuring resilience.
|
| 51 |
+
|
| 52 |
+
### D. The API (Model Serving)
|
| 53 |
+
**File:** `src/api/main.py`
|
| 54 |
+
- **Dynamic Loading**: On startup (`@app.on_event("startup")`), the API scans the `models/` directory. It dynamically loads whatever models it finds (e.g., `models/NVDA/regression_model.pkl`), making the system easily extensible to new stocks without changing code.
|
| 55 |
+
- **Enpoints**: Exposes REST endpoints (`/predict/price`, `/predict/direction`) that the frontend consumes.
|
| 56 |
+
|
| 57 |
+
## 3. Infrastructure & DevOps
|
| 58 |
+
|
| 59 |
+
### Docker
|
| 60 |
+
**File:** `Dockerfile` & `docker-compose.yml`
|
| 61 |
+
- We containerize the application to ensure it runs identically on your laptop and the cloud.
|
| 62 |
+
- The `docker-compose` setup includes a Postgres service, which is used **exclusively by Prefect** to store flow run history. The main app uses a **file-based system** (CSVs/PKLs) for simplicity and portability.
|
| 63 |
+
|
| 64 |
+
### CI/CD (GitHub Actions)
|
| 65 |
+
**File:** `.github/workflows/deploy_to_hf.yml`
|
| 66 |
+
- On every push to `main`, GitHub Actions automatically:
|
| 67 |
+
1. Runs `pytest` to verify code correctness.
|
| 68 |
+
2. Pushes the code to the Hugging Face Space, triggering a new deployment.
|
| 69 |
+
|
| 70 |
+
## 4. Why This Architecture?
|
| 71 |
+
- **Modularity**: Separation of concerns (Ingestion vs Training vs Serving) makes debugging easy.
|
| 72 |
+
- **Scalability**: Adding a new stock (like NVDA) only required adding a string to the list; the pipeline handled the rest.
|
| 73 |
+
- **Reliability**: Ensembles prevent "putting all eggs in one basket" model-wise.
|
docs/project_report.md
CHANGED
|
@@ -1,41 +1,98 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
## 1. Introduction
|
| 4 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 5 |
|
| 6 |
## 2. System Architecture
|
| 7 |
-
The system follows a modular architecture
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
|
| 15 |
## 3. Methodology
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
-
|
| 27 |
-
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# AI Stock Prediction & Analysis System - Project Report
|
| 2 |
|
| 3 |
## 1. Introduction
|
| 4 |
+
The **AI Stock Prediction & Analysis System** is an end-to-end machine learning solution designed to predict stock market prices and analyze market regimes in real-time. By leveraging a combination of ensemble machine learning models and unsupervised learning techniques, the system provides users with actionable insights into stock trends and volatility.
|
| 5 |
+
|
| 6 |
+
### 1.1 Problem Statement
|
| 7 |
+
Stock market prediction is inherently challenging due to the stochastic nature of financial data. Traditional methods often fail to capture complex non-linear patterns or adapt to changing market conditions. This project aims to address these challenges by building a robust, automated pipeline that integrates real-time data ingestion, advanced feature engineering, and ensemble modeling to improve prediction accuracy and market understanding.
|
| 8 |
+
|
| 9 |
+
### 1.2 Objectives
|
| 10 |
+
* Develop an automated pipeline for fetching and processing daily stock data.
|
| 11 |
+
* Implement ensemble learning models (Linear Regression, Random Forest, SVM) for price prediction.
|
| 12 |
+
* Apply unsupervised learning (Clustering, PCA) to identify market volatility regimes.
|
| 13 |
+
* Deploy a user-friendly interactive dashboard using Streamlit.
|
| 14 |
+
* Ensure system reliability through CI/CD pipelines and automated testing.
|
| 15 |
+
|
| 16 |
+
---
|
| 17 |
|
| 18 |
## 2. System Architecture
|
| 19 |
+
The system follows a modular microservices-like architecture, ensuring scalability and maintainability.
|
| 20 |
+
|
| 21 |
+
### 2.1 Core Components
|
| 22 |
+
* **Frontend (User Interface):** Built with **Streamlit**, providing an interactive dashboard for users to select stocks, view real-time metrics, and visualize predictions.
|
| 23 |
+
* **Backend & Orchestration:**
|
| 24 |
+
* **Prefect:** Orchestrates the entire ML workflow, from data ingestion to model inference, ensuring reproducible and scheduled runs.
|
| 25 |
+
* **FastAPI:** (Integrated) Serves as the backend framework for handling API requests and model serving.
|
| 26 |
+
* **Data Layer:**
|
| 27 |
+
* **Alpha Vantage API:** The primary source for real-time and historical stock market data (Daily Time Series).
|
| 28 |
+
* **Local Storage/Database:** Stores raw CSVs and processed datasets for training and inference.
|
| 29 |
+
* **Notification Service:** A custom Discord notification module maintains system observability, alerting administrators of pipeline status or errors, featuring a custom DNS bypass for restricted network environments.
|
| 30 |
+
|
| 31 |
+
### 2.2 Infrastructure & DevOps
|
| 32 |
+
* **Docker:** The entire application is containerized using Docker to ensure consistent environments across development and production.
|
| 33 |
+
* **CI/CD Pipeline:** Hosted on **GitHub Actions**, the pipeline automatically tests the code (pytest, ruff) and deploys changes.
|
| 34 |
+
* **Deployment:** The application is deployed on **Hugging Face Spaces**, providing a publicly accessible interface.
|
| 35 |
+
|
| 36 |
+
---
|
| 37 |
|
| 38 |
## 3. Methodology
|
| 39 |
+
|
| 40 |
+
### 3.1 Data Ingestion
|
| 41 |
+
The system utilizes the `Alpha Vantage` API to fetch daily historical data.
|
| 42 |
+
* **Source:** `src/ingestion/ingest.py`
|
| 43 |
+
* **Process:** The `fetch_daily_data` function retrieves `TIME_SERIES_DAILY` in CSV format, capturing open, high, low, close, and volume data for the last 100 data points (compact mode) or full history.
|
| 44 |
+
|
| 45 |
+
### 3.2 Feature Engineering
|
| 46 |
+
Raw data is transformed into meaningful features to capture market momentum and trends.
|
| 47 |
+
* **Source:** `src/processing/features.py`
|
| 48 |
+
* **Key Indicators:**
|
| 49 |
+
* **Simple Moving Average (SMA):** Calculated for 20-day and 50-day windows to identify trend direction.
|
| 50 |
+
* **Relative Strength Index (RSI):** A 14-day momentum oscillator to detect overbought or oversold conditions.
|
| 51 |
+
* **MACD (Moving Average Convergence Divergence):** Captures changes in the strength, direction, momentum, and duration of a trend.
|
| 52 |
+
* **Lagged Features:** (Implicit in time-series modeling) used to predict future values.
|
| 53 |
+
* **Target Variables:**
|
| 54 |
+
* `target_direction`: Binary classification (1 if Price goes Up, 0 if Down).
|
| 55 |
+
* `target_price`: Regression target (Next day's closing price).
|
| 56 |
+
|
| 57 |
+
### 3.3 Machine Learning Models
|
| 58 |
+
The system employs an **Ensemble Learning** strategy to improve generalization and reduce overfitting.
|
| 59 |
+
* **Regression Models:** Predict the exact future price.
|
| 60 |
+
* *Linear Regression:* Captures linear relationships.
|
| 61 |
+
* *Random Forest Regressor:* Handles non-linearities and feature interactions.
|
| 62 |
+
* *Support Vector Regressor (SVR):* Effective in high-dimensional spaces.
|
| 63 |
+
* **Classification Models:** Predict the directional movement (Up/Down).
|
| 64 |
+
* **Unsupervised Learning:**
|
| 65 |
+
* **PCA & Clustering:** Used to analyze market regimes, grouping market states based on volatility and price action patterns (e.g., "High Volatility", "Bullish Trend").
|
| 66 |
+
|
| 67 |
+
### 3.4 Data Validation
|
| 68 |
+
To ensure data quality and model reliability, the system integrates **DeepChecks**.
|
| 69 |
+
* **Data Integrity:** automated checks for missing values, duplicates, and conflicting labels.
|
| 70 |
+
* **Drift Detection:** Validates that the training and testing data distributions remain consistent (`train_test_validation`), alerting to potential concept drift.
|
| 71 |
+
|
| 72 |
+
---
|
| 73 |
+
|
| 74 |
+
## 4. Implementation & Testing
|
| 75 |
+
|
| 76 |
+
### 4.1 Development
|
| 77 |
+
The project is structured within the `src/` directory, separating concerns into `ingestion`, `processing`, `models`, and `orchestration`.
|
| 78 |
+
|
| 79 |
+
### 4.2 Quality Assurance
|
| 80 |
+
* **Unit Testing:** Implemented using `pytest` (located in `tests/`) to verify individual components.
|
| 81 |
+
* **Data Validation:** Integrated **DeepChecks** to perform automated integrity checks and detect data drift between training and testing datasets.
|
| 82 |
+
* **Linting:** Code quality is maintained using `ruff` to enforce PEP 8 standards.
|
| 83 |
+
* **Automated Workflows:**
|
| 84 |
+
* `ci.yml`: Triggers on push/pull request to `main`, running tests and linter.
|
| 85 |
+
* `deploy_to_hf.yml`: Automatically syncs the repository to Hugging Face Spaces upon successful merge.
|
| 86 |
+
|
| 87 |
+
---
|
| 88 |
+
|
| 89 |
+
## 5. Results & Conclusion
|
| 90 |
+
The system successfully demonstrates a complete end-to-end ML lifecycle. The Streamlit dashboard provides a seamless user experience, allowing for real-time stock analysis. The integration of Discord notifications ensures that the system is monitored effectively.
|
| 91 |
+
|
| 92 |
+
### 5.1 Key Achievements
|
| 93 |
+
* Fully automated data pipeline.
|
| 94 |
+
* Robust ensemble model implementation.
|
| 95 |
+
* Resilient deployment on Hugging Face Spaces.
|
| 96 |
+
* High code quality standards enforced via CI/CD.
|
| 97 |
+
|
| 98 |
+
This project serves as a comprehensive template for scalable financial machine learning applications.
|
docs/project_report.tex
ADDED
|
@@ -0,0 +1,145 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
\documentclass[conference]{IEEEtran}
|
| 2 |
+
\IEEEoverridecommandlockouts
|
| 3 |
+
% The preceding line is only needed to identify funding in the first footnote. if that is unneeded, please comment it out.
|
| 4 |
+
\usepackage{cite}
|
| 5 |
+
\usepackage{amsmath,amssymb,amsfonts}
|
| 6 |
+
\usepackage{algorithmic}
|
| 7 |
+
\usepackage{graphicx}
|
| 8 |
+
\usepackage{textcomp}
|
| 9 |
+
\usepackage{xcolor}
|
| 10 |
+
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
|
| 11 |
+
T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
|
| 12 |
+
\begin{document}
|
| 13 |
+
|
| 14 |
+
\title{AI Stock Prediction \& Analysis System}
|
| 15 |
+
|
| 16 |
+
\author{\IEEEauthorblockN{Muhammad Umer Farooq}
|
| 17 |
+
\IEEEauthorblockA{\textit{Faculty of Computer Science and Engineering} \\
|
| 18 |
+
\textit{Ghulam Ishaq Khan Institute of Engineering Sciences and Technology}\\
|
| 19 |
+
Topi, Pakistan \\
|
| 20 |
+
u2023540@giki.edu.pk}
|
| 21 |
+
}
|
| 22 |
+
|
| 23 |
+
\maketitle
|
| 24 |
+
|
| 25 |
+
\begin{abstract}
|
| 26 |
+
The AI Stock Prediction \& Analysis System aka *Stockker* is an end-to-end machine learning solution designed to predict stock market prices in real-time. By leveraging a combination of ensemble machine learning models (Linear Regression, Random Forest, SVR) and unsupervised learning techniques (PCA, Clustering), the system provides users with assistance in investing in stocks. The system includes a simple Streamlit frontend, a Prefect-orchestrated backend running on FastAPI, and a robust CI/CD pipeline deployed on Hugging Face Spaces.
|
| 27 |
+
\end{abstract}
|
| 28 |
+
|
| 29 |
+
\begin{IEEEkeywords}
|
| 30 |
+
Stock Prediction, Machine Learning, Ensemble Learning, Real-time Analysis, MLOps, Streamlit, Prefect, DevOps
|
| 31 |
+
\end{IEEEkeywords}
|
| 32 |
+
|
| 33 |
+
\section{Introduction}
|
| 34 |
+
Stock market prediction is inherently challenging due to the stochastic nature of financial data. Traditional methods often fail to capture complex non-linear patterns or adapt to changing market conditions. This project aims to address these challenges by building a robust, automated pipeline that integrates real-time data ingestion, advanced feature engineering, and ensemble modeling to improve prediction accuracy and market understanding.
|
| 35 |
+
|
| 36 |
+
The primary objectives of this system are to:
|
| 37 |
+
\begin{itemize}
|
| 38 |
+
\item Develop an automated pipeline for fetching and processing daily stock data.
|
| 39 |
+
\item Implement ensemble learning models for robust price prediction.
|
| 40 |
+
\item Apply unsupervised learning to identify market volatility regimes.
|
| 41 |
+
\item Deploy a user-friendly interactive dashboard.
|
| 42 |
+
\item Ensure system reliability through strictly automated testing and CI/CD pipelines.
|
| 43 |
+
\end{itemize}
|
| 44 |
+
|
| 45 |
+
\section{System Architecture}
|
| 46 |
+
The system follows a modular microservices-like architecture, ensuring scalability and maintainability.
|
| 47 |
+
|
| 48 |
+
\subsection{Core Components}
|
| 49 |
+
\subsubsection{Frontend (User Interface)}
|
| 50 |
+
Built with \textbf{Streamlit}, the frontend provides an interactive dashboard for users to select stocks, view real-time metrics, and visualize predictions. It serves as the primary consumption layer for the model's outputs.
|
| 51 |
+
|
| 52 |
+
\subsubsection{Backend \& Orchestration}
|
| 53 |
+
\begin{itemize}
|
| 54 |
+
\item \textbf{Prefect:} Orchestrates the entire ML workflow, from data ingestion to model inference, ensuring reproducible and scheduled runs.
|
| 55 |
+
\item \textbf{FastAPI:} Serves as the backend framework for handling API requests and serving the model predictions.
|
| 56 |
+
\end{itemize}
|
| 57 |
+
|
| 58 |
+
\subsubsection{Data Layer}
|
| 59 |
+
\begin{itemize}
|
| 60 |
+
\item \textbf{Alpha Vantage API:} The primary source for real-time and historical stock market data (Daily Time Series).
|
| 61 |
+
\item \textbf{Local Storage/Database:} Stores raw CSVs and processed datasets for training and inference, managing the data lifecycle.
|
| 62 |
+
\end{itemize}
|
| 63 |
+
|
| 64 |
+
\subsubsection{Notification Service}
|
| 65 |
+
A custom Discord notification module maintains system observability, alerting administrators of pipeline status or errors. It features a custom DNS bypass to ensure connectivity in restricted network environments such as Hugging Face Spaces.
|
| 66 |
+
|
| 67 |
+
\subsection{Infrastructure \& DevOps}
|
| 68 |
+
\begin{itemize}
|
| 69 |
+
\item \textbf{Docker:} The entire application is containerized to ensure consistent environments across development and production.
|
| 70 |
+
\item \textbf{CI/CD Pipeline:} Hosted on \textbf{GitHub Actions}, the pipeline automatically runs unit tests (pytest), linting (ruff), and performs continuous deployment.
|
| 71 |
+
\item \textbf{Deployment:} The application is deployed on \textbf{Hugging Face Spaces}, providing a publicly accessible and scalable interface.
|
| 72 |
+
\end{itemize}
|
| 73 |
+
|
| 74 |
+
\section{Methodology}
|
| 75 |
+
|
| 76 |
+
\subsection{Data Ingestion}
|
| 77 |
+
The system utilizes the \texttt{Alpha Vantage} API to fetch daily historical data. The ingestion module (\texttt{src/ingestion/ingest.py}) retrieves \texttt{TIME\_SERIES\_DAILY} data in CSV format, capturing open, high, low, close, and volume metrics. It supports both compact mode (last 100 data points) and full historical fetch.
|
| 78 |
+
|
| 79 |
+
\subsection{Feature Engineering}
|
| 80 |
+
Raw data is transformed into meaningful features to capture market momentum and trends (\texttt{src/processing/features.py}). Key indicators include:
|
| 81 |
+
\begin{itemize}
|
| 82 |
+
\item \textbf{Simple Moving Average (SMA):} Calculated for 20-day and 50-day windows to identify trend direction.
|
| 83 |
+
\item \textbf{Relative Strength Index (RSI):} A 14-day momentum oscillator to detect overbought or oversold conditions.
|
| 84 |
+
\item \textbf{MACD:} Captures changes in trend strength, direction, momentum, and duration.
|
| 85 |
+
\item \textbf{Lagged Features:} Implicitly used in time-series modeling to predict future values based on past performance.
|
| 86 |
+
\end{itemize}
|
| 87 |
+
|
| 88 |
+
Target variables include \texttt{target\_direction} (Binary classification: Up/Down) and \texttt{target\_price} (Regression: Next day's closing price).
|
| 89 |
+
|
| 90 |
+
\subsection{Machine Learning Models}
|
| 91 |
+
The system employs an \textbf{Ensemble Learning} strategy to improve generalization and reduce overfitting.
|
| 92 |
+
|
| 93 |
+
\subsubsection{Regression Models}
|
| 94 |
+
Used to predict the exact future price (\texttt{target\_price}). The system utilizes a \textbf{Voting Regressor} that combines the predictions of three distinct base learners:
|
| 95 |
+
\begin{itemize}
|
| 96 |
+
\item \textbf{Linear Regression:} Captures base linear relationships in the data.
|
| 97 |
+
\item \textbf{Random Forest Regressor:} Handles non-linearities and feature interactions effectively (100 estimators).
|
| 98 |
+
\item \textbf{Support Vector Regressor (SVR):} Effective in high-dimensional spaces using the RBF kernel.
|
| 99 |
+
\end{itemize}
|
| 100 |
+
|
| 101 |
+
\subsubsection{Classification Models}
|
| 102 |
+
Used to predict the directional movement of the stock price (\texttt{target\_direction}). A \textbf{Voting Classifier} (Soft Voting) aggregates the probabilities from:
|
| 103 |
+
\begin{itemize}
|
| 104 |
+
\item \textbf{Random Forest Classifier:} A robust ensemble method (100 estimators).
|
| 105 |
+
\item \textbf{Support Vector Classifier (SVC):} Configured with probability estimates to contribute to the soft voting mechanism.
|
| 106 |
+
\end{itemize}
|
| 107 |
+
|
| 108 |
+
\subsubsection{Unsupervised Learning}
|
| 109 |
+
\begin{itemize}
|
| 110 |
+
\item \textbf{Clustering (K-Means):} Applied to identifying market regimes based on \textit{volatility} (rolling standard deviation) and \textit{RSI}. This groups the market into 3 distinct clusters (e.g., Low, Medium, High Volatility).
|
| 111 |
+
\item \textbf{PCA (Principal Component Analysis):} Reduces the dimensionality of the feature set (\texttt{sma\_20}, \texttt{sma\_50}, \texttt{rsi}, \texttt{macd}) into 2 principal components for visualization and analysis.
|
| 112 |
+
\end{itemize}
|
| 113 |
+
|
| 114 |
+
\subsection{Data Validation}
|
| 115 |
+
To ensure data quality and model reliability, the system integrates \textbf{DeepChecks}, performing automated checks for missing values, duplicates, and conflicting labels. It also monitors for data drift between training and production environments.
|
| 116 |
+
|
| 117 |
+
\section{Implementation \& Testing}
|
| 118 |
+
|
| 119 |
+
\subsection{Development}
|
| 120 |
+
The project fits within a structured \texttt{src/} directory, strictly checking separation of concerns between \texttt{ingestion}, \texttt{processing}, \texttt{models}, and \texttt{orchestration} modules.
|
| 121 |
+
|
| 122 |
+
\subsection{Quality Assurance}
|
| 123 |
+
\begin{itemize}
|
| 124 |
+
\item \textbf{Unit Testing:} Implemented using \texttt{pytest} in the \texttt{tests/} directory.
|
| 125 |
+
\item \textbf{Linting:} Enforced via \texttt{ruff} for PEP 8 compliance.
|
| 126 |
+
\item \textbf{Automated Workflows:}
|
| 127 |
+
\begin{itemize}
|
| 128 |
+
\item \texttt{ci.yml}: Triggers on push/pull requests to run tests and linters.
|
| 129 |
+
\item \texttt{deploy\_to\_hf.yml}: Syncs the repository to Hugging Face Spaces upon successful merge.
|
| 130 |
+
\end{itemize}
|
| 131 |
+
\end{itemize}
|
| 132 |
+
|
| 133 |
+
\section{Results \& Conclusion}
|
| 134 |
+
The generic AI Stock Prediction \& Analysis System successfully demonstrates a complete end-to-end ML lifecycle. The Streamlit dashboard offers a seamless user experience for real-time analysis, while the backend orchestration ensures data freshness and model reliability.
|
| 135 |
+
|
| 136 |
+
Key achievements include a fully automated data pipeline, robust ensemble model implementation (both regression and classification), and resilient deployment on Hugging Face Spaces. The integration of Discord notifications provides critical observability. This project serves as a comprehensive template for scalable financial machine learning applications.
|
| 137 |
+
|
| 138 |
+
\begin{thebibliography}{00}
|
| 139 |
+
\bibitem{b1} Alpha Vantage, ``Alpha Vantage API Documentation,'' https://www.alphavantage.co/documentation/.
|
| 140 |
+
\bibitem{b2} Streamlit, ``Streamlit Documentation,'' https://docs.streamlit.io/.
|
| 141 |
+
\bibitem{b3} Prefect, ``Prefect Core Documentation,'' https://docs.prefect.io/.
|
| 142 |
+
\bibitem{b4} Hugging Face, ``Hugging Face Spaces,'' https://huggingface.co/docs/hub/spaces.
|
| 143 |
+
\end{thebibliography}
|
| 144 |
+
|
| 145 |
+
\end{document}
|
src/api/main.py
CHANGED
|
@@ -12,6 +12,7 @@ app = FastAPI(title="Stock Prediction API", version="1.0.0")
|
|
| 12 |
models = {}
|
| 13 |
|
| 14 |
class PredictionInput(BaseModel):
|
|
|
|
| 15 |
sma_20: float
|
| 16 |
sma_50: float
|
| 17 |
rsi: float
|
|
@@ -20,32 +21,54 @@ class PredictionInput(BaseModel):
|
|
| 20 |
class PredictionOutput(BaseModel):
|
| 21 |
prediction: float
|
| 22 |
model_type: str
|
|
|
|
| 23 |
|
| 24 |
@app.on_event("startup")
|
| 25 |
def load_models():
|
| 26 |
"""Load models on startup."""
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
if os.path.exists(clf_path):
|
| 44 |
-
models['classification'] = joblib.load(clf_path)
|
| 45 |
-
print(f"Loaded classification model from {clf_path}")
|
| 46 |
|
| 47 |
-
|
| 48 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 49 |
|
| 50 |
@app.get("/health")
|
| 51 |
def health_check():
|
|
@@ -53,21 +76,34 @@ def health_check():
|
|
| 53 |
|
| 54 |
@app.post("/predict/price", response_model=PredictionOutput)
|
| 55 |
def predict_price(input_data: PredictionInput):
|
| 56 |
-
|
| 57 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 58 |
|
| 59 |
features = [[input_data.sma_20, input_data.sma_50, input_data.rsi, input_data.macd]]
|
| 60 |
-
prediction = models[
|
| 61 |
-
return {"prediction": prediction, "model_type": "
|
| 62 |
|
| 63 |
@app.post("/predict/direction", response_model=PredictionOutput)
|
| 64 |
def predict_direction(input_data: PredictionInput):
|
| 65 |
-
|
| 66 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 67 |
|
| 68 |
features = [[input_data.sma_20, input_data.sma_50, input_data.rsi, input_data.macd]]
|
| 69 |
-
prediction = models[
|
| 70 |
-
return {"prediction": float(prediction), "model_type": "
|
| 71 |
|
| 72 |
@app.post("/predict/batch")
|
| 73 |
async def predict_batch(file: UploadFile = File(...)):
|
|
|
|
| 12 |
models = {}
|
| 13 |
|
| 14 |
class PredictionInput(BaseModel):
|
| 15 |
+
symbol: str = "AAPL"
|
| 16 |
sma_20: float
|
| 17 |
sma_50: float
|
| 18 |
rsi: float
|
|
|
|
| 21 |
class PredictionOutput(BaseModel):
|
| 22 |
prediction: float
|
| 23 |
model_type: str
|
| 24 |
+
symbol: str
|
| 25 |
|
| 26 |
@app.on_event("startup")
|
| 27 |
def load_models():
|
| 28 |
"""Load models on startup."""
|
| 29 |
+
from pathlib import Path
|
| 30 |
+
|
| 31 |
+
BASE_DIR = Path(__file__).resolve().parent.parent.parent
|
| 32 |
+
model_dir = BASE_DIR / "models"
|
| 33 |
+
|
| 34 |
+
print(f"Loading models from: {model_dir}")
|
| 35 |
+
|
| 36 |
+
if not model_dir.exists():
|
| 37 |
+
print(f"Models directory not found at {model_dir}")
|
| 38 |
+
return
|
| 39 |
+
|
| 40 |
+
# iterate over subdirs (symbols)
|
| 41 |
+
for symbol_dir in model_dir.iterdir():
|
| 42 |
+
if symbol_dir.is_dir():
|
| 43 |
+
symbol = symbol_dir.name
|
| 44 |
+
print(f"Found symbol directory: {symbol}")
|
|
|
|
|
|
|
|
|
|
| 45 |
|
| 46 |
+
# Load Regression
|
| 47 |
+
reg_path = symbol_dir / "regression_model.pkl"
|
| 48 |
+
if reg_path.exists():
|
| 49 |
+
try:
|
| 50 |
+
key = f"regression_{symbol}"
|
| 51 |
+
models[key] = joblib.load(reg_path)
|
| 52 |
+
print(f"Loaded {key} from {reg_path}")
|
| 53 |
+
|
| 54 |
+
# Keep legacy 'regression' key pointing to AAPL for backward compat
|
| 55 |
+
if 'regression' not in models or symbol == "AAPL":
|
| 56 |
+
models['regression'] = models[key]
|
| 57 |
+
except Exception as e:
|
| 58 |
+
print(f"Failed to load {reg_path}: {e}")
|
| 59 |
+
|
| 60 |
+
# Load Classification
|
| 61 |
+
clf_path = symbol_dir / "classification_model.pkl"
|
| 62 |
+
if clf_path.exists():
|
| 63 |
+
try:
|
| 64 |
+
key = f"classification_{symbol}"
|
| 65 |
+
models[key] = joblib.load(clf_path)
|
| 66 |
+
print(f"Loaded {key} from {clf_path}")
|
| 67 |
+
|
| 68 |
+
if 'classification' not in models or symbol == "AAPL":
|
| 69 |
+
models['classification'] = models[key]
|
| 70 |
+
except Exception as e:
|
| 71 |
+
print(f"Failed to load {clf_path}: {e}")
|
| 72 |
|
| 73 |
@app.get("/health")
|
| 74 |
def health_check():
|
|
|
|
| 76 |
|
| 77 |
@app.post("/predict/price", response_model=PredictionOutput)
|
| 78 |
def predict_price(input_data: PredictionInput):
|
| 79 |
+
symbol = input_data.symbol
|
| 80 |
+
model_key = f"regression_{symbol}"
|
| 81 |
+
|
| 82 |
+
# Fallback to generic 'regression' if specific symbol not found
|
| 83 |
+
if model_key not in models:
|
| 84 |
+
if 'regression' in models:
|
| 85 |
+
model_key = 'regression'
|
| 86 |
+
else:
|
| 87 |
+
raise HTTPException(status_code=503, detail=f"Regression model for {symbol} not loaded")
|
| 88 |
|
| 89 |
features = [[input_data.sma_20, input_data.sma_50, input_data.rsi, input_data.macd]]
|
| 90 |
+
prediction = models[model_key].predict(features)[0]
|
| 91 |
+
return {"prediction": prediction, "model_type": str(type(models[model_key])), "symbol": symbol}
|
| 92 |
|
| 93 |
@app.post("/predict/direction", response_model=PredictionOutput)
|
| 94 |
def predict_direction(input_data: PredictionInput):
|
| 95 |
+
symbol = input_data.symbol
|
| 96 |
+
model_key = f"classification_{symbol}"
|
| 97 |
+
|
| 98 |
+
if model_key not in models:
|
| 99 |
+
if 'classification' in models:
|
| 100 |
+
model_key = 'classification'
|
| 101 |
+
else:
|
| 102 |
+
raise HTTPException(status_code=503, detail=f"Classification model for {symbol} not loaded")
|
| 103 |
|
| 104 |
features = [[input_data.sma_20, input_data.sma_50, input_data.rsi, input_data.macd]]
|
| 105 |
+
prediction = models[model_key].predict(features)[0]
|
| 106 |
+
return {"prediction": float(prediction), "model_type": str(type(models[model_key])), "symbol": symbol}
|
| 107 |
|
| 108 |
@app.post("/predict/batch")
|
| 109 |
async def predict_batch(file: UploadFile = File(...)):
|
src/ingestion/ingest.py
CHANGED
|
@@ -45,9 +45,12 @@ def fetch_daily_data(symbol: str, output_dir: str = "data/raw"):
|
|
| 45 |
return file_path
|
| 46 |
|
| 47 |
if __name__ == "__main__":
|
| 48 |
-
# Example usage
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
|
|
|
|
|
|
|
|
|
|
|
| 45 |
return file_path
|
| 46 |
|
| 47 |
if __name__ == "__main__":
|
| 48 |
+
# Example usage for manual execution
|
| 49 |
+
symbols = ["AAPL", "GOOGL", "MSFT", "AMZN", "TSLA", "NVDA"]
|
| 50 |
+
print(f"Manually fetching data for: {symbols}")
|
| 51 |
+
|
| 52 |
+
for symbol in symbols:
|
| 53 |
+
try:
|
| 54 |
+
fetch_daily_data(symbol)
|
| 55 |
+
except Exception as e:
|
| 56 |
+
print(f"Error fetching {symbol}: {e}")
|
src/orchestration/flows.py
CHANGED
|
@@ -49,7 +49,7 @@ def train_and_evaluate(df: pd.DataFrame, symbol: str):
|
|
| 49 |
return True
|
| 50 |
|
| 51 |
@flow(name="End-to-End Stock Prediction Pipeline")
|
| 52 |
-
def main_pipeline(symbols: list[str] = ["AAPL", "GOOGL"]):
|
| 53 |
"""Main flow to run the entire pipeline."""
|
| 54 |
notify_discord("π Starting End-to-End Pipeline...")
|
| 55 |
|
|
|
|
| 49 |
return True
|
| 50 |
|
| 51 |
@flow(name="End-to-End Stock Prediction Pipeline")
|
| 52 |
+
def main_pipeline(symbols: list[str] = ["AAPL", "GOOGL", "MSFT", "AMZN", "TSLA", "NVDA"]):
|
| 53 |
"""Main flow to run the entire pipeline."""
|
| 54 |
notify_discord("π Starting End-to-End Pipeline...")
|
| 55 |
|
streamlit_app.py
CHANGED
|
@@ -16,7 +16,7 @@ load_dotenv()
|
|
| 16 |
|
| 17 |
# --- Config ---
|
| 18 |
st.set_page_config(page_title="Stock Prediction System", layout="wide", page_icon="π")
|
| 19 |
-
|
| 20 |
|
| 21 |
# --- Secrets ---
|
| 22 |
# Try to get from st.secrets (Cloud) or os.getenv (Local)
|
|
@@ -25,17 +25,24 @@ WEBHOOK_URL = os.getenv("WEBHOOK_URL")
|
|
| 25 |
|
| 26 |
# --- Helper Functions ---
|
| 27 |
@st.cache_resource
|
| 28 |
-
def load_models_local():
|
| 29 |
-
"""Loads models directly from disk
|
|
|
|
| 30 |
models = {}
|
| 31 |
try:
|
| 32 |
-
models['regression'] = joblib.load(f"{
|
| 33 |
-
models['classification'] = joblib.load(f"{
|
| 34 |
-
|
| 35 |
-
models['pca'] = joblib.load(f"{MODEL_DIR}/pca_model.pkl")
|
| 36 |
return models
|
| 37 |
except Exception as e:
|
| 38 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 39 |
return None
|
| 40 |
|
| 41 |
from src.orchestration.notifications import notify_discord
|
|
@@ -169,7 +176,7 @@ st.markdown("---")
|
|
| 169 |
st.subheader(f"π€ AI Analysis for {symbol}")
|
| 170 |
|
| 171 |
features = np.array([[data['sma_20'], data['sma_50'], data['rsi'], data['macd']]])
|
| 172 |
-
models = load_models_local()
|
| 173 |
|
| 174 |
if models:
|
| 175 |
col_pred1, col_pred2 = st.columns(2)
|
|
|
|
| 16 |
|
| 17 |
# --- Config ---
|
| 18 |
st.set_page_config(page_title="Stock Prediction System", layout="wide", page_icon="π")
|
| 19 |
+
# MODEL_DIR removed (Dynamic loading now used)
|
| 20 |
|
| 21 |
# --- Secrets ---
|
| 22 |
# Try to get from st.secrets (Cloud) or os.getenv (Local)
|
|
|
|
| 25 |
|
| 26 |
# --- Helper Functions ---
|
| 27 |
@st.cache_resource
|
| 28 |
+
def load_models_local(symbol):
|
| 29 |
+
"""Loads models directly from disk for the specific symbol."""
|
| 30 |
+
model_path = f"models/{symbol}"
|
| 31 |
models = {}
|
| 32 |
try:
|
| 33 |
+
models['regression'] = joblib.load(f"{model_path}/regression_model.pkl")
|
| 34 |
+
models['classification'] = joblib.load(f"{model_path}/classification_model.pkl")
|
| 35 |
+
# specific clustering/pca models might be needed too if visualizing
|
|
|
|
| 36 |
return models
|
| 37 |
except Exception as e:
|
| 38 |
+
# Fallback to AAPL if specific model missing (for robustness)
|
| 39 |
+
if symbol != "AAPL":
|
| 40 |
+
try:
|
| 41 |
+
# st.warning(f"Models for {symbol} not found. Using AAPL logic transfer.")
|
| 42 |
+
return load_models_local("AAPL")
|
| 43 |
+
except:
|
| 44 |
+
pass
|
| 45 |
+
st.error(f"Failed to load models for {symbol}: {e}")
|
| 46 |
return None
|
| 47 |
|
| 48 |
from src.orchestration.notifications import notify_discord
|
|
|
|
| 176 |
st.subheader(f"π€ AI Analysis for {symbol}")
|
| 177 |
|
| 178 |
features = np.array([[data['sma_20'], data['sma_50'], data['rsi'], data['macd']]])
|
| 179 |
+
models = load_models_local(symbol)
|
| 180 |
|
| 181 |
if models:
|
| 182 |
col_pred1, col_pred2 = st.columns(2)
|
verify_load.py
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
|
| 2 |
+
import joblib
|
| 3 |
+
import os
|
| 4 |
+
from pathlib import Path
|
| 5 |
+
|
| 6 |
+
def test_load():
|
| 7 |
+
# Simulate the logic in main.py
|
| 8 |
+
# We are running this script from ROOT, so we need to construct the path
|
| 9 |
+
# as if we were in src/api/main.py to test that specific logic,
|
| 10 |
+
# OR provided we know the structure, just test access to the models dir.
|
| 11 |
+
|
| 12 |
+
# Let's test the ACTUAL logic we put in main.py.
|
| 13 |
+
# We will assume this script is placed at src/api/debug_load.py to match depth
|
| 14 |
+
# But I will write it to root and adjust logic for testing purposes,
|
| 15 |
+
# OR just write it to src/api/verify_load.py
|
| 16 |
+
pass
|
| 17 |
+
|
| 18 |
+
if __name__ == "__main__":
|
| 19 |
+
# We will assume this file is at ROOT/verify_load.py
|
| 20 |
+
# So ROOT is just Path(__file__).parent
|
| 21 |
+
|
| 22 |
+
ROOT_DIR = Path(__file__).resolve().parent
|
| 23 |
+
models_dir = ROOT_DIR / "models"
|
| 24 |
+
|
| 25 |
+
print(f"Checking models dir: {models_dir}")
|
| 26 |
+
|
| 27 |
+
symbol = "AAPL"
|
| 28 |
+
reg_path = models_dir / symbol / "regression_model.pkl"
|
| 29 |
+
|
| 30 |
+
if reg_path.exists():
|
| 31 |
+
print(f"FOUND: {reg_path}")
|
| 32 |
+
try:
|
| 33 |
+
model = joblib.load(reg_path)
|
| 34 |
+
print("SUCCESS: Model loaded correctly.")
|
| 35 |
+
except Exception as e:
|
| 36 |
+
print(f"FAILURE: Model found but failed to load: {e}")
|
| 37 |
+
else:
|
| 38 |
+
print(f"FAILURE: Model file not found at {reg_path}")
|