DanielKiani committed on
Commit
349ad65
·
1 Parent(s): 1b637c6

Version 1.0 release

README.md CHANGED
@@ -1,11 +1,21 @@
1
  ![Banner](assets/banner.png)
2
  [![Python](https://img.shields.io/badge/Python-3.12.11-blue?logo=python)](https://www.python.org/)[![PyTorch](https://img.shields.io/badge/PyTorch-2.8-EE4C2C?logo=pytorch)](https://pytorch.org/)![Made with ML](https://img.shields.io/badge/Made%20with-ML-blueviolet?logo=openai)[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
3
 
4
- # 🤖 Portfolio Optimization with Deep Reinforcement Learning
5
 
6
- This project explores the use of Deep Reinforcement Learning to train autonomous agents for financial portfolio management. The goal was not just to create a single profitable agent, but to conduct a comparative study of different RL algorithms (PPO, SAC, TD3) to understand the emergent trading strategies and their robustness across various market conditions.
7
 
8
- **The ultimate finding? A TD3-based agent learned a superior, risk-managed static asset allocation that consistently outperformed both active trading strategies and aggressive growth models, especially during market downturns.**
9
 
10
  ---
11
 
@@ -13,16 +23,12 @@ This project explores the use of Deep Reinforcement Learning to train autonomous
13
 
14
  1. [📊 The Data & Asset Selection](#-the-data--asset-selection)
15
  2. [🎯 Benchmarking Against Baselines](#-benchmarking-against-baselines)
16
- 3. [🏆 Key Findings & The Champion Agent](#-key-findings--the-champion-agent)
17
- 4. [🧠 Comparative Analysis of Agent Strategies](#-comparative-analysis-of-agent-strategies)
18
- * [🥇 TD3: The Prudent Risk-Manager](#-td3-the-prudent-risk-manager)
19
- * [🚀 SAC: The Aggressive Growth Engine](#-sac-the-aggressive-growth-engine)
20
- * [📈 PPO: The Active (but Inconsistent) Trader](#-ppo-the-active-but-inconsistent-trader)
21
- 5. [🌪️ Stress Testing: The Ultimate Test of Robustness](#️-stress-testing-the-ultimate-test-of-robustness)
22
- 6. [🔬 The Research Journey: Why Simplicity Won](#-the-research-journey-why-simplicity-won)
23
- 7. [✅ Conclusion](#-conclusion)
24
- 8. [📂 Project Structure](#-project-structure)
25
- 9. [🚀 How to Run](#-how-to-run)
26
  * [Setup](#setup)
27
  * [Data Fetching](#data-fetching)
28
  * [Training](#training)
@@ -32,16 +38,23 @@ This project explores the use of Deep Reinforcement Learning to train autonomous
32
 
33
  ## 📊 The Data & Asset Selection
34
 
35
- The foundation of any financial machine learning project is the data. This project uses daily closing price data sourced from **Yahoo Finance** via the `yfinance` library. The primary training period was **2015-2020**, with out-of-sample testing conducted on **2021-2023** and other periods for stress testing.
36
 
37
- The selection of assets was crucial for creating a realistic decision-making environment for the agent. The portfolio consists of five assets, chosen to represent different classes and risk profiles:
38
 
39
  * **Growth Equities (AAPL, MSFT):** Represent the high-growth, high-volatility technology sector.
40
  * **Market Index (SPY):** An ETF tracking the S&P 500, representing the broader US stock market.
41
  * **Safe Haven (TLT):** An ETF for 20+ Year US Treasury Bonds, which often acts as a "risk-off" asset during stock market downturns.
42
  * **Alternative Asset (BTC-USD):** Represents a non-traditional, extremely volatile asset class with high potential returns.
43
 
44
- This diverse mix forces the agent to learn not just about individual assets, but also about their correlations and how to balance risk across different economic regimes.
45
 
46
  ---
47
 
@@ -57,73 +70,83 @@ The chart below shows the performance of a simple Buy and Hold strategy during t
57
 
58
  ---
59
 
60
- ## 🏆 Key Findings & The Champion Agent
61
 
62
- After extensive training, evaluation, and stress-testing, the **TD3 agent emerged as the clear winner** on a risk-adjusted basis. While other agents achieved higher raw returns, their strategies proved to be brittle and dangerously volatile during market crises. The TD3 agent's strategy was the most robust and reliable.
63
 
64
- #### Final Performance Comparison (2021-2023)
65
 
66
- This table summarizes the performance of the top-performing static agents against the baseline.
67
 
68
- | Metric | **TD3 Agent** | SAC Agent | Buy & Hold |
69
- | :--- | :--- | :--- | :--- |
70
- | **Total Return** | 47.24% | **50.89%** | 34.91% |
71
- | **CAGR** | 13.76% | **14.70%** | 10.50% |
72
- | **Sharpe Ratio** | **0.62** | 0.51 | 0.45 |
73
- | **Max Drawdown** | **-28.41%** | -44.61% | -40.81% |
74
 
75
- The TD3 agent delivered strong returns while significantly reducing the maximum drawdown, proving its superior capital preservation strategy.
 
 
 
 
 
 
76
 
77
  ![Main Performance Chart](results/final_performance_comparison_all_agents.png)
 
78
 
79
- ---
80
 
81
- ## 🧠 Comparative Analysis of Agent Strategies
82
 
83
- A fascinating outcome of this project was observing three different RL algorithms independently discover three distinct and recognizable investment philosophies.
 
 
84
 
85
- ### 🥇 TD3: The Prudent Risk-Manager
86
 
87
- The TD3 agent concluded that the most effective strategy was not to trade frequently, but to find one **superior, risk-managed static asset allocation** and hold it.
88
 
89
- * **Strategy:** "Smarter Buy and Hold".
90
- * **Behavior:** The agent's allocation is completely static, indicating it focused on the initial strategic decision and ignored market noise to minimize transaction costs.
91
- * **Result:** This approach led to the best risk-adjusted returns, proving that a robust initial setup is more valuable than reactive trading.
92
 
93
- ![TD3 Allocation Chart](results/td3_portfolio_alocation.png)
94
 
95
- ### 🚀 SAC: The Aggressive Growth Engine
96
 
97
- The SAC agent also learned a static allocation strategy, but its portfolio was geared for **maximum growth**, accepting higher risk for higher potential returns.
98
 
99
- * **Strategy:** High-risk, high-return static allocation.
100
- * **Behavior:** Like TD3, it made one initial allocation and held firm. However, this allocation was far more aggressive.
101
- * **Result:** It achieved the highest total return in some periods but suffered catastrophic drawdowns in stress tests, making its strategy unreliable and brittle.
102
 
103
- ![SAC Performance Chart](results/sac_portfolio_alocation.png)
104
 
105
- ### 📈 PPO: The Active (but Inconsistent) Trader
106
 
107
- Unlike the other two, the PPO agent learned an **active, dynamic trading strategy**, constantly adjusting its portfolio based on market conditions.
108
 
109
- * **Strategy:** Tactical asset allocation.
110
- * **Behavior:** The allocation chart clearly shows the agent rebalancing its portfolio over time, for example, by increasing its bond (TLT) holdings during the 2022 downturn.
111
- * **Result:** While impressive that it learned this behavior, its performance was inconsistent. It succeeded in some periods (2018) but failed in others (2025), highlighting the immense difficulty of successful market timing.
112
 
113
- ![PPO Allocation Chart](results/ppo_portfolio_alocation.png)
114
 
115
- ---
 
 
116
 
117
- ## 🌪️ Stress Testing: The Ultimate Test of Robustness
118
 
119
- A model is only as good as its performance during a crisis. We subjected the agents to multiple out-of-sample stress tests, with the 2018 period (featuring a crypto winter and a stock market flash crash) being the most revealing.
120
 
121
- ![2018 Stress Test Chart](results/stress_test_comparison_2018.png)
122
 
123
- * **TD3's Triumph:** The orange line shows the TD3 agent successfully navigating the downturn, preserving capital far better than the baseline.
124
- * **SAC's Failure:** The green line shows the SAC agent's aggressive strategy failing catastrophically, resulting in a massive drawdown.
125
 
126
- This test definitively proved that the **TD3 agent's risk-managed approach was truly robust**, while the SAC agent's strategy was fragile.
 
 
 
 
 
 
 
 
 
 
127
 
128
  ---
129
 
@@ -135,8 +158,39 @@ This project was also an exercise in scientific methodology. We initially hypoth
135
  * **Hypothesis 2: Models with memory are better.** We tested an LSTM-based agent (`RecurrentPPO`). **Result:** Performance degraded. The added complexity led to overfitting on the training data.
136
  * **Hypothesis 3: Using Regularization is better.** We tested both L1 and L2 regularization. **Results:** Performance degraded.
137
  * **Hypothesis 4: Increasing the window from 30 days is better.** We tested increasing the window to 60 days. **Results:** Performance degraded. Increasing the context window is not always beneficial; the longer history appears to act as extra noise for the model.
138
 
139
- The conclusion was clear: for this problem, a simple and elegant model (a standard MLP fed with just normalized price data) was the most effective.
140
 
141
  ---
142
 
@@ -148,68 +202,94 @@ This project successfully demonstrates that Deep Reinforcement Learning can be a
148
 
149
  ## 📂 Project Structure
150
 
151
- The codebase is organized into modular, reusable scripts.
152
-
153
  ```bash
154
- ├── assets/
155
- ├── checkpoints/ # Holds all saved model .zip files
156
- ├── results/ # Holds all output plots and metrics
157
- ├── scripts/
158
- ├── environment.py # The custom Gymnasium environment for the simulation
159
- ├── fetch_market_data.py# A flexible script to download data for any period
160
- ├── train.py # The main training script with model selection
161
- ├── evaluate.py # The main evaluation script for generating metrics
162
- ├── stress_test.py # Runs a full comparison of all agents on a given dataset
163
- └── visualize_strategy.py # Plots the asset allocation of a single trained agent
164
- └── README.md # This file
 
 
 
 
 
 
 
 
 
165
  ```
166
 
167
- ---
168
-
169
  ## 🚀 How to Run
170
 
171
  ### Setup
172
 
173
- 1. Clone the repository.
174
- 2. Create and activate a Python virtual environment.
175
- 3. Install the required packages:
176
 
177
- ```bash
178
- pip install -r requirements.txt
179
- ```
180
 
181
- ### Data Fetching
 
182
 
183
- Use the flexible `fetch_market_data.py` script to get any data you need.
184
 
185
  ```bash
186
- # Fetch the default training data (2015-2021)
187
- python fetch_market_data.py --start 2015-01-01 --end 2020-12-31 --filename data/train.csv
 
 
 
 
188
 
189
- # Fetch data for a stress test (e.g., 2022)
190
- python fetch_market_data.py --start 2022-01-01 --end 2022-12-31 --filename data/test_2022.csv
 
 
 
 
191
  ```
192
 
193
  ### Training
194
 
195
- Use the `train.py` script to train any of the three main agents.
196
 
197
- ```bash
198
- # Train the champion TD3 agent (default)
199
- python src/train.py --agent td3
200
 
201
  # Train a SAC agent for more timesteps
202
- python src/train.py --agent sac --timesteps 100000
203
  ```
204
 
 
 
205
  ### Evaluation & Visualization
206
 
207
- Use the dedicated scripts to analyze the results.
208
 
209
- ```bash
210
- # Run a full stress test on the 2018 data
211
- python stress_test.py --datafile data/stress_test_2018.csv
 
 
 
 
212
 
213
- # Visualize the TD3 agent's strategy
214
- python visualize_strategy.py --agent td3 --checkpoint td3_portfolio_model.zip
 
 
215
  ```
 
 
 
 
 
 
 
 
 
1
  ![Banner](assets/banner.png)
2
  [![Python](https://img.shields.io/badge/Python-3.12.11-blue?logo=python)](https://www.python.org/)[![PyTorch](https://img.shields.io/badge/PyTorch-2.8-EE4C2C?logo=pytorch)](https://pytorch.org/)![Made with ML](https://img.shields.io/badge/Made%20with-ML-blueviolet?logo=openai)[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
3
 
4
+ # 🤖 Portfolio Optimization with Deep Reinforcement Learning (v1.0)
5
 
6
+ This project explores the use of Deep Reinforcement Learning (DRL) to train autonomous agents for financial portfolio management. The goal is to create agents that can dynamically allocate capital across a diverse set of assets to maximize returns while managing risk.
7
 
8
+ This is **Version 1.0** of the project, which moves beyond initial exploration to a more robust and comparative study. Building on the foundation of v0.1, this version introduces:
9
+
10
+ * **Comparative Analysis:** We train and evaluate three state-of-the-art DRL algorithms: **Proximal Policy Optimization (PPO)**, **Soft Actor-Critic (SAC)**, and **Twin Delayed DDPG (TD3)**. This allows us to understand the different emergent strategies and trade-offs of each approach.
11
+ * **Robust Benchmarking:** Agents' performance is rigorously compared against a standard **Buy and Hold** baseline, using a comprehensive set of financial metrics including Total Return, CAGR, Sharpe Ratio, Sortino Ratio, and Max Drawdown.
12
+ * **Modular Codebase:** The project has been refactored into a clean, modular structure with separate scripts for data fetching, training, evaluation, and visualization, making it easier to understand, extend, and reproduce results.
13
+ * **In-Depth Analysis:** We delve into *why* certain agents perform better, visualizing their asset allocation strategies over time to uncover their "investment philosophy."
14
+
15
+ * **Deep RL & LLM Portfolio Manager (Web App):** A key feature of v1.0 is the interactive web application built with **Gradio**. This dashboard bridges the gap between complex backend models and user-friendly analysis, allowing for live tracking, forward-looking strategy generation, and historical backtesting.
16
+ The dashboard integrates **Large Language Models (LLMs)**, specifically Qwen, to act as an AI Risk Analyst, providing textual justification and risk assessments for the RL agent's proposed strategies.
17
+
18
+ *You can try the web app here ->* [Gradio webapp](https://huggingface.co/spaces/DanielKiani/Portfolio-Optimization-with-Deep-Reinforcement-Learning)
19
 
20
  ---
21
 
 
23
 
24
  1. [📊 The Data & Asset Selection](#-the-data--asset-selection)
25
  2. [🎯 Benchmarking Against Baselines](#-benchmarking-against-baselines)
26
+ 3. [🏆 Key Findings & The New Champion](#-key-findings--the-new-champion)
27
+ 4. [🔬 The Research Journey: Why Simplicity Won](#-the-research-journey-why-simplicity-won)
28
+ 5. [🖥️ Deep RL & LLM Portfolio Manager (Web App)](#️-deep-rl--llm-portfolio-manager-web-app)
29
+ 6. [✅ Conclusion](#-conclusion)
30
+ 7. [📂 Project Structure](#-project-structure)
31
+ 8. [🚀 How to Run](#-how-to-run)
 
 
 
 
32
  * [Setup](#setup)
33
  * [Data Fetching](#data-fetching)
34
  * [Training](#training)
 
38
 
39
  ## 📊 The Data & Asset Selection
40
 
41
+ The foundation of any financial machine learning project is the data. The primary source for daily closing price data of the portfolio assets is **Yahoo Finance**, accessed via the `yfinance` library.
42
+
43
+ To provide the agents with broader economic context beyond just price history, the observation space is augmented with key macroeconomic indicators sourced from **FRED (Federal Reserve Economic Data)**. These indicators include data points such as the CBOE Volatility Index (VIX), various Treasury bill yields, and inflation expectations. This allows the agents to learn strategies that adapt to different market regimes, such as high volatility or rising interest rate environments.
44
+
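+ As a rough illustration, macro series like these can be pulled from FRED with `pandas-datareader` (listed in `requirements.txt`); the series IDs below are common FRED codes and are only an assumption about what the project's `fetch_market_data.py` actually requests:
+
+ ```python
+ import pandas_datareader.data as web
+
+ # Assumed FRED series: VIX, 3-month T-bill yield, 5-year breakeven inflation
+ FRED_IDS = {"VIXCLS": "VIX", "DTB3": "TBill_3M", "T5YIE": "Inflation_5Y"}
+
+ macro = web.DataReader(list(FRED_IDS.keys()), "fred", "2015-01-01", "2020-12-31")
+ macro = macro.rename(columns=FRED_IDS).ffill()  # forward-fill non-trading days
+ print(macro.tail())
+ ```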
45
+ **Environment & Realistic Constraints:**
46
+ To ensure realistic simulation results, the trading environment incorporates transaction costs.
47
+ * **Transaction Cost:** A fee of **0.001%** is applied to the notional value of every trade (both buys and sells). This forces the agents to learn strategies that generate returns net of fees, discouraging excessive, unprofitable trading.
48
+
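+ To make the fee above concrete, here is a minimal sketch of how a proportional transaction cost can be charged on each rebalance; the function and variable names are illustrative, and the project's `environment.py` may implement this differently:
+
+ ```python
+ import numpy as np
+
+ FEE_RATE = 0.001 / 100  # 0.001% of traded notional, as quoted above
+
+ def apply_rebalance(portfolio_value, current_weights, target_weights, fee_rate=FEE_RATE):
+     """Return the portfolio value after paying fees on the traded notional."""
+     turnover = np.abs(np.asarray(target_weights) - np.asarray(current_weights)).sum()
+     traded_notional = turnover * portfolio_value  # dollars bought plus dollars sold
+     return portfolio_value - traded_notional * fee_rate
+
+ # Moving 20% of a $10,000 portfolio from one asset to another trades $4,000 of notional
+ print(apply_rebalance(10_000, [0.5, 0.5], [0.3, 0.7]))
+ ```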
49
+ The portfolio itself consists of five assets, chosen to represent different asset classes and risk profiles, creating a challenging decision-making environment:
50
 
 
51
 
52
  * **Growth Equities (AAPL, MSFT):** Represent the high-growth, high-volatility technology sector.
53
  * **Market Index (SPY):** An ETF tracking the S&P 500, representing the broader US stock market.
54
  * **Safe Haven (TLT):** An ETF for 20+ Year US Treasury Bonds, which often acts as a "risk-off" asset during stock market downturns.
55
  * **Alternative Asset (BTC-USD):** Represents a non-traditional, extremely volatile asset class with high potential returns.
56
 
57
+ This diverse mix forces the agent to learn not just about individual asset price movements, but also about their correlations and how to balance risk across different economic conditions.
58
 
59
  ---
60
 
 
70
 
71
  ---
72
 
73
+ ## 🏆 Key Findings & The New Champion
74
 
75
+ Our latest evaluation on out-of-sample data from **2021-2023** has yielded surprising and significant results, challenging our initial assumptions and highlighting the impact of neural network architecture on agent performance.
76
 
77
+ The **TD3 agent powered by a Transformer architecture** has emerged as the undisputed champion in terms of risk-adjusted returns and capital preservation, while the **SAC agent** demonstrated the highest absolute growth potential.
78
 
79
+ #### Final Performance Comparison (2021-2023)
80
 
81
+ This table summarizes the performance of our key agents against the Buy & Hold baseline.
 
 
 
 
 
82
 
83
+ | Metric | **TD3 (Transformer)** | SAC (MLP) | Buy & Hold | PPO (MLP) | TD3 (MLP) |
84
+ | :--- | :--- | :--- | :--- | :--- | :--- |
85
+ | **Total Return** | 25.34% | **39.23%** | 32.76% | 22.85% | 22.07% |
86
+ | **CAGR** | 8.20% | **12.25%** | 9.96% | 7.45% | 7.21% |
87
+ | **Sharpe Ratio** | **0.61** | 0.56 | 0.59 | 0.41 | 0.42 |
88
+ | **Volatility** | **14.77%** | 27.47% | 19.06% | 25.90% | 23.00% |
89
+ | **Max Drawdown** | **-20.01%** | -29.08% | -28.82% | -44.26% | -40.50% |
90
 
91
  ![Main Performance Chart](results/final_performance_comparison_all_agents.png)
92
+ **Note:** Bitcoin (BTC-USD) was excluded from the performance comparison.
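+ For reference, the risk metrics reported in the table are standard quantities that can be computed from a daily portfolio-value series roughly as follows (a sketch; the project's evaluation scripts may differ in the details):
+
+ ```python
+ import numpy as np
+ import pandas as pd
+
+ def summarize(values: pd.Series, periods_per_year: int = 252) -> dict:
+     """Headline metrics from a series of daily portfolio values."""
+     returns = values.pct_change().dropna()
+     total_return = values.iloc[-1] / values.iloc[0] - 1
+     cagr = (1 + total_return) ** (periods_per_year / len(returns)) - 1
+     volatility = returns.std() * np.sqrt(periods_per_year)
+     sharpe = np.sqrt(periods_per_year) * returns.mean() / returns.std()
+     max_drawdown = (values / values.cummax() - 1).min()
+     return {"Total Return": total_return, "CAGR": cagr, "Volatility": volatility,
+             "Sharpe Ratio": sharpe, "Max Drawdown": max_drawdown}
+ ```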
93
 
94
+ ### 🥇 TD3 (Transformer): The Master of Risk Management
95
 
96
+ The most notable finding is the superior performance of the TD3 agent when equipped with a **Transformer-based policy network**. This agent achieved the best risk-adjusted metrics across the board.
97
 
98
+ * **Lowest Volatility (14.77%):** It provided a significantly smoother ride than even the passive Buy & Hold baseline.
99
+ * **Best Capital Preservation:** Its maximum drawdown of **-20.01%** was far shallower than that of the other agents and the baseline, proving its ability to protect capital during severe market downturns such as the 2022 bear market.
100
+ * **Conclusion:** The Transformer's attention mechanism likely allowed the agent to better identify and react to long-term market shifts and regime changes, leading to a highly robust and defensive strategy.
101
 
102
+ ### 🚀 SAC (MLP): The Aggressive Growth Engine
103
 
104
+ The **Soft Actor-Critic (SAC)** agent confirmed its role as the high-growth strategist.
105
 
106
+ * **Highest Returns:** It achieved the highest Total Return (**39.23%**) and CAGR (**12.25%**), outperforming the Buy & Hold baseline by a significant margin.
107
+ * **Higher Risk:** This performance came at the cost of the highest volatility (**27.47%**), making it a strategy suited for aggressive investors willing to tolerate larger price swings for maximum gain.
 
108
 
109
+ ### 📉 The Failure of Standard Architectures
110
 
111
+ Interestingly, the standard Multi-Layer Perceptron (MLP) versions of PPO and TD3 failed to beat the simple Buy & Hold baseline. They suffered the lowest returns and the deepest drawdowns. This stark contrast with the success of the Transformer model highlights that for complex financial time series, **network architecture is just as critical as, if not more critical than, the choice of RL algorithm itself.**
112
 
113
+ ---
114
 
115
+ ## 🧠 Comparative Analysis of Agent Strategies
 
 
116
 
117
+ A fascinating outcome of this project was observing how different combinations of RL algorithms and network architectures led to distinct investment philosophies. We can visualize this by looking at how each agent allocated its portfolio over time.
118
 
119
+ ### TD3 (Transformer): The Dynamic Hedger
120
 
121
+ The Transformer-based TD3 agent did not learn a static allocation. Instead, it developed a sophisticated, **dynamic hedging strategy**. By leveraging the Transformer's attention mechanism to process the 30-day lookback window, the agent could identify market trends and adapt its portfolio accordingly.
122
 
123
+ ![TD3 Transformer Allocation Chart](results/td3_transformer_allocation.png)
 
 
124
 
125
+ As shown in the chart, the agent maintains a core position in equities (AAPL, MSFT, SPY) but actively manages its exposure. During the volatile bear market of 2022, the agent significantly increased its allocation to the safe-haven asset **TLT (US Treasury Bonds)**, effectively "smoothing out" its equity curve and avoiding the deep losses suffered by the baseline. This ability to dynamically shift into defensive assets is the key to its superior risk-adjusted performance.
126
 
127
+ ### SAC (MLP): The High-Conviction Aggressor
128
+
129
+ The SAC agent learned a strategy that is nearly the polar opposite of the Transformer. It converged to a **high-risk, high-return static allocation strategy**. Its portfolio is heavily weighted towards high-growth assets, likely with a substantial allocation to Bitcoin (BTC-USD) and tech stocks, with very little exposure to defensive assets like bonds or cash.
130
 
131
+ ![SAC Allocation Chart](results/sac_allocation.png)
132
 
133
+ The allocation chart reveals a strategy with minimal changes over time, indicating a "set-and-forget" approach. While this high-conviction bet paid off with the highest total return, it also exposed the portfolio to significant volatility.
134
 
135
+ ### PPO (MLP): The Failed Active Trader
136
 
137
+ Unlike the other MLP-based agents, which converged to static allocations, the PPO agent attempted a **dynamic, active trading strategy**.
 
138
 
139
+ ![PPO Allocation Chart](results/ppo_allocation.png)
140
+
141
+ As seen in the chart, the agent frequently rebalances its portfolio, shifting weights between equities, bonds, and cash. However, the performance metrics indicate that this activity was detrimental. With poor returns and the deepest maximum drawdown (-44.26%) among all agents, the PPO agent's attempts at market timing were unsuccessful, churning the portfolio without generating alpha or managing risk.
142
+
143
+ ### TD3 (MLP): The Failed Static Allocator
144
+
145
+ The standard MLP version of the TD3 agent also converged to a static allocation, similar to the SAC agent, but chose a clearly suboptimal portfolio.
146
+
147
+ ![TD3 MLP Allocation Chart](results/td3_allocation.png)
148
+
149
+ The chart shows a relatively fixed allocation that failed to perform well. Unlike the SAC agent, it did not capture high-growth opportunities, and unlike the Transformer agent, it lacked the dynamic capability to manage risk. This resulted in near-bottom performance across all metrics.
150
 
151
  ---
152
 
 
158
  * **Hypothesis 2: Models with memory are better.** We tested an LSTM-based agent (`RecurrentPPO`). **Result:** Performance degraded. The added complexity led to overfitting on the training data.
159
  * **Hypothesis 3: Using Regularization is better.** We tested both L1 and L2 regularization. **Results:** Performance degraded.
160
  * **Hypothesis 4: Increasing the window from 30 days is better.** We tested increasing the window to 60 days. **Results:** Performance degraded. Increasing the context window is not always beneficial; the longer history appears to act as extra noise for the model.
161
+ * **Hypothesis 5: A Transformer-based architecture is superior.** We replaced the standard Multi-Layer Perceptron (MLP) policy network with a more powerful Transformer model, hypothesizing its attention mechanism would better capture complex temporal relationships. **Result**: Performance degraded. Similar to the LSTM experiment, the Transformer model was too complex for the amount of data available. It suffered from significant overfitting, performing well on training data but failing to generalize to unseen market scenarios.
162
+
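+ For context, swapping in a custom architecture like the Transformer in Hypothesis 5 is typically done in Stable-Baselines3 through a custom features extractor passed via `policy_kwargs`; the sketch below is illustrative only, and the project's `custom_policy.py` may be structured differently:
+
+ ```python
+ import torch
+ import torch.nn as nn
+ from stable_baselines3.common.torch_layers import BaseFeaturesExtractor
+
+ class TransformerExtractor(BaseFeaturesExtractor):
+     """Attention-based extractor over a flattened (window x features) observation."""
+     def __init__(self, observation_space, window_size=30, n_features=11, d_model=64):
+         super().__init__(observation_space, features_dim=d_model)
+         self.window_size, self.n_features = window_size, n_features
+         self.embed = nn.Linear(n_features, d_model)
+         layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
+         self.encoder = nn.TransformerEncoder(layer, num_layers=2)
+
+     def forward(self, observations: torch.Tensor) -> torch.Tensor:
+         x = observations.view(-1, self.window_size, self.n_features)  # (batch, time, features)
+         return self.encoder(self.embed(x))[:, -1, :]  # representation of the last time step
+
+ # model = TD3("MlpPolicy", env, policy_kwargs=dict(features_extractor_class=TransformerExtractor))
+ ```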
163
+ The conclusion was clear: a simple MLP (Multi-Layer Perceptron) policy network, fed with just normalized price data and a concise 30-day window, was the most effective and robust architecture.
164
+
165
+ ---
166
+
167
+ ## 🖥️ Deep RL & LLM Portfolio Manager (Web App)
168
+
169
+ A key feature of v1.0 is the interactive web application built with **Gradio**. This dashboard bridges the gap between complex backend models and user-friendly analysis, allowing for live tracking, forward-looking strategy generation, and historical backtesting.
170
+
171
+ The dashboard integrates **Large Language Models (LLMs)**, specifically Qwen, to act as an AI Risk Analyst, providing textual justification and risk assessments for the RL agent's proposed strategies.
172
+
173
+ ### Key Features:
174
+
175
+ #### 1. Live Dashboard & Net Worth Tracking
176
+
177
+ Track the current portfolio holdings, recent transactions, and the overall net worth evolution in real-time.
178
 
179
+ ![Live Dashboard](results/tab1.png)
180
+
181
+ #### 2. AI-Powered Strategy Forecast & Risk Analysis
182
+
183
+ Generate tomorrow's optimal portfolio allocation using the trained RL agents. The integrated LLM analyzes the proposed allocation, current market volatility (VIX), and asset concentration to provide a comprehensive **Risk Analyst Report** with a confidence score and justifications.
184
+
185
+ It also includes **Explainable AI (XAI)** feature importance plots to show which market factors most influenced the agent's decision.
186
+
187
+ ![AI Forecast and Risk Analysis](assets/tab2.png)
188
+
189
+ #### 3. Historical Simulation & Backtesting
190
+
191
+ Run dynamic backtests of the trained RL agents against baselines over any historical period. This tool is essential for validating performance across different market cycles.
192
+
193
+ ![Historical Simulation](assets/tab2.png)
194
 
195
  ---
196
 
 
202
 
203
  ## 📂 Project Structure
204
 
 
 
205
  ```bash
206
+ ├── assets/             # Images for the README
207
+ ├── checkpoints/        # Stores trained model weights (.zip files)
208
+ ├── data/               # Stores fetched CSV data files
209
+ ├── results/            # Stores generated plots and metrics logs
210
+ ├── scripts/            # Contains all the executable scripts
211
+ │   ├── app.py                # The Gradio web application
212
+ │   ├── check_env.py          # Simple script to verify the custom environment
213
+ │   ├── custom_policy.py      # Custom policy network definitions
214
+ │   ├── environment.py        # The custom Gymnasium environment class
215
+ │   ├── evaluate_baselines.py # Calculates performance of baseline strategies
216
+ │   ├── evaluate.py         # Main script to evaluate a trained agent
217
+ │   ├── fetch_market_data.py # Script to download historical data from YFinance
218
+ │   ├── llm_analysis_rag.py # Script for LLM-based analysis and RAG
219
+ │   ├── predict_tomorrow.py # Script to generate predictions for the next day
220
+ │   ├── stress_test.py      # Compares all agents on a specific dataset
221
+ │   ├── train.py            # Main script to train an RL agent
222
+ │   ├── tune_sac.py         # Script for hyperparameter tuning of the SAC agent
223
+ │   └── visualize_strategy.py # Plots the asset allocation of a trained agent
224
+ ├── requirements.txt    # List of Python dependencies
225
+ └── README.md           # This file
226
  ```
227
 
 
 
228
  ## 🚀 How to Run
229
 
230
  ### Setup
231
 
232
+ 1. Clone the repository:
 
 
233
 
234
+ ```bash
 
 
235
 
236
+ git clone https://github.com/DanielKiani/Portfolio-Optimization-with-Deep-Reinforcement-Learning
237
+ ```
238
 
239
+ 2. Install the required packages:
240
 
241
  ```bash
242
+ pip install -r requirements.txt
243
+ ```
244
+
245
+ ### Data Fetching
246
+
247
+ Before training or evaluation, you need to download the historical market data. Use the `fetch_market_data.py` script.
248
 
249
+ ```bash
250
+ # Fetch training data (e.g., 2015-2020)
251
+ python scripts/fetch_market_data.py --start 2015-01-01 --end 2020-12-31 --filename data/train_data.csv
252
+
253
+ # Fetch evaluation data (e.g., 2021-2023)
254
+ python scripts/fetch_market_data.py --start 2021-01-01 --end 2023-12-31 --filename data/eval_data.csv
255
  ```
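+ Under the hood the script relies on `yfinance`; fetching the five portfolio assets yourself looks roughly like this (a sketch, not the script's exact code):
+
+ ```python
+ import yfinance as yf
+
+ ASSETS = ["AAPL", "MSFT", "SPY", "TLT", "BTC-USD"]
+
+ # Daily closing prices for the training period
+ prices = yf.download(ASSETS, start="2015-01-01", end="2020-12-31")["Close"]
+ prices.dropna(how="all").to_csv("data/train_data.csv")
+ ```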
256
 
257
  ### Training
258
 
259
+ Use the `train.py` script to train an agent. You can specify the algorithm (ppo, sac, or td3) and the number of training timesteps.
260
 
261
+ ```bash
262
+ # Train a TD3 agent (default timesteps: 20000)
263
+ python scripts/train.py --agent td3 --datafile data/train_data.csv
264
 
265
  # Train a SAC agent for more timesteps
266
+ python scripts/train.py --agent sac --datafile data/train_data.csv --timesteps 50000
267
  ```
268
 
269
+ The trained model will be saved in the `checkpoints/` directory (e.g., `sac_portfolio_model.zip`).
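+ In essence, `train.py` wraps the standard Stable-Baselines3 training loop; a stripped-down equivalent looks like the sketch below (the `PortfolioEnv` constructor arguments are assumed from how the environment is used elsewhere in the repo):
+
+ ```python
+ import pandas as pd
+ from stable_baselines3 import SAC
+ from environment import PortfolioEnv  # the project's custom Gymnasium environment
+
+ df = pd.read_csv("data/train_data.csv", index_col=0, parse_dates=True)
+ env = PortfolioEnv(df, 30, initial_amount=10_000)  # 30-day observation window
+
+ model = SAC("MlpPolicy", env, verbose=1)
+ model.learn(total_timesteps=50_000)
+ model.save("checkpoints/sac_portfolio_model.zip")
+ ```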
270
+
271
  ### Evaluation & Visualization
272
 
273
+ Once you have trained models and evaluation data, you can use the other scripts to analyze performance.
274
 
275
+ * **Compare all agents** (`stress_test.py`): This script loads all available models in `checkpoints/` and compares them against the baseline on a given dataset.
276
+
277
+ ```bash
278
+ python scripts/stress_test.py --datafile data/eval_data.csv
279
+ ```
280
+
281
+ This will generate `results/agent_performance_comparison.png` and print a metrics table.
282
 
283
+ * **Evaluate a single agent** (`evaluate.py`): This script calculates detailed metrics for a specific agent and plots its portfolio value.
284
+
285
+ ```bash
286
+ python scripts/evaluate.py --agent td3 --checkpoint checkpoints/td3_portfolio_model.zip --datafile data/eval_data.csv
287
  ```
288
+
289
+ * **Visualize an agent's strategy** (`visualize_strategy.py`): This script creates a stacked area chart showing how the agent's asset allocation changed over time.
290
+
291
+ ```bash
292
+ python scripts/visualize_strategy.py --agent ppo --checkpoint checkpoints/ppo_portfolio_model.zip --datafile data/eval_data.csv
293
+ ```
294
+
295
+ This will save the plot to `results/ppo_portfolio_allocation.png`.
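+ For completeness, evaluating a saved checkpoint reduces to loading the model and stepping the environment with a deterministic policy, collecting the portfolio value from the step info (a sketch mirroring the evaluation code in `scripts/app.py`):
+
+ ```python
+ import pandas as pd
+ from stable_baselines3 import SAC
+ from environment import PortfolioEnv
+
+ eval_df = pd.read_csv("data/eval_data.csv", index_col=0, parse_dates=True)
+ env = PortfolioEnv(eval_df, 30, initial_amount=10_000)
+ model = SAC.load("checkpoints/sac_portfolio_model.zip")
+
+ obs, info = env.reset()
+ values, done = [10_000], False
+ while not done:
+     action, _ = model.predict(obs, deterministic=True)  # deterministic policy for evaluation
+     obs, reward, terminated, truncated, info = env.step(action)
+     values.append(info["portfolio_value"])
+     done = terminated or truncated
+ ```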
requirements.txt CHANGED
@@ -1,19 +1,28 @@
1
- # Core RL and Simulation
2
- stable-baselines3==2.7.0
3
- sb3_contrib==2.7.0
4
- gymnasium==1.2.1
5
-
6
- # Data Handling and Numerics
7
- pandas==2.3.3
8
  numpy==2.2.6
9
- scikit-learn==1.6.1
 
10
 
11
- # Data Fetching
 
 
 
 
12
  yfinance==0.2.66
 
 
 
 
 
 
 
13
 
14
- # Financial Indicators
15
- pandas-ta==0.4.71b0
16
 
17
- # Plotting and Visualization
18
- matplotlib==3.10.0
19
- seaborn==0.13.2
 
 
 
 
1
+ # Core Data Science & Mathematics
 
 
 
 
 
 
2
  numpy==2.2.6
3
+ pandas==2.3.3
4
+ scipy==1.16.3
5
 
6
+ # Visualization
7
+ matplotlib==3.10.0
8
+ plotly==5.24.1
9
+
10
+ # Financial Data
11
  yfinance==0.2.66
12
+ pandas-datareader==0.10.0
13
+
14
+ # Reinforcement Learning
15
+ gymnasium==1.2.2
16
+ shimmy==2.0.0
17
+ stable-baselines3==2.7.0
18
+ sb3-contrib==2.7.0
19
 
20
+ # Deep Learning Framework
21
+ torch==2.9.0
22
 
23
+ # Utilities & Other
24
+ gradio==5.50.0
25
+ python-dotenv==1.2.1
26
+ tabulate==0.9.0
27
+ quantstats==0.0.62
28
+ pandas-ta==0.4.71b0
results/final_performance_comparison_all_agents.png CHANGED

Git LFS Details

  • SHA256: 75b085d93cd947906c2f6f5fdf4f1fbc0b53cdb56d2ad3a77011b6cae8c787ab
  • Pointer size: 131 Bytes
  • Size of remote file: 225 kB

Git LFS Details

  • SHA256: 0fd8cba927f0f2bed5373fb5fb44bf20e2ed4d22219196e48763fdd1f5f6787d
  • Pointer size: 131 Bytes
  • Size of remote file: 298 kB
results/{td3_portfolio_alocation.png → ppo_allocation.png} RENAMED
File without changes
results/sac_allocation.png ADDED

Git LFS Details

  • SHA256: 6975e520974d024567499f1605269abccdc2f56f04dd1ab97b32df85fc78f27e
  • Pointer size: 131 Bytes
  • Size of remote file: 335 kB
results/{ppo_portfolio_alocation.png → td3_allocation.png} RENAMED
File without changes
results/{sac_portfolio_alocation.png → td3_transformer_allocation.png} RENAMED
File without changes
scripts/app.py ADDED
@@ -0,0 +1,662 @@
1
+ # scripts/app.py
2
+
3
+ import gradio as gr
4
+ import pandas as pd
5
+ import numpy as np
6
+ import plotly.graph_objects as go
7
+ import plotly.express as px
8
+ from datetime import datetime, timedelta
9
+ import os
10
+ import sys
11
+ import json
12
+ import torch
13
+ from fetch_market_data import fetch_market_data, ASSETS, FRED_IDS
14
+ from llm_analysis_rag import analyze_agent_decision, analyze_historical_segment
15
+ from stable_baselines3 import SAC
16
+ from environment import PortfolioEnv
17
+ from evaluate_baselines import buy_and_hold, equally_weighted_rebalanced
18
+
19
+ # --- Configuration ---
20
+ # Resolve the project root from this script's location (assumes the layout <repo>/scripts/app.py)
+ project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+ MODEL_PATH = os.path.join(project_root, "checkpoints", "sac_portfolio_model.zip")
21
+ WINDOW_SIZE = 30
22
+ MACRO_COLS = list(FRED_IDS.values())
23
+ DASHBOARD_DATA_PATH = os.path.join(project_root, "data", "historical_dashboard_data.csv")
24
+
25
+ # *** UPDATE THESE DATES TO MATCH YOUR ACTUAL TRAINING PERIOD ***
26
+ TRAIN_START_DATE = "2015-01-01"
27
+ TRAIN_END_DATE = "2023-01-01"
28
+
29
+ # Global variable for dashboard data needed for Tabs 3 & 4
30
+ DASHBOARD_DATA_DF = None
31
+
32
+ # Define Time Period mappings for the dropdown
33
+ TIME_PERIODS = {
34
+ "6 Months": 180,
35
+ "1 Year": 365,
36
+ "2 Years": 730,
37
+ "5 Years": 1825,
38
+ "Max Available": 9999 # Sentinel value for max
39
+ }
40
+
41
+ # =========================================
42
+ # Initialization Functions
43
+ # =========================================
44
+
45
+ def initialize_dashboard_data():
46
+ """Fetches and loads historical data at startup for Tabs 3 & 4."""
47
+ global DASHBOARD_DATA_DF
48
+ print("--- Initializing Historical Data for Analyst/Simulation Tabs ---")
49
+
50
+ # Fetching last 6 years to support longer analysis periods and simulation
51
+ end_date = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
52
+ start_date = (datetime.now() - timedelta(days=365*6)).strftime('%Y-%m-%d')
53
+
54
+ print(f"Fetching historical data from {start_date} to {end_date}...")
55
+ # This might take a minute on first run
56
+ fetch_market_data(start_date, end_date, DASHBOARD_DATA_PATH)
57
+
58
+ if os.path.exists(DASHBOARD_DATA_PATH):
59
+ DASHBOARD_DATA_DF = pd.read_csv(DASHBOARD_DATA_PATH, index_col=0, parse_dates=True)
60
+ # Basic cleaning
61
+ DASHBOARD_DATA_DF.dropna(how='all', inplace=True)
62
+ # Calculate equal weight return for dashboard metrics
63
+ asset_cols = [c for c in ASSETS if c in DASHBOARD_DATA_DF.columns]
64
+ if asset_cols:
65
+ DASHBOARD_DATA_DF['Daily_Ret_Eq'] = DASHBOARD_DATA_DF[asset_cols].pct_change().mean(axis=1)
66
+ print(f"Data loaded successfully. Shape: {DASHBOARD_DATA_DF.shape}")
67
+ print(f"Data range: {DASHBOARD_DATA_DF.index.min().date()} to {DASHBOARD_DATA_DF.index.max().date()}")
68
+ else:
69
+ print("❌ Failed to initialize historical data.")
70
+
71
+ # Initialize data at startup
72
+ try:
73
+ initialize_dashboard_data()
74
+ except Exception as e:
75
+ print(f"Warning: Data initialization failed. Error: {e}")
76
+
77
+
78
+ # =========================================
79
+ # Professional Metrics & Evaluation Functions
80
+ # =========================================
81
+
82
+ def evaluate_agent_pro(env, model):
83
+ """
84
+ Runs the trained agent on the environment and returns portfolio values.
85
+ """
86
+ obs, info = env.reset()
87
+ terminated, truncated = False, False
88
+ portfolio_values = [env.initial_amount]
89
+
90
+ while not (terminated or truncated):
91
+ action, _states = model.predict(obs, deterministic=True)
92
+ obs, reward, terminated, truncated, info = env.step(action)
93
+ portfolio_values.append(info['portfolio_value'])
94
+
95
+ # Align index with the actual steps taken
96
+ valid_dates = env.df.index[env.window_size-1:]
97
+ return pd.Series(portfolio_values, index=valid_dates[:len(portfolio_values)])
98
+
99
+ def calculate_metrics_pro(portfolio_values, freq=252, rf=0.0):
100
+ """
101
+ Calculates key professional performance metrics from a series of portfolio values.
102
+ """
103
+ if len(portfolio_values) < 2:
104
+ return {k: "N/A" for k in ["Total Return", "CAGR", "Sharpe Ratio", "Sortino Ratio", "Volatility", "Max Drawdown", "Calmar Ratio"]}
105
+
106
+ returns = portfolio_values.pct_change().dropna()
107
+ if returns.empty:
108
+ return {k: "0.00%" if "%" in k else "0.00" for k in ["Total Return", "CAGR", "Sharpe Ratio", "Sortino Ratio", "Volatility", "Max Drawdown", "Calmar Ratio"]}
109
+
110
+ total_return = (portfolio_values.iloc[-1] / portfolio_values.iloc[0]) - 1
111
+ num_years = (len(portfolio_values) - 1) / freq
112
+ cagr = (portfolio_values.iloc[-1] / portfolio_values.iloc[0]) ** (1/num_years) - 1 if num_years > 0 else 0.0
113
+
114
+ sharpe_ratio = np.sqrt(freq) * (returns.mean() - rf) / returns.std() if returns.std() > 0 else np.nan
115
+
116
+ downside_returns = returns[returns < 0]
117
+ downside_std = downside_returns.std()
118
+ sortino_ratio = np.sqrt(freq) * (returns.mean() - rf) / downside_std if downside_std > 0 else np.nan
119
+
120
+ volatility = returns.std() * np.sqrt(freq)
121
+
122
+ rolling_max = portfolio_values.cummax()
123
+ drawdown = portfolio_values / rolling_max - 1.0
124
+ max_drawdown = drawdown.min()
125
+
126
+ calmar_ratio = cagr / abs(max_drawdown) if max_drawdown != 0 and cagr != 0 else np.nan
127
+
128
+ return {
129
+ "Total Return": total_return,
130
+ "CAGR": cagr,
131
+ "Sharpe Ratio": sharpe_ratio,
132
+ "Sortino Ratio": sortino_ratio,
133
+ "Volatility": volatility,
134
+ "Max Drawdown": max_drawdown,
135
+ "Calmar Ratio": calmar_ratio
136
+ }
137
+
138
+ # =========================================
139
+ # XAI: Feature Importance Function
140
+ # =========================================
141
+ def calculate_feature_importance(model, obs):
142
+ """
143
+ Calculates feature importance using Integrated Gradients on the RL agent's policy network.
144
+ """
145
+ # Convert observation to torch tensor and enable gradient tracking
146
+ obs_tensor = torch.as_tensor(obs, dtype=torch.float32, device=model.device)
147
+ obs_tensor.requires_grad_()
148
+
149
+ # Get the policy network (actor)
150
+ actor = model.policy.actor
151
+
152
+ # Define a baseline (e.g., a zero observation)
153
+ baseline = torch.zeros_like(obs_tensor)
154
+
155
+ # Number of steps for integral approximation
156
+ steps = 50
157
+
158
+ # Generate scaled inputs along the path from baseline to input
159
+ scaled_inputs = [baseline + (float(i) / steps) * (obs_tensor - baseline) for i in range(steps + 1)]
160
+
161
+ grads = []
162
+ for scaled_input in scaled_inputs:
163
+ # Forward pass to get action distribution parameters (mean)
164
+ action_mean = actor(scaled_input)
165
+
166
+ # We need a scalar output to calculate gradients against.
167
+ # Here we sum, representing overall sensitivity of the action vector.
168
+ target_output = action_mean.sum()
169
+
170
+ # Calculate gradients of the target output with respect to the input features
171
+ grad = torch.autograd.grad(outputs=target_output, inputs=scaled_input)[0]
172
+ grads.append(grad)
173
+
174
+ # Average the gradients using the trapezoidal rule approximation
175
+ avg_grads = [(g_prev + g_next) / 2.0 for g_prev, g_next in zip(grads[:-1], grads[1:])]  # element-wise trapezoid average of successive gradients
176
+ avg_grads = torch.stack(avg_grads).mean(dim=0)
177
+
178
+ # Calculate Integrated Gradients: (input - baseline) * average_gradients
179
+ integrated_grads = (obs_tensor - baseline) * avg_grads
180
+
181
+ # Detach, move to cpu, and convert to numpy array
182
+ importance_scores = integrated_grads.detach().cpu().numpy().flatten()
183
+
184
+ # Feature Names mapping
185
+ num_assets = len(ASSETS)
186
+ num_macro = len(MACRO_COLS)
187
+
188
+ # Create feature names based on the observation structure
189
+ feature_names = []
190
+ for i in range(WINDOW_SIZE):
+ # Match the row-major flatten order of the observation: each time step lists asset prices first, then macro indicators
+ for name in ASSETS + MACRO_COLS:
+ feature_names.append(f"{name}_t-{WINDOW_SIZE-1-i}")
196
+
197
+ # Combine into a dictionary and sort by absolute importance
198
+ feature_importance_dict = dict(zip(feature_names, importance_scores))
199
+
200
+ # Aggregate importance by feature type (sum of absolute values across time steps)
201
+ aggregated_importance = {}
202
+ for base_feature in ASSETS + MACRO_COLS:
203
+ total_imp = sum(abs(val) for key, val in feature_importance_dict.items() if key.startswith(base_feature))
204
+ aggregated_importance[base_feature] = total_imp
205
+
206
+ # Sort and take top N for display
207
+ top_features = dict(sorted(aggregated_importance.items(), key=lambda item: item[1], reverse=True)[:8])
208
+
209
+ # Create a Plotly bar chart
210
+ fig = px.bar(
211
+ x=list(top_features.values()),
212
+ y=list(top_features.keys()),
213
+ orientation='h',
214
+ title="Top Influential Features (XAI)",
215
+ labels={'x': 'Relative Importance Score', 'y': 'Feature'},
216
+ color=list(top_features.values()),
217
+ color_continuous_scale=px.colors.sequential.Viridis
218
+ )
219
+ fig.update_layout(
220
+ template="plotly_dark",
221
+ paper_bgcolor='rgba(0,0,0,0)',
222
+ plot_bgcolor='rgba(0,0,0,0)',
223
+ yaxis={'categoryorder':'total ascending'},
224
+ coloraxis_showscale=False,
225
+ margin=dict(l=10, r=10, t=40, b=10),
226
+ height=300 # Keep it compact
227
+ )
228
+
229
+ return fig
230
+
231
+ # =========================================
232
+ # Tab 4 Logic: Historical Simulation (UPDATED)
233
+ # =========================================
234
+
235
+ def run_historical_simulation(start_date_str, end_date_str):
236
+ """
237
+ Runs the RL agent on historical data and compares to baselines using professional metrics.
238
+ """
239
+ if DASHBOARD_DATA_DF is None:
240
+ return go.Figure(), "Data not initialized. Please restart app.", gr.update(visible=False)
241
+
242
+ status_msg = "Preparing simulation..."
243
+ yield go.Figure(), status_msg, gr.update(visible=False)
244
+
245
+ try:
246
+ # 1. Validate and Slice Data
247
+ try:
248
+ start_date = pd.to_datetime(start_date_str)
249
+ end_date = pd.to_datetime(end_date_str)
250
+ except ValueError:
251
+ yield go.Figure(), "Error: Invalid date format. Use YYYY-MM-DD.", gr.update(visible=False)
252
+ return
253
+
254
+ if start_date < DASHBOARD_DATA_DF.index.min() or end_date > DASHBOARD_DATA_DF.index.max():
255
+ avail_start = DASHBOARD_DATA_DF.index.min().date()
256
+ avail_end = DASHBOARD_DATA_DF.index.max().date()
257
+ yield go.Figure(), f"Error: Selected dates outside available range ({avail_start} to {avail_end}).", gr.update(visible=False)
258
+ return
259
+
260
+ df_slice = DASHBOARD_DATA_DF.loc[start_date:end_date].copy()
261
+ asset_cols_only = [c for c in ASSETS if c in df_slice.columns]
262
+
263
+ if len(df_slice) < WINDOW_SIZE + 10:
264
+ yield go.Figure(), "Error: Time period too short for simulation.", gr.update(visible=False)
265
+ return
266
+
267
+ # 2. Setup Environment and Agent
268
+ status_msg = "Running RL Agent simulation..."
269
+ yield go.Figure(), status_msg, gr.update(visible=False)
270
+
271
+ env = PortfolioEnv(df_slice, WINDOW_SIZE, initial_amount=10000)
272
+
273
+ if not os.path.exists(MODEL_PATH):
274
+ raise FileNotFoundError(f"Model not found: {MODEL_PATH}")
275
+ model = SAC.load(MODEL_PATH)
276
+
277
+ # 3. Run Simulation Loop & Get Values using Pro Function
278
+ rl_portfolio_series = evaluate_agent_pro(env, model)
279
+
280
+ # 4. Calculate Baselines using Pro Functions
281
+ status_msg = "Calculating baselines and metrics..."
282
+ yield go.Figure(), status_msg, gr.update(visible=False)
283
+
284
+ # Pass only asset columns to baseline functions
285
+ bnh_portfolio_series = buy_and_hold(df_slice[asset_cols_only], initial_amount=10000)
286
+ # Realign B&H index to match RL agent's start date
287
+ bnh_portfolio_series = bnh_portfolio_series.loc[rl_portfolio_series.index[0]:]
288
+ # Normalize B&H starting value to match RL agent's start
289
+ bnh_portfolio_series = bnh_portfolio_series / bnh_portfolio_series.iloc[0] * 10000
290
+
291
+ eq_portfolio_series = equally_weighted_rebalanced(df_slice[asset_cols_only], initial_amount=10000)
292
+ eq_portfolio_series = eq_portfolio_series.loc[rl_portfolio_series.index[0]:]
293
+ eq_portfolio_series = eq_portfolio_series / eq_portfolio_series.iloc[0] * 10000
294
+
295
+ # 5. Generate Plot
296
+ fig = go.Figure()
297
+ fig.add_trace(go.Scatter(x=rl_portfolio_series.index, y=rl_portfolio_series, mode='lines', name='RL Agent (SAC)', line=dict(color='#10b981', width=3)))
298
+ fig.add_trace(go.Scatter(x=bnh_portfolio_series.index, y=bnh_portfolio_series, mode='lines', name='Buy & Hold (SPY)', line=dict(color='#6b7280', dash='dash')))
299
+ fig.add_trace(go.Scatter(x=eq_portfolio_series.index, y=eq_portfolio_series, mode='lines', name='Equal Weighted', line=dict(color='#a855f7', dash='dot')))
300
+
301
+ fig.update_layout(
302
+ title="Simulation: Strategy Performance Comparison ($10k Start)",
303
+ xaxis_title="Date",
304
+ yaxis_title="Portfolio Value ($)",
305
+ template="plotly_dark",
306
+ paper_bgcolor='rgba(0,0,0,0)',
307
+ plot_bgcolor='rgba(0,0,0,0)',
308
+ hovermode="x unified",
309
+ legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
310
+ )
311
+
312
+ # 6. Calculate Professional Metrics Table
313
+ rl_m = calculate_metrics_pro(rl_portfolio_series)
314
+ bnh_m = calculate_metrics_pro(bnh_portfolio_series)
315
+ eq_m = calculate_metrics_pro(eq_portfolio_series)
316
+
317
+ # Helper to format based on metric type
318
+ def fmt(val, is_pct=True):
319
+ if pd.isna(val): return "N/A"
320
+ return f"{val:.2%}" if is_pct else f"{val:.2f}"
321
+
322
+ metrics_data = {
323
+ "Metric": ["Total Return", "CAGR", "Sharpe Ratio", "Sortino Ratio", "Volatility (Ann.)", "Max Drawdown", "Calmar Ratio"],
324
+ "RL Agent (SAC)": [fmt(rl_m["Total Return"]), fmt(rl_m["CAGR"]), fmt(rl_m["Sharpe Ratio"], False), fmt(rl_m["Sortino Ratio"], False), fmt(rl_m["Volatility"]), fmt(rl_m["Max Drawdown"]), fmt(rl_m["Calmar Ratio"], False)],
325
+ "Buy & Hold (SPY)": [fmt(bnh_m["Total Return"]), fmt(bnh_m["CAGR"]), fmt(bnh_m["Sharpe Ratio"], False), fmt(bnh_m["Sortino Ratio"], False), fmt(bnh_m["Volatility"]), fmt(bnh_m["Max Drawdown"]), fmt(bnh_m["Calmar Ratio"], False)],
326
+ "Equal Weighted": [fmt(eq_m["Total Return"]), fmt(eq_m["CAGR"]), fmt(eq_m["Sharpe Ratio"], False), fmt(eq_m["Sortino Ratio"], False), fmt(eq_m["Volatility"]), fmt(eq_m["Max Drawdown"]), fmt(eq_m["Calmar Ratio"], False)],
327
+ }
328
+ metrics_df = pd.DataFrame(metrics_data)
329
+
330
+ # Format the dataframe as a markdown table for cleaner display
331
+ metrics_md = metrics_df.to_markdown(index=False)
332
+ final_metrics_display = f"### 📊 Professional Performance Metrics\n\n{metrics_md}"
333
+
334
+ yield fig, "Simulation Complete.", final_metrics_display
335
+
336
+ except Exception as e:
337
+ import traceback
338
+ traceback.print_exc()
339
+ yield go.Figure(), f"Error during simulation: {str(e)}", gr.update(visible=False)
340
+
341
+
342
+ # =========================================
343
+ # Tab 3 Logic: Historical Data Analyst
344
+ # =========================================
345
+
346
+ def run_historical_analysis(selected_assets, period_name):
347
+ """Backend for Tab 3."""
348
+ if DASHBOARD_DATA_DF is None or not selected_assets:
349
+ return go.Figure(), "Please wait for data initialization or select assets."
350
+
351
+ status_html = """<div style="color: #9ca3af;">🔄 Processing data and running AI analysis...</div>"""
352
+ yield go.Figure(), status_html
353
+
354
+ try:
355
+ # 1. Filter Data by Time Period
356
+ days = TIME_PERIODS.get(period_name, 365)
357
+ cutoff_date = datetime.now() - timedelta(days=days)
358
+ valid_assets = [a for a in selected_assets if a in DASHBOARD_DATA_DF.columns]
359
+ if not valid_assets:
360
+ yield go.Figure(), "Error: Selected assets not found in available data."
361
+ return
362
+ df_filtered = DASHBOARD_DATA_DF.loc[cutoff_date:, valid_assets].copy()
363
+ if df_filtered.empty:
364
+ yield go.Figure(), f"No data found for the selected period: {period_name}"
365
+ return
366
+
367
+ # 2. Generate Normalized Price Plot
368
+ df_normalized = df_filtered / df_filtered.iloc[0] * 100
369
+ fig = px.line(df_normalized, x=df_normalized.index, y=df_normalized.columns,
370
+ title=f"Performance Comparison: {period_name} (Base=100)",
371
+ color_discrete_sequence=px.colors.qualitative.Bold)
372
+ fig.update_layout(template="plotly_dark", paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)',
373
+ yaxis_title="Normalized Price", xaxis_title="Date", legend_title_text="", hovermode="x unified")
374
+
375
+ # 3. Run AI Analysis
376
+ analysis_text = analyze_historical_segment(df_filtered, valid_assets, period_name)
377
+ formatted_analysis = f"### 🤖 AI Analyst Report: {period_name}\n\n{analysis_text}"
378
+ yield fig, formatted_analysis
379
+
380
+ except Exception as e:
381
+ import traceback
382
+ traceback.print_exc()
383
+ yield go.Figure(), f"### Error during analysis\n\n{str(e)}"
384
+
385
+
386
+ # =========================================
387
+ # Tab 2 Logic: Forecast & Analysis (XAI)
388
+ # =========================================
389
+
390
+ def get_latest_data_window(window_size=30):
391
+ """Fetches latest data needed for prediction."""
392
+ print("Fetching prediction data...")
393
+ lookback_days = window_size + 150
394
+ end_date = datetime.now().strftime('%Y-%m-%d')
395
+ start_date = (datetime.now() - timedelta(days=lookback_days)).strftime('%Y-%m-%d')
396
+ temp_filename = os.path.join(project_root, "data", "temp_gradio_prediction_data.csv")
397
+ fetch_market_data(start_date, end_date, temp_filename)
398
+ if not os.path.exists(temp_filename): raise Exception("Failed to fetch market data file.")
399
+ df = pd.read_csv(temp_filename, index_col=0, parse_dates=True)
400
+ df.dropna(inplace=True)
401
+ if len(df) < window_size: raise Exception(f"Not enough clean data fetched for prediction.")
402
+ return df.iloc[-window_size:].copy()
403
+
404
+ def prepare_observation(data_window):
405
+ price_data = data_window[ASSETS].values
406
+ macro_data = data_window[MACRO_COLS].values
407
+ norm_prices = price_data / (price_data[0] + 1e-8)
408
+ norm_macro = macro_data / (macro_data[0] + 1e-8)
409
+ obs = np.concatenate([norm_prices, norm_macro], axis=1)
410
+ # Return both flattened obs for model and raw obs for XAI
411
+ return obs.flatten().astype(np.float32), obs.astype(np.float32), data_window
412
+
413
+ def predict_and_analyze():
414
+ """Main function for Forecast Tab."""
415
+ status_msg = "Starting process..."
416
+ loading_html = """<div style="color: #9ca3af;">🔄 Fetching data & running prediction...</div>"""
417
+ # Update to yield an empty plot for the XAI chart initially
418
+ yield status_msg, None, go.Figure(), loading_html
419
+
420
+ try:
421
+ data_window = get_latest_data_window(WINDOW_SIZE)
422
+ # Get flattened obs for prediction and raw obs for XAI
423
+ flat_obs, raw_obs, df_window_for_analyst = prepare_observation(data_window)
424
+
425
+ if not os.path.exists(MODEL_PATH): raise FileNotFoundError(f"Model not found: {MODEL_PATH}")
426
+ model = SAC.load(MODEL_PATH)
427
+
428
+ # --- XAI: Calculate Feature Importance ---
429
+ status_msg = "Calculating feature importance..."
430
+ yield status_msg, None, go.Figure(), loading_html
431
+ xai_plot = calculate_feature_importance(model, raw_obs)
432
+
433
+ # --- Prediction ---
434
+ action, _ = model.predict(flat_obs, deterministic=True)
435
+ exp_action = np.exp(np.asarray(action).flatten())
436
+ weights = exp_action / np.sum(exp_action)
437
+ allocations_dict = {asset: weights[i] for i, asset in enumerate(ASSETS)}
438
+ allocations_dict['Cash'] = weights[-1]
439
+ alloc_df = pd.DataFrame(list(allocations_dict.items()), columns=['Asset', 'Proposed Allocation'])
440
+ alloc_df['Proposed Allocation'] = alloc_df['Proposed Allocation'].apply(lambda x: f"{x:.2%}")
441
+
442
+ status_msg = "Prediction done. Running AI Risk Analysis..."
443
+ analysing_html = """<div style="color: #9ca3af;">🤖 Running Qwen-2.5-3B Risk Analysis...</div>"""
444
+ # Yield XAI plot along with other outputs
445
+ yield status_msg, alloc_df, xai_plot, analysing_html
446
+
447
+ allocations_for_llm = {k: float(v) for k, v in allocations_dict.items()}
448
+ analysis_result = analyze_agent_decision(df_window_for_analyst, allocations_for_llm)
449
+ status_msg = "Analysis complete!"
450
+
451
+ if isinstance(analysis_result, dict):
452
+ strat = analysis_result.get('strategy_summary', 'N/A')
453
+ risk = analysis_result.get('risk_level', 'N/A').upper()
454
+ just = analysis_result.get('justification', 'N/A')
455
+ conf = analysis_result.get('confidence_score', 'N/A')
456
+ if 'HIGH' in risk:
457
+ risk_css = "color: #ef4444; font-weight: bold;"
458
+ status_bg = "#7f1d1d"
459
+ status_border = "#ef4444"
460
+ status_icon = "⛔"
461
+ status_text = "TRADE BLOCKED: High Risk Detected"
462
+ else:
463
+ risk_css = "color: #10b981; font-weight: bold;"
464
+ status_bg = "#064e3b"
465
+ status_border = "#10b981"
466
+ status_icon = "🚀"
467
+ status_text = "TRADE APPROVED"
468
+
469
+ report_html = f"""
470
+ <div style="background-color: #1f2937; padding: 20px; border-radius: 12px 12px 0 0; border: 1px solid #374151; border-bottom: none;">
471
+ <h3 style="margin-top: 0; color: #e5e7eb;">🤖 AI Risk Analyst Report</h3>
472
+ <div style="margin-bottom: 15px;"><strong style="color: #9ca3af;">Strategy:</strong><br><span style="color: #d1d5db;">{strat}</span></div>
473
+ <div style="margin-bottom: 15px;"><strong style="color: #9ca3af;">Risk Level:</strong><span style="margin-left: 8px; {risk_css}">{risk}</span></div>
474
+ <div style="margin-bottom: 15px;"><strong style="color: #9ca3af;">Justification:</strong><br><span style="color: #d1d5db;">{just}</span></div>
475
+ <div><strong style="color: #9ca3af;">Confidence:</strong> <span style="color: #d1d5db;">{conf}/10</span></div>
476
+ </div>
477
+ <div style="background-color: {status_bg}; color: white; padding: 15px; border-radius: 0 0 12px 12px; border: 2px solid {status_border}; text-align: center; font-size: 1.2em; font-weight: bold; display: flex; align-items: center; justify-content: center;">
478
+ <span style="margin-right: 10px; font-size: 1.4em;">{status_icon}</span>{status_text}
479
+ </div>"""
480
+ else:
481
+ report_html = f"""<div style="padding: 20px; background-color: #7f1d1d; color: #fca5a5; border-radius: 12px;"><h3>❌ Analysis Failed to Parse</h3><p>{str(analysis_result)}</p></div>"""
482
+ # Final yield with all outputs including XAI plot
483
+ yield status_msg, alloc_df, xai_plot, report_html
484
+ except Exception as e:
485
+ import traceback
486
+ traceback.print_exc()
487
+ status_msg = f"Error: {str(e)}"
488
+ error_html = f"""<div style="padding: 20px; background-color: #7f1d1d; color: #fca5a5; border-radius: 12px;"><h3>❌ Process Error</h3><p>{str(e)}</p></div>"""
489
+ # Final yield in case of error
490
+ yield status_msg, None, go.Figure(), error_html
491
+
492
+
493
+ # =========================================
494
+ # Tab 1 Logic: Live Dashboard (DUMMY DATA)
495
+ # =========================================
496
+ def get_dashboard_metrics():
497
+ return "$135,400", "+3.07%"
498
+
499
+ def get_portfolio_history_plot():
500
+ dates = pd.date_range(start="2023-01-01", periods=100)
501
+ np.random.seed(42)
502
+ rl_returns = np.random.normal(0.001, 0.01, 100)
503
+ bnh_returns = np.random.normal(0.0005, 0.012, 100)
504
+ rl_value = 10000 * np.cumprod(1 + rl_returns)
505
+ bnh_value = 10000 * np.cumprod(1 + bnh_returns)
506
+ fig = go.Figure()
507
+ fig.add_trace(go.Scatter(x=dates, y=rl_value, mode='lines', name='RL Agent (Live)', line=dict(color='#10b981', width=3)))
508
+ fig.add_trace(go.Scatter(x=dates, y=bnh_value, mode='lines', name='Benchmark', line=dict(color='#6b7280', dash='dash')))
509
+ fig.update_layout(title="Portfolio Net Worth (Live Tracking)", xaxis_title="Date", yaxis_title="Net Worth ($)", template="plotly_dark", paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)', legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1))
510
+ return fig
511
+
512
+ def get_current_allocation_plot():
513
+ labels = ASSETS + ['Cash']
514
+ values = [0.25, 0.10, 0.30, 0.15, 0.05, 0.15]
515
+ fig = px.pie(values=values, names=labels, title="Current Holdings Breakdown", color_discrete_sequence=px.colors.qualitative.Bold)
516
+ fig.update_traces(textposition='inside', textinfo='percent+label', hole=.4)
517
+ fig.update_layout(template="plotly_dark", paper_bgcolor='rgba(0,0,0,0)', legend=dict(orientation="h", yanchor="bottom", y=-0.1))
518
+ return fig
519
+
520
+ def get_recent_transactions():
521
+ data = [["2025-11-24", "Rebalance", "MULTIPLE", "N/A"], ["2025-11-24", "SELL", "SPY", "$4,500"], ["2025-11-24", "BUY", "TLT", "$4,200"], ["2025-11-21", "BUY", "BTC-USD", "$1,000"]]
522
+ return pd.DataFrame(data, columns=["Date", "Type", "Asset", "Approx. Value"])
523
+
524
+
525
+ # =========================================
526
+ # Gradio Interface
527
+ # =========================================
528
+
529
+ custom_css = """
530
+ .metric-box { background-color: #1f2937; padding: 20px; border-radius: 12px; border: 1px solid #374151; text-align: center; }
531
+ .metric-label { font-size: 1.1em; color: #9ca3af; margin-bottom: 5px; }
532
+ .metric-value { font-size: 2.2em; font-weight: 700; color: #e5e7eb; }
533
+ .disclaimer-box { background-color: #374151; padding: 15px; border-radius: 8px; border-left: 4px solid #f59e0b; color: #d1d5db; font-size: 0.9em; margin-bottom: 20px; }
534
+ """
535
+
536
+ theme = gr.themes.Soft(primary_hue="emerald", secondary_hue="slate", neutral_hue="zinc").set(
537
+ body_background_fill="#111827", block_background_fill="#1f2937", block_border_width="1px", block_border_color="#374151"
538
+ )
539
+
540
+ with gr.Blocks(theme=theme, css=custom_css, title="Deep RL Portfolio Manager") as demo:
541
+ gr.HTML("""<script>function forceDark(){document.body.classList.add('dark');} forceDark(); setTimeout(forceDark, 500);</script>""")
542
+
543
+ gr.Markdown("# 🧠 Deep RL & LLM Portfolio Manager")
544
+
545
+ with gr.Tabs():
546
+ # ================= TAB 1: DASHBOARD (RESTORED) =================
547
+ with gr.TabItem("📊 Live Dashboard"):
548
+ # Metrics Row
549
+ with gr.Row():
550
+ # Fetch the headline metrics inside the tab so they render with the dashboard
551
+ nw_val, dc_val = get_dashboard_metrics()
552
+ with gr.Column(elem_classes=["metric-box"]):
553
+ gr.HTML(f"<div class='metric-label'>Current Net Worth</div><div class='metric-value'>{nw_val}</div>")
554
+ with gr.Column(elem_classes=["metric-box"]):
555
+ gr.HTML(f"<div class='metric-label'>24h Change</div><div class='metric-value' style='color: #10b981;'>{dc_val}</div>")
556
+
557
+ # Main Chart row
558
+ with gr.Row():
559
+ with gr.Column(scale=3):
560
+ history_chart = gr.Plot(value=get_portfolio_history_plot(), label="Net Worth History")
561
+
562
+ # Bottom Row: Allocations and Transactions
563
+ with gr.Row():
564
+ with gr.Column(scale=1):
565
+ allocation_chart = gr.Plot(value=get_current_allocation_plot(), label="Current Allocation")
566
+ with gr.Column(scale=2):
567
+ gr.Markdown("### Recent Transactions")
568
+ transactions_table = gr.Dataframe(value=get_recent_transactions(), interactive=False, wrap=True)
569
+
570
+ # ================= TAB 2: FORECAST (UPDATED with XAI) =================
571
+ with gr.TabItem("🔮 Forecast & AI Analysis"):
572
+ gr.Markdown("### Generate Tomorrow's Portfolio Strategy")
573
+ run_btn = gr.Button("🚀 Run Overnight Analysis", variant="primary", size="lg")
574
+ status_output = gr.Textbox(label="System Status", placeholder="Ready...", interactive=False, lines=1)
575
+ gr.Markdown("---")
576
+
577
+ with gr.Row():
578
+ # Left Column: Allocations & XAI Plot
579
+ with gr.Column(scale=2):
580
+ gr.Markdown("### 📈 Suggested Position")
581
+ allocation_output = gr.Dataframe(headers=["Asset", "Allocation"], datatype=["str", "str"], interactive=False)
582
+
583
+ # NEW: XAI Feature Importance Plot
584
+ gr.Markdown("### 🧠 Why did the agent choose this?")
585
+ xai_output_plot = gr.Plot(label="Top Influential Factors (XAI)", show_label=False)
586
+
587
+ # Right Column: AI Analysis Report
588
+ with gr.Column(scale=3):
589
+ analysis_report_html = gr.HTML(label="AI Risk Analysis Report")
590
+
591
+ # Updated click event with new XAI output
592
+ run_btn.click(
593
+ fn=predict_and_analyze,
594
+ inputs=None,
595
+ outputs=[status_output, allocation_output, xai_output_plot, analysis_report_html]
596
+ )
597
+
598
+ # ================= TAB 3: HISTORICAL DATA ANALYST =================
599
+ with gr.TabItem("📅 Historical Data Analyst"):
600
+ gr.Markdown("### Analyze Past Market Performance with AI")
601
+
602
+ with gr.Row():
603
+ with gr.Column(scale=1):
604
+ all_tickers_hist = ASSETS + list(FRED_IDS.values())
605
+ if DASHBOARD_DATA_DF is not None:
606
+ available_tickers_hist = [t for t in all_tickers_hist if t in DASHBOARD_DATA_DF.columns]
607
+ else:
608
+ available_tickers_hist = []
609
+ default_tickers_hist = available_tickers_hist[:3] if available_tickers_hist else []
610
+
611
+ asset_selector = gr.Dropdown(choices=available_tickers_hist, value=default_tickers_hist, multiselect=True, label="1. Select Assets")
612
+ period_selector = gr.Dropdown(choices=list(TIME_PERIODS.keys()), value="1 Year", label="2. Select Period")
613
+ analyze_btn = gr.Button("🔎 Run Analysis", variant="primary")
614
+
615
+ with gr.Column(scale=3):
616
+ historical_plot = gr.Plot(label="Performance Plot")
617
+
618
+ gr.Markdown("---")
619
+ historical_analysis_md = gr.Markdown("### 🤖 AI Analyst Report\n\n*Click 'Run Analysis' to generate.*")
620
+
621
+ analyze_btn.click(
622
+ fn=run_historical_analysis,
623
+ inputs=[asset_selector, period_selector],
624
+ outputs=[historical_plot, historical_analysis_md]
625
+ )
626
+
627
+ # ================= TAB 4: HISTORICAL SIMULATION (UPDATED with Pro Metrics) =================
628
+ with gr.TabItem("🔙 Historical Simulation"):
629
+ gr.Markdown("### Backtest the RL Agent against Baselines")
630
+
631
+ # Disclaimer Box
632
+ gr.HTML(f"""
633
+ <div class='disclaimer-box'>
634
+ <strong>⚠️ IMPORTANT DISCLAIMER:</strong> The RL model was trained on data from approximately
635
+ <strong>{TRAIN_START_DATE} to {TRAIN_END_DATE}</strong>. Simulations that overlap this period are subject to lookahead bias,
636
+ while periods far outside it are out-of-distribution, so results may not reflect real-world performance.
637
+ Use for educational purposes only.
638
+ </div>
639
+ """)
640
+
641
+ with gr.Row():
642
+ with gr.Column(scale=1):
643
+ start_date_input = gr.Textbox(label="Start Date (YYYY-MM-DD)", value=(datetime.now() - timedelta(days=365)).strftime('%Y-%m-%d'))
644
+ end_date_input = gr.Textbox(label="End Date (YYYY-MM-DD)", value=(datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d'))
645
+ sim_btn = gr.Button("▶️ Run Simulation", variant="primary")
646
+ sim_status = gr.Textbox(label="Status", interactive=False, lines=1)
647
+
648
+ with gr.Column(scale=3):
649
+ sim_plot = gr.Plot(label="Simulation Performance")
650
+
651
+ gr.Markdown("---")
652
+ # Updated to Markdown component for better table formatting
653
+ sim_metrics_md = gr.Markdown("### 📊 Professional Performance Metrics\n\n*Run simulation to see metrics.*")
654
+
655
+ sim_btn.click(
656
+ fn=run_historical_simulation,
657
+ inputs=[start_date_input, end_date_input],
658
+ outputs=[sim_plot, sim_status, sim_metrics_md]
659
+ )
660
+
661
+ if __name__ == "__main__":
662
+ demo.queue().launch(server_name="0.0.0.0", server_port=7860, debug=True, share=True)
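
For reference, the softmax step at the top of `predict_and_analyze` is what turns the agent's raw action vector into the allocation table shown in the UI. Below is a minimal standalone sketch of that conversion; the example action values are made up, and the `ASSETS` list simply mirrors the app's five tickers:

```python
import numpy as np
import pandas as pd

ASSETS = ['AAPL', 'MSFT', 'SPY', 'TLT', 'BTC-USD']  # mirrors the app's asset list

def action_to_allocations(raw_action: np.ndarray) -> pd.DataFrame:
    """Map a raw (n_assets + 1)-dim action to softmax weights; the last slot is cash."""
    exp_action = np.exp(raw_action - np.max(raw_action))  # subtract the max for numerical stability
    weights = exp_action / np.sum(exp_action)
    rows = {asset: weights[i] for i, asset in enumerate(ASSETS)}
    rows['Cash'] = weights[-1]
    alloc_df = pd.DataFrame(list(rows.items()), columns=['Asset', 'Proposed Allocation'])
    alloc_df['Proposed Allocation'] = alloc_df['Proposed Allocation'].apply(lambda x: f"{x:.2%}")
    return alloc_df

# Hypothetical raw action: 5 asset logits + 1 cash logit
print(action_to_allocations(np.array([0.4, -0.2, 0.9, 0.1, -0.5, 0.0])))
```
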
scripts/custom_policy.py ADDED
@@ -0,0 +1,80 @@
1
+ # custom_policy.py
2
+
3
+ import torch
4
+ import torch.nn as nn
5
+ from gymnasium import spaces
6
+ from stable_baselines3.common.torch_layers import BaseFeaturesExtractor
7
+
8
+ class TransformerFeatureExtractor(BaseFeaturesExtractor):
9
+ """
10
+ A custom feature extractor that uses a Transformer Encoder.
11
+
12
+ It takes a flattened observation (window_size * n_features_per_step) and processes
13
+ it as a sequence.
14
+ """
15
+ def __init__(
16
+ self,
17
+ observation_space: spaces.Box,
18
+ features_dim: int = 256, # The final output dimension
19
+ n_features_per_step: int = 8, # Features per time step: 5 asset prices + 3 macro series
20
+ window_size: int = 30,
21
+ d_model: int = 64, # Transformer's internal embedding dimension
22
+ n_head: int = 4, # Number of attention heads
23
+ n_layers: int = 2, # Number of transformer encoder layers
24
+ dropout: float = 0.1
25
+ ):
26
+
27
+ super().__init__(observation_space, features_dim)
28
+
29
+ self.window_size = window_size
30
+ self.n_features_per_step = n_features_per_step
31
+
32
+ # Input shape check
33
+ expected_flat_dim = window_size * n_features_per_step
34
+ if observation_space.shape[0] != expected_flat_dim:
35
+ raise ValueError(
36
+ f"Observation space flat dimension {observation_space.shape[0]} "
37
+ f"does not match expected {expected_flat_dim} "
38
+ f"(window_size={window_size}, n_features_per_step={n_features_per_step})."
39
+ )
40
+
41
+ # 1. Input Projection:
42
+ self.input_projection = nn.Linear(n_features_per_step, d_model)
43
+
44
+ # 2. Positional Encoding:
45
+ self.positional_encoding = nn.Parameter(torch.randn(1, window_size, d_model))
46
+
47
+ # 3. Transformer Encoder:
48
+ encoder_layer = nn.TransformerEncoderLayer(
49
+ d_model=d_model,
50
+ nhead=n_head,
51
+ dropout=dropout,
52
+ batch_first=True
53
+ )
54
+ self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
55
+
56
+ # 4. Output Layers:
57
+ self.flatten = nn.Flatten()
58
+ self.linear_out = nn.Linear(window_size * d_model, features_dim)
59
+ self.relu = nn.ReLU()
60
+
61
+ def forward(self, observations: torch.Tensor) -> torch.Tensor:
62
+ # Input shape: (batch_size, window_size * n_features_per_step)
63
+
64
+ # 1. Reshape to (batch_size, window_size, n_features_per_step)
65
+ x = observations.reshape(-1, self.window_size, self.n_features_per_step)
66
+
67
+ # 2. Project input features to d_model
68
+ x = self.input_projection(x)
69
+
70
+ # 3. Add positional encoding
71
+ x = x + self.positional_encoding
72
+
73
+ # 4. Pass through Transformer
74
+ x = self.transformer_encoder(x)
75
+
76
+ # 5. Flatten and project to final output
77
+ x = self.flatten(x)
78
+ x = self.relu(self.linear_out(x))
79
+
80
+ return x
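
A hedged sketch of how this extractor would typically be plugged into a Stable-Baselines3 agent via `policy_kwargs`; the data path, checkpoint name, and training budget below are illustrative rather than the exact settings used for the released checkpoints:

```python
# Sketch only: assumes it is run from scripts/ so PortfolioEnv and the extractor import directly.
import pandas as pd
from stable_baselines3 import TD3
from environment import PortfolioEnv
from custom_policy import TransformerFeatureExtractor

train_df = pd.read_csv('data/train.csv', index_col='Date', parse_dates=True)
env = PortfolioEnv(train_df, window_size=30)

policy_kwargs = dict(
    features_extractor_class=TransformerFeatureExtractor,
    features_extractor_kwargs=dict(
        features_dim=256,
        n_features_per_step=8,  # 5 asset prices + 3 macro series
        window_size=30,
    ),
)

model = TD3("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=10_000)  # illustrative budget only
model.save('checkpoints/td3_transformer_model')
```
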
scripts/environment.py CHANGED
@@ -1,3 +1,5 @@
 
 
1
  import gymnasium as gym
2
  import numpy as np
3
  import pandas as pd
@@ -5,126 +7,89 @@ from gymnasium import spaces
5
 
6
  class PortfolioEnv(gym.Env):
7
  """
8
- A custom reinforcement learning environment for portfolio management.
9
-
10
- This environment simulates the daily trading of multiple financial assets. The agent's
11
- goal is to learn a policy for allocating capital to maximize risk-adjusted returns.
12
  """
13
  metadata = {'render_modes': ['human']}
14
 
15
  def __init__(self, df, window_size=30, initial_balance=10000, transaction_cost_pct=0.001):
16
- """
17
- Initializes the portfolio management environment.
18
-
19
- Args:
20
- df (pd.DataFrame): A DataFrame containing the daily closing prices of the assets.
21
- The index should be dates and columns should be asset tickers.
22
- window_size (int): The number of past days of price data to include in the observation.
23
- initial_balance (float): The starting capital for the portfolio.
24
- transaction_cost_pct (float): The percentage cost for each trade (e.g., 0.001 for 0.1%).
25
- """
26
  super(PortfolioEnv, self).__init__()
27
 
28
- # --- Basic Environment Parameters ---
29
  self.df = df
30
  self.window_size = window_size
31
  self.initial_balance = initial_balance
32
  self.transaction_cost_pct = transaction_cost_pct
33
- self.n_assets = len(df.columns)
 
 
 
 
 
 
 
 
 
34
 
35
  # --- Action Space ---
36
- # The agent outputs a vector of continuous values, one for each asset plus one for cash.
37
- # These raw outputs are then converted to portfolio weights via a softmax function.
38
- # The space is defined from -1 to 1 for better compatibility with standard RL algorithms.
39
- # Shape: (number of assets + 1 for cash)
40
  self.action_space = spaces.Box(
41
  low=-1, high=1, shape=(self.n_assets + 1,), dtype=np.float32
42
  )
43
 
44
  # --- Observation Space ---
45
- # The agent observes a window of past price data, flattened into a 1D vector.
46
- # Shape: (window_size * number of assets)
47
  self.observation_space = spaces.Box(
48
  low=-np.inf, high=np.inf,
49
- shape=(self.window_size * self.n_assets,),
50
  dtype=np.float32
51
  )
52
 
53
- # --- Internal State Variables ---
54
- # These variables track the state of the simulation over time.
55
  self._current_step = 0
56
- self._portfolio_value = 0.0
57
- # Weights for each asset + cash, e.g., [w_aapl, w_msft, ..., w_cash]
58
  self._weights = np.zeros(self.n_assets + 1)
59
 
 
 
 
 
60
  def reset(self, seed=None):
61
- """
62
- Resets the environment to its initial state for a new episode.
63
-
64
- Returns:
65
- tuple: A tuple containing the initial observation and auxiliary info.
66
- """
67
  super().reset(seed=seed)
68
-
69
- # Start the simulation at the first point where a full window of data is available.
70
  self._current_step = self.window_size
71
  self._portfolio_value = self.initial_balance
72
-
73
- # Initialize weights to be 100% in cash.
74
  self._weights = np.zeros(self.n_assets + 1)
75
- self._weights[-1] = 1.0 # Last element represents cash
76
 
77
  observation = self._get_obs()
78
  info = self._get_info()
79
-
80
  return observation, info
81
 
82
  def step(self, action):
83
- """
84
- Executes one time step within the environment based on the agent's action.
85
-
86
- Args:
87
- action (np.ndarray): The raw output from the agent's policy network.
88
-
89
- Returns:
90
- tuple: A tuple containing the next observation, reward, terminated flag,
91
- truncated flag, and auxiliary info.
92
- """
93
- # 1. Store the portfolio value before taking the action.
94
  current_portfolio_value = self._portfolio_value
95
 
96
- # 2. Convert the raw action into portfolio weights using the softmax function.
97
- # This ensures the weights are positive and sum to 1.
98
- target_weights = np.exp(action) / np.sum(np.exp(action))
99
 
100
- # 3. Calculate the cost of rebalancing the portfolio.
101
- # The cost is based on the total value of assets bought or sold.
102
- trades = (target_weights[:-1] - self._weights[:-1]) * current_portfolio_value
103
  transaction_costs = np.sum(np.abs(trades)) * self.transaction_cost_pct
104
 
105
- # 4. Update the internal state: apply costs, set new weights, and advance time.
106
  self._balance = current_portfolio_value - transaction_costs
107
  self._weights = target_weights
 
108
  self._current_step += 1
109
 
110
- # 5. Calculate the new portfolio value based on the market's price movement.
111
- current_prices = self.df.iloc[self._current_step - 1].values
112
- next_prices = self.df.iloc[self._current_step].values
113
- price_ratio = next_prices / current_prices # How much each asset's price changed.
114
-
115
- # The new value of our asset holdings.
116
  asset_values_after_price_change = (self._weights[:-1] * self._balance) * price_ratio
117
-
118
- # The new total portfolio value is the sum of the updated asset values plus the cash holding.
119
  new_portfolio_value = np.sum(asset_values_after_price_change) + (self._weights[-1] * self._balance)
120
  self._portfolio_value = new_portfolio_value
121
 
122
- # 6. Calculate the reward for the agent.
123
- # The reward is the log return of the portfolio value, which encourages geometric growth.
124
- reward = np.log(new_portfolio_value / current_portfolio_value)
125
 
126
- # 7. Check for termination conditions.
127
- # The episode ends if the agent goes broke or runs out of data.
128
  terminated = bool(self._portfolio_value <= self.initial_balance * 0.5)
129
  truncated = self._current_step >= len(self.df) - 1
130
 
@@ -135,24 +100,25 @@ class PortfolioEnv(gym.Env):
135
 
136
  def _get_obs(self):
137
  """
138
- Constructs the observation for the agent at the current time step.
139
-
140
- Returns:
141
- np.ndarray: A flattened 1D array of the normalized price history.
142
  """
143
- # Get the window of historical price data.
144
- price_window = self.df.iloc[self._current_step - self.window_size : self._current_step].values
145
-
146
- # Normalize the window by dividing by the first price. This helps the agent
147
- # focus on relative price changes rather than absolute values.
148
- normalized_window = price_window / price_window[0]
149
-
150
- return normalized_window.flatten().astype(np.float32)
 
 
 
 
 
 
151
 
152
  def _get_info(self):
153
- """
154
- Returns a dictionary of auxiliary information about the current state.
155
- """
156
  return {
157
  'step': self._current_step,
158
  'portfolio_value': self._portfolio_value,
@@ -160,15 +126,7 @@ class PortfolioEnv(gym.Env):
160
  }
161
 
162
  def render(self, mode='human'):
163
- """
164
- Renders the environment's state (optional).
165
- """
166
- if mode == 'human':
167
- info = self._get_info()
168
- print(f"Step: {info['step']}, Portfolio Value: {info['portfolio_value']:.2f}")
169
 
170
  def close(self):
171
- """
172
- Cleans up the environment (optional).
173
- """
174
  pass
 
1
+ # scripts/environment.py: portfolio environment with 8 features per step (5 assets + 3 macro)
2
+
3
  import gymnasium as gym
4
  import numpy as np
5
  import pandas as pd
 
7
 
8
  class PortfolioEnv(gym.Env):
9
  """
10
+ A custom environment for portfolio management that includes macroeconomic data.
 
 
 
11
  """
12
  metadata = {'render_modes': ['human']}
13
 
14
  def __init__(self, df, window_size=30, initial_balance=10000, transaction_cost_pct=0.001):
 
 
 
 
 
 
 
 
 
 
15
  super(PortfolioEnv, self).__init__()
16
 
17
+ # --- Data Handling ---
18
  self.df = df
19
  self.window_size = window_size
20
  self.initial_balance = initial_balance
21
  self.transaction_cost_pct = transaction_cost_pct
22
+
23
+ # --- IMPORTANT: Define asset and macro columns ---
24
+ self.asset_columns = ['AAPL', 'BTC-USD', 'MSFT', 'SPY', 'TLT']
25
+ self.macro_columns = ['Federal Funds Rate', 'CPI', 'VIX']
26
+
27
+ self.n_assets = len(self.asset_columns)
28
+ self.n_macro_features = len(self.macro_columns)
29
+
30
+ # --- Total number of features observed per time step (assets + macro) ---
31
+ self.n_features_per_step = self.n_assets + self.n_macro_features # Should be 8
32
 
33
  # --- Action Space ---
 
 
 
 
34
  self.action_space = spaces.Box(
35
  low=-1, high=1, shape=(self.n_assets + 1,), dtype=np.float32
36
  )
37
 
38
  # --- Observation Space ---
39
+ # Shape: (window_size * total_features) = (30 * 8) = 240
 
40
  self.observation_space = spaces.Box(
41
  low=-np.inf, high=np.inf,
42
+ shape=(self.window_size * self.n_features_per_step,),
43
  dtype=np.float32
44
  )
45
 
46
+ # --- Internal State ---
 
47
  self._current_step = 0
48
+ self._portfolio_value = 0
 
49
  self._weights = np.zeros(self.n_assets + 1)
50
 
51
+ # Separate dataframes for prices and macro for easier handling
52
+ self.price_df = self.df[self.asset_columns]
53
+ self.macro_df = self.df[self.macro_columns]
54
+
55
  def reset(self, seed=None):
 
 
 
 
 
 
56
  super().reset(seed=seed)
 
 
57
  self._current_step = self.window_size
58
  self._portfolio_value = self.initial_balance
59
+
 
60
  self._weights = np.zeros(self.n_assets + 1)
61
+ self._weights[-1] = 1.0 # 100% in cash
62
 
63
  observation = self._get_obs()
64
  info = self._get_info()
 
65
  return observation, info
66
 
67
  def step(self, action):
 
 
 
 
 
 
 
 
 
 
 
68
  current_portfolio_value = self._portfolio_value
69
 
70
+ target_weights = np.exp(action) / np.sum(np.exp(action)) # Softmax
 
 
71
 
72
+ current_asset_values = self._weights[:-1] * current_portfolio_value
73
+ target_asset_values = target_weights[:-1] * current_portfolio_value
74
+ trades = target_asset_values - current_asset_values
75
  transaction_costs = np.sum(np.abs(trades)) * self.transaction_cost_pct
76
 
 
77
  self._balance = current_portfolio_value - transaction_costs
78
  self._weights = target_weights
79
+
80
  self._current_step += 1
81
 
82
+ current_prices = self.price_df.iloc[self._current_step - 1].values
83
+ next_prices = self.price_df.iloc[self._current_step].values
84
+
85
+ price_ratio = next_prices / (current_prices + 1e-8) # Add epsilon for safety
86
+
 
87
  asset_values_after_price_change = (self._weights[:-1] * self._balance) * price_ratio
 
 
88
  new_portfolio_value = np.sum(asset_values_after_price_change) + (self._weights[-1] * self._balance)
89
  self._portfolio_value = new_portfolio_value
90
 
91
+ reward = np.log(new_portfolio_value / (current_portfolio_value + 1e-8)) # Add epsilon
 
 
92
 
 
 
93
  terminated = bool(self._portfolio_value <= self.initial_balance * 0.5)
94
  truncated = self._current_step >= len(self.df) - 1
95
 
 
100
 
101
  def _get_obs(self):
102
  """
103
+ Gets the observation for the current time step.
104
+ This includes a window of prices AND a window of macro data.
 
 
105
  """
106
+ price_window = self.price_df.iloc[self._current_step - self.window_size : self._current_step].values
107
+ macro_window = self.macro_df.iloc[self._current_step - self.window_size : self._current_step].values
108
+
109
+ # Normalize the price window (relative changes)
110
+ normalized_price_window = price_window / (price_window[0] + 1e-8)
111
+
112
+ # Normalize the macro window
113
+ normalized_macro_window = macro_window / (macro_window[0] + 1e-8)
114
+
115
+ # Combine the normalized windows
116
+ observation_window = np.concatenate([normalized_price_window, normalized_macro_window], axis=1)
117
+
118
+ # Flatten into a 1D vector
119
+ return observation_window.flatten().astype(np.float32)
120
 
121
  def _get_info(self):
 
 
 
122
  return {
123
  'step': self._current_step,
124
  'portfolio_value': self._portfolio_value,
 
126
  }
127
 
128
  def render(self, mode='human'):
129
+ pass
 
 
 
 
 
130
 
131
  def close(self):
 
 
 
132
  pass
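
To make the reward mechanics above concrete, here is a small worked sketch of a single rebalancing step using the same arithmetic as `PortfolioEnv.step`; all numbers are invented for illustration:

```python
import numpy as np

portfolio_value = 10_000.0
current_weights = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])  # 5 assets + cash, starting fully in cash
raw_action = np.array([0.5, 0.1, -0.3, 0.2, -0.8, 0.0])     # hypothetical policy output

# 1. Softmax turns the raw action into target weights (positive, sum to 1)
target_weights = np.exp(raw_action) / np.sum(np.exp(raw_action))

# 2. Transaction costs are charged on the absolute dollar amount traded per asset
trades = (target_weights[:-1] - current_weights[:-1]) * portfolio_value
balance = portfolio_value - np.sum(np.abs(trades)) * 0.001   # 0.1% per trade

# 3. Apply one day of price moves to the asset sleeve; the cash sleeve is unchanged
price_ratio = np.array([1.01, 0.99, 1.02, 1.00, 0.97])       # hypothetical next-day price ratios
new_value = np.sum(target_weights[:-1] * balance * price_ratio) + target_weights[-1] * balance

# 4. The reward is the log return of total portfolio value over the step
reward = np.log(new_value / portfolio_value)
print(f"new value: {new_value:.2f}, reward: {reward:+.5f}")
```
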
scripts/evaluate.py CHANGED
@@ -1,12 +1,16 @@
 
 
1
  import pandas as pd
2
  import numpy as np
3
  import matplotlib.pyplot as plt
4
- from stable_baselines3 import SAC ,PPO , TD3
5
- from evaluate_baselines import buy_and_hold
6
- from environment import PortfolioEnv
7
  from matplotlib.ticker import FuncFormatter
 
 
 
8
 
9
- # --- Helper Function to Run the RL Agent ---
10
 
11
  def evaluate_agent(env, model):
12
  """
@@ -14,7 +18,6 @@ def evaluate_agent(env, model):
14
  """
15
  obs, info = env.reset()
16
  terminated, truncated = False, False
17
-
18
  portfolio_values = [env.initial_balance]
19
 
20
  while not (terminated or truncated):
@@ -22,121 +25,129 @@ def evaluate_agent(env, model):
22
  obs, reward, terminated, truncated, info = env.step(action)
23
  portfolio_values.append(info['portfolio_value'])
24
 
25
- return pd.Series(portfolio_values, index=env.df.index[:len(portfolio_values)])
 
 
 
26
 
27
 
28
  def calculate_metrics(portfolio_values, freq=252, rf=0.0):
29
  """
30
  Calculates key performance metrics from a series of portfolio values.
31
- freq: number of trading periods in a year (252 for daily, 52 for weekly).
32
- rf: risk-free rate (default = 0 for simplicity).
33
  """
 
 
 
34
  returns = portfolio_values.pct_change().dropna()
 
 
35
 
36
- # Total Return
37
  total_return = (portfolio_values.iloc[-1] / portfolio_values.iloc[0]) - 1
 
 
38
 
39
- # CAGR
40
- num_years = (len(portfolio_values) / freq)
41
- cagr = (portfolio_values.iloc[-1] / portfolio_values.iloc[0]) ** (1/num_years) - 1
42
-
43
- # Sharpe Ratio
44
- sharpe_ratio = np.sqrt(freq) * (returns.mean() - rf) / returns.std()
45
 
46
- # Sortino Ratio (downside risk only)
47
  downside_returns = returns[returns < 0]
48
  downside_std = downside_returns.std()
49
  sortino_ratio = np.sqrt(freq) * (returns.mean() - rf) / downside_std if downside_std > 0 else np.nan
50
 
51
- # Volatility (annualized std)
52
  volatility = returns.std() * np.sqrt(freq)
53
 
54
- # Max Drawdown
55
  rolling_max = portfolio_values.cummax()
56
  drawdown = portfolio_values / rolling_max - 1.0
57
  max_drawdown = drawdown.min()
58
 
59
- # Calmar Ratio
60
- calmar_ratio = cagr / abs(max_drawdown / 100) if max_drawdown != 0 else np.nan
61
 
62
  return {
63
- "Total Return": f"{total_return:.2%}",
64
- "CAGR": f"{cagr:.2%}",
65
- "Sharpe Ratio": f"{sharpe_ratio:.2f}",
66
- "Sortino Ratio": f"{sortino_ratio:.2f}",
67
- "Volatility": f"{volatility:.2%}",
68
- "Max Drawdown": f"{max_drawdown:.2%}",
69
  "Calmar Ratio": f"{calmar_ratio:.2f}"
70
  }
71
 
72
 
73
- def main(test_data_path='data/test.csv'):
74
  """
75
- Loads, evaluates, and plots the performance of PPO, SAC, and TD3 agents
76
- against a Buy and Hold baseline.
77
  """
78
- # --- Define Model Paths and Agent Types ---
79
  models_to_evaluate = {
80
- "PPO Agent": (PPO, 'checkpoints/ppo_portfolio_model'),
81
- "SAC Agent": (SAC, 'checkpoints/sac_portfolio_model'),
82
- "TD3 Agent": (TD3, 'checkpoints/td3_portfolio_model')
 
83
  }
84
 
85
- # Load test data
86
- test_df = pd.read_csv(test_data_path, index_col='Date', parse_dates=True)
 
 
 
87
 
88
- # Dictionary to store results
89
  portfolio_values = {}
90
  metrics = {}
91
 
92
  # --- Run Evaluations for each RL Agent---
93
  for name, (agent_type, model_path) in models_to_evaluate.items():
94
  print(f"--- Evaluating {name} ---")
 
 
 
 
95
  model = agent_type.load(model_path)
96
- env = PortfolioEnv(test_df)
97
  portfolio_values[name] = evaluate_agent(env, model)
98
  metrics[name] = calculate_metrics(portfolio_values[name])
99
 
100
  # --- Evaluate Buy and Hold Baseline ---
101
  print("\n--- Evaluating Buy and Hold Baseline ---")
102
- bnh_values = buy_and_hold(test_df)
 
 
 
103
  portfolio_values["Buy and Hold"] = bnh_values
104
  metrics["Buy and Hold"] = calculate_metrics(bnh_values)
105
-
 
 
 
106
  # --- Combine and Print Metrics ---
107
  print("\n--- Performance Metrics ---")
108
  metrics_df = pd.DataFrame(metrics)
109
- print(metrics_df)
110
 
111
  # --- Plotting All Strategies ---
112
  plt.style.use('seaborn-v0_8-darkgrid')
113
  fig, ax = plt.subplots(figsize=(14, 8))
114
 
115
- # Define colors for clarity
116
  colors = {
117
- "PPO Agent": "red",
118
- "SAC Agent": "green",
119
- "TD3 Agent": "orange",
120
- "Buy and Hold": "blue"
 
 
121
  }
122
 
123
  for name, values in portfolio_values.items():
124
- ax.plot(values.index, values, label=name, color=colors[name], linewidth=2)
 
125
 
126
  ax.set_title('Agent Performance Comparison', fontsize=16)
127
  ax.set_xlabel('Date', fontsize=12)
128
  ax.set_ylabel('Portfolio Value ($)', fontsize=12)
129
  ax.legend(fontsize=12)
130
-
131
  formatter = FuncFormatter(lambda x, p: f'${x:,.0f}')
132
  ax.yaxis.set_major_formatter(formatter)
133
 
134
  plt.tight_layout()
135
- plt.savefig('results/final_performance_comparison_all_agents.png')
 
 
136
  plt.show()
137
 
138
- # Example of how to run this main function
139
  if __name__ == '__main__':
140
- # You can specify a different test file here if needed
141
- # e.g., main(test_data_path='data/stress_test_2018.csv')
142
  main()
 
1
+ # scripts/evaluate.py
2
+
3
  import pandas as pd
4
  import numpy as np
5
  import matplotlib.pyplot as plt
6
+ import os
7
+ from stable_baselines3 import TD3, PPO, SAC
8
+ from gymnasium import spaces
9
  from matplotlib.ticker import FuncFormatter
10
+ from environment import PortfolioEnv
11
+ from evaluate_baselines import buy_and_hold, equally_weighted_rebalanced
12
+ from custom_policy import TransformerFeatureExtractor
13
 
 
14
 
15
  def evaluate_agent(env, model):
16
  """
 
18
  """
19
  obs, info = env.reset()
20
  terminated, truncated = False, False
 
21
  portfolio_values = [env.initial_balance]
22
 
23
  while not (terminated or truncated):
 
25
  obs, reward, terminated, truncated, info = env.step(action)
26
  portfolio_values.append(info['portfolio_value'])
27
 
28
+ # Align index with the actual steps taken
29
+ # The first obs is at window_size, so index should start one step before
30
+ valid_dates = env.df.index[env.window_size-1:]
31
+ return pd.Series(portfolio_values, index=valid_dates[:len(portfolio_values)])
32
 
33
 
34
  def calculate_metrics(portfolio_values, freq=252, rf=0.0):
35
  """
36
  Calculates key performance metrics from a series of portfolio values.
 
 
37
  """
38
+ if len(portfolio_values) < 2:
39
+ return { "Total Return": "N/A", "CAGR": "N/A", "Sharpe Ratio": "N/A", "Max Drawdown": "N/A" }
40
+
41
  returns = portfolio_values.pct_change().dropna()
42
+ if returns.empty:
43
+ return { "Total Return": "0.00%", "CAGR": "0.00%", "Sharpe Ratio": "0.00", "Max Drawdown": "0.00%" }
44
 
 
45
  total_return = (portfolio_values.iloc[-1] / portfolio_values.iloc[0]) - 1
46
+ num_years = (len(portfolio_values) - 1) / freq
47
+ cagr = (portfolio_values.iloc[-1] / portfolio_values.iloc[0]) ** (1/num_years) - 1 if num_years > 0 else 0.0
48
 
49
+ sharpe_ratio = np.sqrt(freq) * (returns.mean() - rf) / returns.std() if returns.std() > 0 else np.nan
 
 
 
 
 
50
 
 
51
  downside_returns = returns[returns < 0]
52
  downside_std = downside_returns.std()
53
  sortino_ratio = np.sqrt(freq) * (returns.mean() - rf) / downside_std if downside_std > 0 else np.nan
54
 
 
55
  volatility = returns.std() * np.sqrt(freq)
56
 
 
57
  rolling_max = portfolio_values.cummax()
58
  drawdown = portfolio_values / rolling_max - 1.0
59
  max_drawdown = drawdown.min()
60
 
61
+ calmar_ratio = cagr / abs(max_drawdown) if max_drawdown != 0 and cagr != 0 else np.nan
 
62
 
63
  return {
64
+ "Total Return": f"{total_return:.2%}", "CAGR": f"{cagr:.2%}",
65
+ "Sharpe Ratio": f"{sharpe_ratio:.2f}", "Sortino Ratio": f"{sortino_ratio:.2f}",
66
+ "Volatility": f"{volatility:.2%}", "Max Drawdown": f"{max_drawdown:.2%}",
 
 
 
67
  "Calmar Ratio": f"{calmar_ratio:.2f}"
68
  }
69
 
70
 
71
+ def main(test_data_path='data/eval.csv'):
72
  """
73
+ Loads, evaluates, and plots all agent performances against baselines.
 
74
  """
75
+ # Define Model Paths and Agent Types
76
  models_to_evaluate = {
77
+ "SAC Agent Default (MLP)": (SAC, 'checkpoints/sac_portfolio_model.zip'),
78
+ "PPO Agent (MLP)": (PPO, 'checkpoints/ppo_portfolio_model.zip'),
79
+ "TD3 Agent (MLP)": (TD3, 'checkpoints/td3_portfolio_model.zip'),
80
+ "TD3 Agent (Transformer)": (TD3, 'checkpoints/td3_transformer_model.zip')
81
  }
82
 
83
+ # Load test data (this contains ALL columns - assets + macro)
84
+ full_eval_df = pd.read_csv(test_data_path, index_col='Date', parse_dates=True)
85
+
86
+ # Define your actual tradable asset columns
87
+ asset_columns = ['AAPL', 'BTC-USD', 'MSFT', 'SPY', 'TLT']
88
 
 
89
  portfolio_values = {}
90
  metrics = {}
91
 
92
  # --- Run Evaluations for each RL Agent---
93
  for name, (agent_type, model_path) in models_to_evaluate.items():
94
  print(f"--- Evaluating {name} ---")
95
+ if not os.path.exists(model_path):
96
+ print(f"⚠️ Warning: Model file not found at {model_path}. Skipping.")
97
+ continue
98
+
99
  model = agent_type.load(model_path)
100
+ env = PortfolioEnv(full_eval_df) # Pass the full DataFrame to the RL env
101
  portfolio_values[name] = evaluate_agent(env, model)
102
  metrics[name] = calculate_metrics(portfolio_values[name])
103
 
104
  # --- Evaluate Buy and Hold Baseline ---
105
  print("\n--- Evaluating Buy and Hold Baseline ---")
106
+
107
+ bnh_values = buy_and_hold(full_eval_df[asset_columns])
108
+ ewp_values = equally_weighted_rebalanced(full_eval_df[asset_columns])
109
+
110
  portfolio_values["Buy and Hold"] = bnh_values
111
  metrics["Buy and Hold"] = calculate_metrics(bnh_values)
112
+
113
+ portfolio_values["Equally Weighted"] = ewp_values
114
+ metrics["Equally Weighted"] = calculate_metrics(ewp_values)
115
+
116
  # --- Combine and Print Metrics ---
117
  print("\n--- Performance Metrics ---")
118
  metrics_df = pd.DataFrame(metrics)
119
+ print(metrics_df.to_markdown(numalign="left", stralign="left"))
120
 
121
  # --- Plotting All Strategies ---
122
  plt.style.use('seaborn-v0_8-darkgrid')
123
  fig, ax = plt.subplots(figsize=(14, 8))
124
 
 
125
  colors = {
126
+ "PPO Agent (MLP)": "red",
127
+ "SAC Agent Default (MLP)": "green",
128
+ "TD3 Agent (MLP)": "orange",
129
+ "TD3 Agent (Transformer)": "cyan",
130
+ "Buy and Hold": "blue",
131
+ "Equally Weighted": "purple"
132
  }
133
 
134
  for name, values in portfolio_values.items():
135
+ if name in portfolio_values: # Check if it was successfully evaluated
136
+ ax.plot(values.index, values, label=name, color=colors.get(name, 'gray'), linewidth=2)
137
 
138
  ax.set_title('Agent Performance Comparison', fontsize=16)
139
  ax.set_xlabel('Date', fontsize=12)
140
  ax.set_ylabel('Portfolio Value ($)', fontsize=12)
141
  ax.legend(fontsize=12)
142
+
143
  formatter = FuncFormatter(lambda x, p: f'${x:,.0f}')
144
  ax.yaxis.set_major_formatter(formatter)
145
 
146
  plt.tight_layout()
147
+ results_dir = 'results'
148
+ os.makedirs(results_dir, exist_ok=True)
149
+ plt.savefig(os.path.join(results_dir, 'final_performance_comparison_all_agents.png'))
150
  plt.show()
151
 
 
152
  if __name__ == '__main__':
 
 
153
  main()
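
As a quick sanity check of the formulas in `calculate_metrics`, the snippet below applies the same Sharpe and max-drawdown arithmetic to a synthetic equity curve; the random-walk series and seed are illustrative only:

```python
import numpy as np
import pandas as pd

# Synthetic daily equity curve: roughly 2 years of mildly positive, noisy returns
rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0005, 0.01, 504)
curve = pd.Series(10_000 * np.cumprod(1 + daily_returns),
                  index=pd.bdate_range('2021-01-01', periods=504))

# Same core formulas as calculate_metrics (annualised with freq=252, rf=0)
rets = curve.pct_change().dropna()
sharpe = np.sqrt(252) * rets.mean() / rets.std()
max_drawdown = (curve / curve.cummax() - 1.0).min()
print(f"Sharpe: {sharpe:.2f}, Max Drawdown: {max_drawdown:.2%}")
```
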
scripts/evaluate_baselines.py CHANGED
@@ -1,46 +1,46 @@
1
- # evaluate_baselines.py
2
-
3
  import pandas as pd
4
  import numpy as np
5
  import matplotlib.pyplot as plt
 
6
 
7
- def buy_and_hold(df, initial_balance=10000):
8
  """
9
  Simulates the Buy and Hold strategy.
10
 
11
  Args:
12
- df (pd.DataFrame): DataFrame with daily asset prices.
13
  initial_balance (int): The starting capital.
14
 
15
  Returns:
16
  pd.Series: A Series containing the portfolio value for each day.
17
  """
18
  print("--- Simulating Buy and Hold ---")
19
- n_assets = len(df.columns)
20
 
21
  # Invest an equal amount in each asset at the beginning
22
  initial_investment_per_asset = initial_balance / n_assets
23
 
24
  # Get the initial prices
25
- initial_prices = df.iloc[0]
26
 
27
  # Calculate the number of shares bought for each asset
28
- shares = initial_investment_per_asset / initial_prices
 
29
 
30
  # Calculate the portfolio value for each day
31
- portfolio_values = df.dot(shares)
32
 
33
  print(f"Initial Investment: ${initial_balance:.2f}")
34
- print(f"Final Portfolio Value: ${portfolio_values.iloc[-1]:.2f}")
35
 
36
  return portfolio_values
37
 
38
- def equally_weighted_rebalanced(df, initial_balance=10000, rebalance_freq='M', transaction_cost_pct=0.001):
39
  """
40
  Simulates an Equally Weighted Portfolio with periodic rebalancing.
41
 
42
  Args:
43
- df (pd.DataFrame): DataFrame with daily asset prices.
44
  initial_balance (int): The starting capital.
45
  rebalance_freq (str): The rebalancing frequency ('M' for monthly, 'Q' for quarterly).
46
  transaction_cost_pct (float): The transaction cost as a percentage.
@@ -49,24 +49,30 @@ def equally_weighted_rebalanced(df, initial_balance=10000, rebalance_freq='M', t
49
  pd.Series: A Series containing the portfolio value for each day.
50
  """
51
  print(f"\n--- Simulating Equally Weighted Portfolio (Rebalanced {rebalance_freq}) ---")
52
- n_assets = len(df.columns)
53
 
54
  # Set the initial weights to be equal
55
  weights = np.full(n_assets, 1/n_assets)
56
 
57
  portfolio_value = initial_balance
58
- portfolio_values = pd.Series(index=df.index)
59
 
60
  last_rebalance_date = None
61
 
62
- for date, prices in df.iterrows():
63
  # Store the portfolio value for the day before any changes
64
  portfolio_values[date] = portfolio_value
65
 
66
  # Determine if it's a rebalancing day
67
- # Rebalance on the first day of the new period (month, quarter)
68
- if last_rebalance_date is None or (date.month != last_rebalance_date.month and rebalance_freq == 'M'):
69
-
 
 
 
 
 
 
70
  # Calculate the value of trades to rebalance
71
  target_asset_values = portfolio_value * (1/n_assets)
72
  current_asset_values = weights * portfolio_value
@@ -82,32 +88,43 @@ def equally_weighted_rebalanced(df, initial_balance=10000, rebalance_freq='M', t
82
 
83
  # Calculate portfolio value for the *next* day before the market opens
84
  # Get price changes from today to the next trading day
85
- today_prices = df.loc[date]
86
- next_day_index = df.index.get_loc(date) + 1
87
- if next_day_index < len(df):
88
- next_day_prices = df.iloc[next_day_index]
89
- price_change_ratio = next_day_prices / today_prices
 
 
90
 
91
  # Update portfolio value based on price changes
92
  portfolio_value = np.sum( (weights * portfolio_value) * price_change_ratio )
93
 
94
  # Update weights due to market drift
95
  new_asset_values = (weights * portfolio_value) * price_change_ratio
96
- weights = new_asset_values / np.sum(new_asset_values)
 
 
 
 
 
97
 
98
  print(f"Initial Investment: ${initial_balance:.2f}")
99
- print(f"Final Portfolio Value: ${portfolio_values.iloc[-1]:.2f}")
100
 
101
  return portfolio_values.dropna()
102
 
103
 
104
  def main():
105
- # Load the test data
106
- test_df = pd.read_csv('data/test.csv', index_col='Date', parse_dates=True)
 
 
 
 
107
 
108
  # --- Run Baseline Strategies ---
109
- bnh_values = buy_and_hold(test_df)
110
- ewp_values = equally_weighted_rebalanced(test_df)
111
 
112
  # --- Plot the results ---
113
  plt.style.use('seaborn-v0_8-darkgrid')
@@ -127,7 +144,11 @@ def main():
127
  ax.yaxis.set_major_formatter(formatter)
128
 
129
  plt.tight_layout()
130
- plt.savefig('baseline_performance.png')
 
 
 
 
131
  plt.show()
132
 
133
  if __name__ == '__main__':
 
 
 
1
  import pandas as pd
2
  import numpy as np
3
  import matplotlib.pyplot as plt
4
+ import os # Import os for directory creation
5
 
6
+ def buy_and_hold(df_assets, initial_balance=10000): # Renamed df to df_assets for clarity
7
  """
8
  Simulates the Buy and Hold strategy.
9
 
10
  Args:
11
+ df_assets (pd.DataFrame): DataFrame with daily tradable asset prices ONLY.
12
  initial_balance (int): The starting capital.
13
 
14
  Returns:
15
  pd.Series: A Series containing the portfolio value for each day.
16
  """
17
  print("--- Simulating Buy and Hold ---")
18
+ n_assets = len(df_assets.columns)
19
 
20
  # Invest an equal amount in each asset at the beginning
21
  initial_investment_per_asset = initial_balance / n_assets
22
 
23
  # Get the initial prices
24
+ initial_prices = df_assets.iloc[0]
25
 
26
  # Calculate the number of shares bought for each asset
27
+ # Handle potential division by zero if an asset price is 0 (though unlikely with real data)
28
+ shares = initial_investment_per_asset / (initial_prices + 1e-8)
29
 
30
  # Calculate the portfolio value for each day
31
+ portfolio_values = df_assets.dot(shares)
32
 
33
  print(f"Initial Investment: ${initial_balance:.2f}")
34
+ print(f"Final Portfolio Value (Buy and Hold): ${portfolio_values.iloc[-1]:.2f}")
35
 
36
  return portfolio_values
37
 
38
+ def equally_weighted_rebalanced(df_assets, initial_balance=10000, rebalance_freq='M', transaction_cost_pct=0.001): # Renamed df to df_assets
39
  """
40
  Simulates an Equally Weighted Portfolio with periodic rebalancing.
41
 
42
  Args:
43
+ df_assets (pd.DataFrame): DataFrame with daily tradable asset prices ONLY.
44
  initial_balance (int): The starting capital.
45
  rebalance_freq (str): The rebalancing frequency ('M' for monthly, 'Q' for quarterly).
46
  transaction_cost_pct (float): The transaction cost as a percentage.
 
49
  pd.Series: A Series containing the portfolio value for each day.
50
  """
51
  print(f"\n--- Simulating Equally Weighted Portfolio (Rebalanced {rebalance_freq}) ---")
52
+ n_assets = len(df_assets.columns)
53
 
54
  # Set the initial weights to be equal
55
  weights = np.full(n_assets, 1/n_assets)
56
 
57
  portfolio_value = initial_balance
58
+ portfolio_values = pd.Series(index=df_assets.index, dtype=float) # Explicitly set dtype
59
 
60
  last_rebalance_date = None
61
 
62
+ for i, (date, prices) in enumerate(df_assets.iterrows()):
63
  # Store the portfolio value for the day before any changes
64
  portfolio_values[date] = portfolio_value
65
 
66
  # Determine if it's a rebalancing day
67
+ # Rebalance on the first day of the new period (month, quarter) or if it's the very first day
68
+ rebalance_this_day = False
69
+ if i == 0: # Rebalance on the very first day
70
+ rebalance_this_day = True
71
+ elif rebalance_freq == 'M' and date.month != df_assets.index[i-1].month:
72
+ rebalance_this_day = True
73
+ # Add 'Q' for quarterly if needed, similar logic
74
+
75
+ if rebalance_this_day:
76
  # Calculate the value of trades to rebalance
77
  target_asset_values = portfolio_value * (1/n_assets)
78
  current_asset_values = weights * portfolio_value
 
88
 
89
  # Calculate portfolio value for the *next* day before the market opens
90
  # Get price changes from today to the next trading day
91
+ today_prices = prices # Already have prices for the current date
92
+ next_day_index = df_assets.index.get_loc(date) + 1
93
+ if next_day_index < len(df_assets):
94
+ next_day_prices = df_assets.iloc[next_day_index]
95
+
96
+ # Avoid division by zero
97
+ price_change_ratio = next_day_prices / (today_prices + 1e-8)
98
 
99
  # Update portfolio value based on price changes
100
  portfolio_value = np.sum( (weights * portfolio_value) * price_change_ratio )
101
 
102
  # Update weights due to market drift
103
  new_asset_values = (weights * portfolio_value) * price_change_ratio
104
+ # Avoid division by zero for total portfolio value
105
+ if np.sum(new_asset_values) > 1e-8: # Check if total value is effectively non-zero
106
+ weights = new_asset_values / np.sum(new_asset_values)
107
+ else:
108
+ weights = np.full(n_assets, 1/n_assets) # Default to equal or handle as error
109
+
110
 
111
  print(f"Initial Investment: ${initial_balance:.2f}")
112
+ print(f"Final Portfolio Value (Equally Weighted): ${portfolio_values.iloc[-1]:.2f}")
113
 
114
  return portfolio_values.dropna()
115
 
116
 
117
  def main():
118
+ # Load the evaluation data (which contains both assets and macro data)
119
+ full_eval_df = pd.read_csv('data/eval.csv', index_col='Date', parse_dates=True)
120
+
121
+ # --- IMPORTANT: Filter ONLY asset columns for baselines ---
122
+ asset_columns = ['AAPL', 'BTC-USD', 'MSFT', 'SPY', 'TLT'] # Define your actual tradable assets
123
+ test_df_assets_only = full_eval_df[asset_columns]
124
 
125
  # --- Run Baseline Strategies ---
126
+ bnh_values = buy_and_hold(test_df_assets_only)
127
+ ewp_values = equally_weighted_rebalanced(test_df_assets_only)
128
 
129
  # --- Plot the results ---
130
  plt.style.use('seaborn-v0_8-darkgrid')
 
144
  ax.yaxis.set_major_formatter(formatter)
145
 
146
  plt.tight_layout()
147
+
148
+ # Ensure results directory exists for saving plot
149
+ results_dir = 'results'
150
+ os.makedirs(results_dir, exist_ok=True)
151
+ plt.savefig(os.path.join(results_dir, 'baseline_performance.png'))
152
  plt.show()
153
 
154
  if __name__ == '__main__':
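
A tiny worked example of the buy-and-hold arithmetic above: equal dollars are allocated per asset on day one and the resulting share counts are simply held; the toy two-asset price history below is purely illustrative:

```python
import pandas as pd

prices = pd.DataFrame(
    {"A": [100.0, 110.0, 105.0], "B": [50.0, 48.0, 55.0]},  # toy prices
    index=pd.date_range("2021-01-01", periods=3),
)

initial_balance = 10_000
shares = (initial_balance / len(prices.columns)) / prices.iloc[0]  # equal dollars per asset on day one
curve = prices.dot(shares)  # daily portfolio value
print(curve)  # 10000.0 -> 10300.0 -> 10750.0
```
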
scripts/fetch_data.py DELETED
@@ -1,75 +0,0 @@
1
- import yfinance as yf
2
- import pandas as pd
3
- import os
4
-
5
- # --- Configuration ---
6
- # Asset tickers
7
- TICKERS = ["AAPL", "MSFT", "SPY", "TLT", "BTC-USD"]
8
- # Time periods for training and testing
9
- TRAIN_START_DATE = "2015-01-01"
10
- TRAIN_END_DATE = "2020-12-31"
11
- TEST_START_DATE = "2021-01-01"
12
- TEST_END_DATE = "2023-12-31"
13
-
14
- # Directory to save the data
15
- DATA_DIR = "data"
16
- TRAIN_DATA_PATH = os.path.join(DATA_DIR, "train.csv")
17
- TEST_DATA_PATH = os.path.join(DATA_DIR, "test.csv")
18
-
19
- # --- Data Fetching and Processing ---
20
-
21
- def fetch_and_prepare_data(start_date, end_date, tickers):
22
- """
23
- Fetches historical data for the given tickers and processes it.
24
- Returns a DataFrame with 'Close' prices for each ticker.
25
- """
26
- print(f"Fetching data from {start_date} to {end_date} for {tickers}...")
27
-
28
- data = yf.download(tickers, start=start_date, end=end_date)
29
-
30
- # CHANGE: Add .copy() to explicitly create a new DataFrame and avoid warnings.
31
- close_data = data['Close'].copy()
32
-
33
- print("\nData Head:")
34
- print(close_data.head())
35
-
36
- print("\nMissing values before cleaning:")
37
- print(close_data.isnull().sum())
38
-
39
- # Now, all inplace operations are safely performed on our own copy.
40
- close_data.ffill(inplace=True)
41
- close_data.bfill(inplace=True)
42
-
43
- print("\nMissing values after cleaning:")
44
- print(close_data.isnull().sum())
45
-
46
- for col in close_data.columns:
47
- close_data[col] = pd.to_numeric(close_data[col], errors='coerce')
48
-
49
- close_data.dropna(inplace=True)
50
-
51
- return close_data
52
-
53
- def main():
54
- """Main function to run the data fetching process."""
55
- # Create data directory if it doesn't exist
56
- if not os.path.exists(DATA_DIR):
57
- os.makedirs(DATA_DIR)
58
- print(f"Created directory: {DATA_DIR}")
59
-
60
- # Fetch, process, and save training data
61
- print("--- Preparing Training Data ---")
62
- train_data = fetch_and_prepare_data(TRAIN_START_DATE, TRAIN_END_DATE, TICKERS)
63
- train_data.to_csv(TRAIN_DATA_PATH)
64
- print(f"Training data saved to {TRAIN_DATA_PATH}")
65
-
66
- print("\n" + "="*50 + "\n")
67
-
68
- # Fetch, process, and save testing data
69
- print("--- Preparing Testing Data ---")
70
- test_data = fetch_and_prepare_data(TEST_START_DATE, TEST_END_DATE, TICKERS)
71
- test_data.to_csv(TEST_DATA_PATH)
72
- print(f"Testing data saved to {TEST_DATA_PATH}")
73
-
74
- if __name__ == "__main__":
75
- main()
scripts/fetch_market_data.py CHANGED
@@ -1,78 +1,104 @@
 
 
 
 
1
  import argparse
2
  import os
3
- import pandas as pd
4
- import yfinance as yf
5
- from datetime import date
6
 
7
- def fetch_data(start_date, end_date, output_filename):
8
- """
9
- Fetches, cleans, and saves historical market data for a given date range.
 
 
 
 
 
 
 
 
10
 
11
- Args:
12
- start_date (str): The start date for the data in 'YYYY-MM-DD' format.
13
- end_date (str): The end date for the data in 'YYYY-MM-DD' format.
14
- output_filename (str): The path and name of the file to save the data.
15
  """
16
- print(f"--- Fetching data from {start_date} to {end_date} ---")
 
 
 
 
17
 
18
- # Define the base list of tickers
19
- tickers = ["AAPL", "MSFT", "SPY", "TLT", "BTC-USD"]
20
-
21
- # Smartly remove Bitcoin if the period is before its existence (e.g., before 2013)
22
- if pd.to_datetime(start_date).year < 2013:
23
- print("Note: Bitcoin (BTC-USD) did not exist for the requested period and will be excluded.")
24
- tickers.remove("BTC-USD")
25
-
26
- # Download data from Yahoo Finance
27
- data = yf.download(tickers, start=start_date, end=end_date)
28
- close_data = data['Close'].copy()
29
-
30
- # Data Cleaning
31
- print(f"\nMissing values before cleaning:\n{close_data.isnull().sum()}")
32
- close_data.ffill(inplace=True)
33
- close_data.bfill(inplace=True)
34
-
35
- # Drop any columns that are still all NaN (like BTC in the 2008 data)
36
- close_data.dropna(axis=1, how='all', inplace=True)
37
-
38
- print(f"\nMissing values after cleaning:\n{close_data.isnull().sum()}")
39
 
40
- # Ensure data directory exists
41
- output_dir = os.path.dirname(output_filename)
42
- if output_dir and not os.path.exists(output_dir):
43
- os.makedirs(output_dir)
44
 
45
- # Save to CSV
46
- close_data.to_csv(output_filename)
47
- print(f"\n✅ Data successfully saved to {output_filename}")
48
 
 
 
 
 
 
 
 
49
 
50
- if __name__ == "__main__":
51
- # Set up command-line argument parsing
52
- parser = argparse.ArgumentParser(description="Fetch historical market data for specified periods.")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
53
 
54
- parser.add_argument(
55
- "--start",
56
- type=str,
57
- default="2018-01-01",
58
- help="Start date in YYYY-MM-DD format. Default is for the 2018 stress test."
59
- )
60
- parser.add_argument(
61
- "--end",
62
- type=str,
63
- default="2019-12-31",
64
- help="End date in YYYY-MM-DD format. Default is for the 2018 stress test."
65
- )
66
- parser.add_argument(
67
- "--filename",
68
- type=str,
69
- default="data/stress_test_2018.csv",
70
- help="Output file name (e.g., 'data/my_data.csv')."
71
- )
72
 
73
- args = parser.parse_args()
74
 
75
- # Use 'today' as the end date if specified
76
- end_date = date.today().strftime('%Y-%m-%d') if args.end.lower() == 'today' else args.end
 
 
 
 
 
77
 
78
- fetch_data(start_date=args.start, end_date=end_date, output_filename=args.filename)
 
1
+ # scripts/fetch_market_data.py
2
+
3
+ import yfinance as yf_lib
4
+ import pandas as pd
5
  import argparse
6
  import os
7
+ from datetime import datetime, timedelta
8
+ from pandas_datareader import data as pdr
 
9
 
10
+ # --- Module-level constants (importable by other scripts, e.g. the Gradio app) ---
11
+ # Define your assets (Global variable, importable)
12
+ ASSETS = ['AAPL', 'MSFT', 'SPY', 'TLT', 'BTC-USD']
13
+
14
+ # Define FRED IDs for macroeconomic data (Global variable, importable)
15
+ FRED_IDS = {
16
+ 'DFF': 'Federal Funds Rate', # Daily Federal Funds Rate
17
+ 'CPIAUCSL': 'CPI', # Consumer Price Index (All Urban Consumers, Seasonally Adjusted, Monthly)
18
+ 'VIXCLS': 'VIX' # CBOE Volatility Index (VIX) from FRED
19
+ }
20
+ # ---------------------------------------
21
 
22
+ def fetch_market_data(start_date, end_date, filename):
 
 
 
23
  """
24
+ Fetches market data, macroeconomic indicators (including VIX from FRED),
25
+ for specified assets and time period, then saves it to a CSV file.
26
+ """
27
+ # No need to re-define assets and fred_ids here.
28
+ # The function will use the global ASSETS and FRED_IDS defined above.
29
 
30
+ print(f"--- Fetching market data for {ASSETS} from {start_date} to {end_date} ---")
31
+
32
+ # 1. Fetch Asset Prices (Daily) using yf_lib.download()
33
+ try:
34
+ # Use the global ASSETS variable
35
+ df_prices = yf_lib.download(ASSETS, start=start_date, end=end_date)['Close']
36
+ df_prices.dropna(inplace=True)
37
+ print(f"✅ Fetched {len(ASSETS)} asset prices.")
38
+ except Exception as e:
39
+ print(f"❌ Error fetching asset prices: {e}")
40
+ return None # Return None on failure
 
 
 
 
 
 
 
 
 
 
41
 
42
+ # 2. Fetch Macro Data (VIX, Federal Funds Rate, CPI) from FRED using pandas_datareader
43
+ print("--- Fetching macroeconomic data from FRED ---")
 
 
44
 
45
+ try:
46
+ # FRED data can be tricky with exact date ranges, fetching a bit more to ensure coverage
47
+ fred_start_date = (datetime.strptime(start_date, '%Y-%m-%d') - timedelta(days=365)).strftime('%Y-%m-%d')
48
 
49
+ # Use the global FRED_IDS variable
50
+ df_fred = pdr.DataReader(list(FRED_IDS.keys()), 'fred', start=fred_start_date, end=end_date)
51
+ df_fred.rename(columns=FRED_IDS, inplace=True)
52
+ print("✅ Fetched Federal Funds Rate, CPI, and VIX data from FRED.")
53
+ except Exception as e:
54
+ print(f"❌ Error fetching FRED data: {e}. Check FRED API access or ticker validity.")
55
+ df_fred = pd.DataFrame() # Create empty dataframe if fetch fails
56
 
57
+ # Combine all dataframes
58
+ df_combined = df_prices.copy()
59
+
60
+ # Merge FRED data (now includes VIX)
61
+ if not df_fred.empty:
62
+ df_combined = df_combined.merge(df_fred, left_index=True, right_index=True, how='left')
63
+
64
+ # Handle missing macro data: forward-fill and then back-fill for initial NaNs
65
+ # This loop now covers all FRED columns
66
+ # Use the global FRED_IDS variable
67
+ for col_name in FRED_IDS.values():
68
+ if col_name in df_combined.columns:
69
+ df_combined[col_name] = df_combined[col_name].ffill().bfill()
70
+ # Drop rows if they still have NaN for macro data after fill
71
+ df_combined.dropna(subset=[col_name], inplace=True) # Added dropna for robustness
72
+
73
+ # Ensure all data is within the requested date range after merging/filling
74
+ df_combined = df_combined.loc[start_date:end_date]
75
+ df_combined.dropna(inplace=True) # Final dropna for any remaining NaNs
76
+
77
+ if df_combined.empty:
78
+ print("❌ Final combined dataframe is empty after merging and cleaning. Check date ranges and data availability.")
79
+ return None # Return None on failure
80
+
81
+ # Save to CSV if a filename is provided
82
+ if filename:
83
+ output_dir = os.path.dirname(filename)
84
+ if output_dir: # os.path.dirname returns "" when the path has no directory component
85
+ os.makedirs(output_dir, exist_ok=True)
86
+
87
+ df_combined.to_csv(filename, index=True)
88
+ print(f"\n✅ Data saved successfully to {filename}")
89
+
90
+ print(f"Final data shape: {df_combined.shape}")
91
+ print("Columns:", df_combined.columns.tolist())
92
 
93
+ return df_combined # Return the DataFrame
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
94
 
 
95
 
96
+ if __name__ == '__main__':
97
+ parser = argparse.ArgumentParser(description="Fetch market and macroeconomic data.")
98
+ parser.add_argument("--start", type=str, default="2015-01-01", help="Start date (YYYY-MM-DD).")
99
+ parser.add_argument("--end", type=str, default="2020-12-31", help="End date (YYYY-MM-DD).")
100
+ parser.add_argument("--filename", type=str, default="data/train.csv", help="Output CSV filename.")
101
+
102
+ args = parser.parse_args()
103
 
104
+ fetch_market_data(args.start, args.end, args.filename)
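
Besides the CLI entry point above, `fetch_market_data` returns the combined DataFrame, so the train and evaluation datasets can also be built programmatically. A hedged sketch (date ranges mirror the project's train/test windows; the output paths are illustrative, and network access for yfinance and FRED is assumed):

```python
# Sketch only: assumes it is run from scripts/ so fetch_market_data imports directly.
from fetch_market_data import fetch_market_data

train_df = fetch_market_data("2015-01-01", "2020-12-31", "data/train.csv")
eval_df = fetch_market_data("2021-01-01", "2023-12-31", "data/eval.csv")

if eval_df is not None:
    # Expect 5 asset price columns plus Federal Funds Rate, CPI and VIX
    print(eval_df.columns.tolist())
```
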
scripts/llm_analysis_rag.py ADDED
@@ -0,0 +1,243 @@
1
+ # scripts/llm_analysis_rag.py
2
+ import gc
3
+ import time
4
+ import os
5
+ import shutil
6
+ import torch
7
+ import pandas as pd
8
+ import numpy as np
9
+ import json
10
+ import re
11
+ from datetime import datetime, timedelta
12
+
13
+ # LangChain components
14
+ from langchain_community.vectorstores import Chroma
15
+ from langchain_community.embeddings import HuggingFaceEmbeddings
16
+ from langchain_huggingface import HuggingFacePipeline
17
+ from langchain_classic.chains import RetrievalQA
18
+ from langchain_classic.prompts import PromptTemplate
19
+ from langchain_classic.docstore.document import Document
20
+ from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
21
+
22
+ # --- Configuration ---
23
+ # HF_EMBEDDING_MODEL is no longer used in this reduced scope
24
+ HF_GENERATION_MODEL = "Qwen/Qwen2.5-3B-Instruct"
25
+
26
+ # Global variables
27
+ llm_pipeline_hf_instance = None
28
+
29
+ # --- Helper: Robust JSON Extractor ---
30
+ def extract_clean_json(response_text):
31
+ """Robust JSON extractor handling Python booleans and Markdown."""
32
+ json_match = re.search(r'```json\s*(.*?)\s*```', response_text, re.DOTALL)
33
+ if json_match:
34
+ text_to_parse = json_match.group(1)
35
+ else:
36
+ start_idx = response_text.find('{')
37
+ end_idx = response_text.rfind('}')
38
+ if start_idx != -1 and end_idx != -1:
39
+ text_to_parse = response_text[start_idx:end_idx+1]
40
+ else:
41
+ # print(f"❌ PARSE ERROR: No JSON found: {response_text[:100]}...")
42
+ return None
43
+
44
+ text_to_parse = text_to_parse.replace(": True", ": true").replace(": False", ": false")
45
+
46
+ try:
47
+ return json.loads(text_to_parse)
48
+ except json.JSONDecodeError:
49
+ return None
50
+
51
+ # --- Shared LLM Setup (Singleton Pattern) ---
52
+ def setup_llm_pipeline():
53
+ global llm_pipeline_hf_instance
54
+ if llm_pipeline_hf_instance is None:
55
+ print(f"--- Loading Model: {HF_GENERATION_MODEL} ---")
56
+ tokenizer = AutoTokenizer.from_pretrained(HF_GENERATION_MODEL, trust_remote_code=True)
57
+ # 4-bit quantization config for efficient loading
58
+ bnb_config = BitsAndBytesConfig(
59
+ load_in_4bit=True, bnb_4bit_use_double_quant=True,
60
+ bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
61
+ )
62
+ model = AutoModelForCausalLM.from_pretrained(
63
+ HF_GENERATION_MODEL, trust_remote_code=True,
64
+ quantization_config=bnb_config, device_map="auto"
65
+ )
66
+ # Create the HF pipeline
67
+ pipe = pipeline(
68
+ "text-generation", model=model, tokenizer=tokenizer,
69
+ max_new_tokens=1024, # Increased token limit for detailed historical analysis
70
+ do_sample=False, # Greedy decoding for deterministic, factual output (temperature is ignored when sampling is off)
71
+ return_full_text=False
72
+ )
73
+ llm_pipeline_hf_instance = HuggingFacePipeline(pipeline=pipe)
74
+ return llm_pipeline_hf_instance
75
+
76
+ # =========================================
77
+ # NEW FUNCTION: Structured Historical Analysis
78
+ # =========================================
79
+ def analyze_historical_segment(df_segment, selected_assets, period_name):
80
+ """
81
+ Analyzes a specific segment of historical data directly without RAG.
82
+ Takes a DataFrame slice, calculates summary stats, and prompts the LLM.
83
+ """
84
+ llm = setup_llm_pipeline()
85
+ print(f"--- Running Historical Analysis for {period_name} ---")
86
+
87
+ # 1. Create quantitative summary of the data segment for the prompt
88
+ if df_segment.empty:
89
+ return "No data available for this period to analyze."
90
+
91
+ start_date = df_segment.index.min().date()
92
+ end_date = df_segment.index.max().date()
93
+
94
+ start_vals = df_segment.iloc[0]
95
+ end_vals = df_segment.iloc[-1]
96
+ # Calculate percentage change over the period, handling potential zeros
97
+ pct_changes = ((end_vals - start_vals) / (start_vals.replace(0, np.nan)) * 100).fillna(0)
98
+
99
+ # Build the context string
100
+ data_summary = f"Analysis Period: {period_name} ({start_date} to {end_date})\n\n"
101
+ data_summary += "Performance Summary over Period:\n"
102
+ for asset in selected_assets:
103
+ if asset in df_segment.columns:
104
+ change = pct_changes[asset]
105
+ direction = "gained" if change > 0 else "lost"
106
+ data_summary += f"- {asset}: {direction} {abs(change):.2f}%\n"
107
+
108
+ # Add volatility context (standard deviation of daily returns)
109
+ data_summary += "\nVolatility Context (Daily Return Std Dev):\n"
110
+ daily_rets = df_segment.pct_change()
111
+ std_devs = daily_rets.std() * 100
112
+ for asset in selected_assets:
113
+ if asset in std_devs.index:
114
+ data_summary += f"- {asset}: {std_devs[asset]:.2f}%\n"
115
+
116
+ # 2. Create the Prompt
117
+ # We use Qwen's chat template format (<|im_start|>...)
118
+ prompt_template = """<|im_start|>system
119
+ You are a senior financial analyst. Your job is to analyze historical market data trends for selected assets over a specific time period.
120
+ Provide a concise, professional, and insightful summary of the performance, key trends, and comparative movements based *only* on the provided data summary.
121
+ Highlight significant gains, losses, or differences in volatility between the assets.
122
+
123
+ ### DATA CONTEXT:
124
+ {data_summary}
125
+ <|im_end|>
126
+ <|im_start|>user
127
+ Generate the historical analysis report.
128
+ <|im_end|>
129
+ <|im_start|>assistant
130
+ """
131
+ pt = PromptTemplate(template=prompt_template, input_variables=["data_summary"])
132
+ formatted_prompt = pt.format(data_summary=data_summary)
133
+
134
+ # 3. Invoke LLM
135
+ response = llm.invoke(formatted_prompt)
136
+ return response.strip()
137
+
138
+
139
+ # --- Decision Analysis (Kept for Forecast Tab) ---
140
+ def analyze_agent_decision(current_market_data_window, proposed_allocations):
141
+ """
142
+ HYBRID ANALYZER: Python does the math, LLM does the talking.
143
+ """
144
+ llm = setup_llm_pipeline()
145
+
146
+ # --- 1. PREPARE DATA ---
147
+ # (Logic remains the same as before...)
148
+ vix_level = current_market_data_window['VIX'].iloc[-1] if 'VIX' in current_market_data_window else 0
149
+
150
+ # Identify largest position
151
+ risky_assets = {k:v for k,v in proposed_allocations.items() if k not in ['Cash', 'TLT']}
152
+ if risky_assets:
153
+ max_asset = max(risky_assets, key=risky_assets.get)
154
+ max_val = risky_assets[max_asset] * 100
155
+ else:
156
+ max_asset = "None"
157
+ max_val = 0.0
158
+
159
+ safe_haven_pct = (proposed_allocations.get('Cash', 0) + proposed_allocations.get('TLT', 0)) * 100
160
+
161
+ # --- 2. PYTHON LOGIC CORE ---
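+ # Hard-coded thresholds on VIX and allocation concentration decide the risk verdict here; the LLM below only narrates it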
162
+ trigger_safe_haven = safe_haven_pct > 80.0
163
+ trigger_crash_rule = vix_level > 20.0 and safe_haven_pct < 30.0
164
+ trigger_concentration = vix_level > 15.0 and max_val > 40.0
165
+
166
+ # Determine Verdict Programmatically
167
+ calculated_risk = "MODERATE"
168
+ reason_code = "Standard market conditions."
169
+
170
+ if trigger_safe_haven:
171
+ calculated_risk = "LOW"
172
+ reason_code = f"Safe Haven Exception triggered (Safe Assets: {safe_haven_pct:.1f}% > 80%)."
173
+ elif trigger_crash_rule:
174
+ calculated_risk = "HIGH"
175
+ reason_code = f"Crash Protocol triggered (VIX {vix_level:.1f} > 20 and Safe Haven < 30%)."
176
+ elif trigger_concentration:
177
+ calculated_risk = "HIGH"
178
+ reason_code = f"Concentration Rule triggered (VIX {vix_level:.1f} > 15 and {max_asset} > 40%)."
179
+
180
+ # --- 3. THE "NARRATOR" PROMPT ---
181
+ prompt_template = """<|im_start|>system
182
+ You are a Senior Risk Analyst.
183
+ The Quantitative Engine has already processed the data and determined the Risk Level.
184
+ Your job is to summarize the strategy and explain the risk verdict to the user.
185
+
186
+ ### QUANTITATIVE ENGINE OUTPUT:
187
+ - **Determined Risk Level:** {calculated_risk}
188
+ - **Primary Logic Trigger:** {reason_code}
189
+
190
+ ### DATA CONTEXT:
191
+ - VIX: {vix:.2f}
192
+ - Largest Position: {max_asset} ({max_val:.1f}%)
193
+ - Safe Haven Allocation: {safe_pct:.1f}%
194
+
195
+ ### INSTRUCTIONS:
196
+ 1. **Strategy Summary:** Describe the allocation style (e.g., "Aggressive Tech", "Defensive Cash").
197
+ 2. **Justification:** Explain the Risk Level using the "Primary Logic Trigger" provided above. Do not invent new math.
198
+
199
+ Return ONLY raw JSON:
200
+ {{
201
+ "strategy_summary": "string",
202
+ "risk_level": "{calculated_risk}",
203
+ "justification": "string",
204
+ "confidence_score": 10
205
+ }}
206
+ <|im_end|>
207
+ <|im_start|>user
208
+ Generate the report.
209
+ <|im_end|>
210
+ <|im_start|>assistant
211
+ """
212
+ pt = PromptTemplate(template=prompt_template, input_variables=["calculated_risk", "reason_code", "vix", "max_asset", "max_val", "safe_pct"])
213
+
214
+ formatted = pt.format(
215
+ calculated_risk=calculated_risk,
216
+ reason_code=reason_code,
217
+ vix=vix_level,
218
+ max_asset=max_asset,
219
+ max_val=max_val,
220
+ safe_pct=safe_haven_pct
221
+ )
222
+
223
+ res = llm.invoke(formatted)
224
+ return extract_clean_json(res)
225
+
226
+ # --- MAIN (for testing) ---
227
+ if __name__ == '__main__':
228
+ print("Running test...")
229
+ # Generate Dummy Data
230
+ dates = pd.date_range(start="2023-01-01", periods=180, freq='D')
231
+ df_dummy = pd.DataFrame({
232
+ 'SPY': np.linspace(400, 450, 180) + np.random.normal(0, 5, 180),
233
+ 'BTC-USD': np.linspace(30000, 40000, 180) + np.random.normal(0, 1000, 180),
234
+ 'VIX': np.linspace(20, 15, 180)
235
+ }, index=dates)
236
+
237
+ # Test the new historical analysis function
238
+ selected = ['SPY', 'BTC-USD']
239
+ period = "6 Months"
240
+ print(f"\nTesting analysis for {selected} over {period}...")
241
+ analysis = analyze_historical_segment(df_dummy, selected, period)
242
+ print("\n--- LLM Analysis Output ---")
243
+ print(analysis)
scripts/predict_tomorrow.py ADDED
@@ -0,0 +1,123 @@
1
+ # scripts/predict_tomorrow.py
2
+
3
+ import pandas as pd
4
+ import numpy as np
5
+ import os
6
+ import sys
7
+ from datetime import datetime, timedelta
8
+ from stable_baselines3 import SAC
9
+
10
+ # --- Imports ---
11
+ # Ensure local scripts (fetch_market_data, llm_analysis_rag) are importable
+ sys.path.append(os.getcwd())
16
+
17
+ from fetch_market_data import fetch_market_data, ASSETS, FRED_IDS
18
+ from llm_analysis_rag import analyze_agent_decision
19
+
20
+ # --- Configuration ---
21
+ MODEL_PATH = "checkpoints/sac_portfolio_model.zip"
22
+ WINDOW_SIZE = 30
23
+ MACRO_COLS = list(FRED_IDS.values()) # ['Federal Funds Rate', 'CPI', 'VIX']
24
+
25
+ def get_latest_data_window(window_size=30):
26
+ """
27
+ Fetches live data and returns the last 'window_size' rows.
28
+ """
29
+ print("--- 🔄 Fetching Real-Time Data for Prediction ---")
30
+
31
+ # Fetch a buffer to ensure we have enough data after cleaning
32
+ lookback_days = window_size + 100
33
+ end_date = datetime.now().strftime('%Y-%m-%d')
34
+ start_date = (datetime.now() - timedelta(days=lookback_days)).strftime('%Y-%m-%d')
35
+
36
+ # We don't strictly need to save to a file for prediction, so filename=None
37
+ df = fetch_market_data(start_date, end_date, filename=None)
38
+
39
+ if df is None or len(df) < window_size:
40
+ print(f"❌ Not enough data fetched. Got {len(df) if df is not None else 0} rows, needed {window_size}.")
41
+ return None
42
+
43
+ # Return exactly the last N rows (Observation Window)
44
+ return df.iloc[-window_size:].copy()
45
+
46
+ def prepare_observation(data_window):
47
+ """
48
+ Normalizes data: Window / First_Row_of_Window
49
+ """
50
+ # Extract specific columns to guarantee order
51
+ price_data = data_window[ASSETS].values
52
+ macro_data = data_window[MACRO_COLS].values
53
+
54
+ # Normalize
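+ # Dividing by the first row rescales each series to start near 1.0, so the model sees relative moves within the window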
55
+ norm_prices = price_data / (price_data[0] + 1e-8)
56
+ norm_macro = macro_data / (macro_data[0] + 1e-8)
57
+
58
+ # Concatenate and flatten for MLP input
59
+ obs = np.concatenate([norm_prices, norm_macro], axis=1)
60
+ return obs.flatten().astype(np.float32)
61
+
62
+ def get_allocations(action):
63
+ """Applies Softmax to convert raw action to weights"""
64
+ action = np.asarray(action).flatten()
65
+ exp_action = np.exp(action - np.max(action)) # Subtract the max for numerical stability; the result is unchanged
66
+ return exp_action / np.sum(exp_action)
67
+
68
+ def main():
69
+ print(f"🚀 Prediction Job: {datetime.now().strftime('%Y-%m-%d')}")
70
+
71
+ # 1. Get Data
72
+ data_window = get_latest_data_window(WINDOW_SIZE)
73
+ if data_window is None: return
74
+
75
+ # 2. Prepare Obs
76
+ obs = prepare_observation(data_window)
77
+
78
+ # 3. Load MLP Model
79
+ if not os.path.exists(MODEL_PATH):
80
+ print(f"❌ Model not found at {MODEL_PATH}")
81
+ return
82
+
83
+ print(f"Loading MLP SAC model...")
84
+ model = SAC.load(MODEL_PATH)
85
+
86
+ # 4. Predict
87
+ action, _ = model.predict(obs, deterministic=True)
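+ # deterministic=True uses the policy's mean action rather than sampling from it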
88
+ weights = get_allocations(action)
89
+
90
+ # 5. Format allocations, casting NumPy scalars to plain Python floats
91
+ allocations = {}
92
+ for i, asset in enumerate(ASSETS):
93
+ allocations[asset] = float(weights[i]) # Explicit float() cast
94
+ allocations['Cash'] = float(weights[-1]) # Explicit float() cast
95
+
96
+ # 6. Output Results
97
+ print("\n" + "="*40)
98
+ print(f"🤖 SAC MLP MODEL RECOMMENDATION")
99
+ print("="*40)
100
+ for asset, weight in allocations.items():
101
+ print(f"{asset:<10} : {weight:6.2%}")
102
+ print("="*40)
103
+
104
+ # 7. AI Risk Analyst
105
+ print("\n🧠 Running AI Risk Analysis...")
106
+
107
+ # All values are plain floats now, so they format cleanly in the analysis prompt
108
+ analysis = analyze_agent_decision(data_window, allocations)
109
+
110
+ if isinstance(analysis, dict):
111
+ print(f"\nStrategy: {analysis.get('strategy_summary')}")
112
+ print(f"Risk Level: {analysis.get('risk_level')}")
113
+ print(f"Justification: {analysis.get('justification')}")
114
+
115
+ if str(analysis.get('risk_level', '')).upper() == 'HIGH':
116
+ print("\n⛔ BLOCKING TRADE: High Risk detected by AI Guardrail.")
117
+ else:
118
+ print("\n✅ TRADE APPROVED.")
119
+ else:
120
+ print(analysis)
121
+
122
+ if __name__ == "__main__":
123
+ main()
scripts/tune_sac.py ADDED
@@ -0,0 +1,198 @@
1
+ # scripts/tune_sac.py
2
+
3
+ import os
4
+ import sys
5
+ import pandas as pd
6
+ import numpy as np
7
+ import optuna
8
+ from stable_baselines3 import SAC
9
+ from stable_baselines3.common.vec_env import DummyVecEnv # Use DummyVecEnv
10
+ from stable_baselines3.common.callbacks import EvalCallback
11
+ from stable_baselines3.common.logger import configure
12
+
13
+ from environment import PortfolioEnv
14
+
15
+ # ==============================================================================
16
+ # 1. Configuration & Data Loading
17
+ # ==============================================================================
18
+
19
+ TRAIN_DATA_PATH = 'data/train.csv'
20
+ EVAL_DATA_PATH = 'data/eval.csv'
21
+ OPTUNA_LOG_DIR = 'optuna_logs'
22
+ CHECKPOINT_DIR = 'checkpoints/optuna_sac_trials'
23
+
24
+ # Create directories if they don't exist
25
+ os.makedirs(OPTUNA_LOG_DIR, exist_ok=True)
26
+ os.makedirs(CHECKPOINT_DIR, exist_ok=True)
27
+
28
+ # Load data once
29
+ df_full_train = pd.read_csv(TRAIN_DATA_PATH, index_col='Date', parse_dates=True)
30
+ df_eval = pd.read_csv(EVAL_DATA_PATH, index_col='Date', parse_dates=True)
31
+
32
+ # Split df_full_train for tuning
33
+ train_split_point = int(len(df_full_train) * 0.8)
34
+ df_train_tune = df_full_train.iloc[:train_split_point]
35
+ df_validation_tune = df_full_train.iloc[train_split_point:]
36
+
37
+ print(f"Total training data points: {len(df_full_train)}")
38
+ print(f"Optuna training data points: {len(df_train_tune)}")
39
+ print(f"Optuna validation data points: {len(df_validation_tune)}")
40
+
41
+
42
+ # ==============================================================================
43
+ # 2. Environment Creation Helper
44
+ # ==============================================================================
45
+
46
+ def make_env(df, window_size=30, initial_balance=10000, transaction_cost_pct=0.001):
47
+ """
48
+ Helper function to create a PortfolioEnv instance.
49
+ """
50
+ def _init():
51
+ env = PortfolioEnv(
52
+ df=df,
53
+ initial_balance=initial_balance,
54
+ window_size=window_size,
55
+ transaction_cost_pct=transaction_cost_pct
56
+ )
57
+ return env
58
+ return _init
59
+
60
+ # ==============================================================================
61
+ # 3. Optuna Objective Function
62
+ # ==============================================================================
63
+
64
+ def objective(trial: optuna.Trial) -> float:
65
+ """
66
+ Objective function for Optuna to optimize hyperparameters for SAC.
67
+ """
68
+ # Hyperparameter search space
69
+ learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-3, log=True)
70
+ gamma = trial.suggest_float('gamma', 0.9, 0.999)
71
+ tau = trial.suggest_float('tau', 0.005, 0.02)
72
+ buffer_size = trial.suggest_int('buffer_size', 50000, 1000000, log=True)
73
+ batch_size = trial.suggest_categorical('batch_size', [64, 128, 256, 512])
74
+ ent_coef = trial.suggest_float('ent_coef', 0.001, 0.1, log=True) # Use log scale for ent_coef
75
+
76
+ # Network architecture
77
+ n_layers = trial.suggest_int('n_layers', 1, 3)
78
+ net_arch = []
79
+ for i in range(n_layers):
80
+ layer_size = trial.suggest_categorical(f'layer_size_{i}', [64, 128, 256])
81
+ net_arch.append(layer_size)
82
+
83
+ policy_kwargs = dict(net_arch=net_arch) # SAC uses shared network or separate [pi, qf]
84
+
85
+ # Create environments for this trial
86
+ train_env = DummyVecEnv([make_env(df_train_tune)])
87
+ eval_env = DummyVecEnv([make_env(df_validation_tune)])
88
+
89
+ # Set up logger for the trial
90
+ trial_log_path = os.path.join(OPTUNA_LOG_DIR, f"trial_{trial.number}")
91
+ new_logger = configure(trial_log_path, ["stdout", "csv", "tensorboard"])
92
+
93
+ # Create SAC model
94
+ model = SAC(
95
+ "MlpPolicy",
96
+ train_env,
97
+ learning_rate=learning_rate,
98
+ gamma=gamma,
99
+ tau=tau,
100
+ buffer_size=buffer_size,
101
+ batch_size=batch_size,
102
+ ent_coef=ent_coef, # Pass the sampled value
103
+ policy_kwargs=policy_kwargs,
104
+ verbose=0,
105
+ seed=42, # Use a fixed seed for reproducibility within a trial
106
+ tensorboard_log=OPTUNA_LOG_DIR
107
+ )
108
+ model.set_logger(new_logger)
109
+
110
+ # Callback for evaluation
111
+ eval_callback = EvalCallback(
112
+ eval_env,
113
+ best_model_save_path=os.path.join(CHECKPOINT_DIR, f"best_sac_trial_{trial.number}"),
114
+ log_path=trial_log_path,
115
+ eval_freq=5000,
116
+ deterministic=True,
117
+ render=False,
118
+ n_eval_episodes=1
119
+ )
120
+
121
+ try:
122
+ # Train for a set number of steps per trial
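+ # 50k steps keeps each trial cheap; a longer budget gives a more reliable ranking at the cost of tuning time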
123
+ total_timesteps_per_trial = 50000
124
+ model.learn(total_timesteps=total_timesteps_per_trial, callback=eval_callback, progress_bar=False)
125
+
126
+ # Load the best model found during this trial's training
127
+ best_model_path = os.path.join(CHECKPOINT_DIR, f"best_sac_trial_{trial.number}", "best_model.zip")
128
+ if os.path.exists(best_model_path):
129
+ model = SAC.load(best_model_path, env=eval_env)
130
+ else:
131
+ print(f"Warning: No best model saved for trial {trial.number}, using last model.")
132
+
133
+ # --- Final evaluation on the validation set ---
134
+ obs = eval_env.reset()
135
+ portfolio_values = [eval_env.envs[0].initial_balance]
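+ # DummyVecEnv returns batched results of length 1, hence the [0] indexing here and below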
136
+ done = False
137
+ while not done:
138
+ action, _ = model.predict(obs, deterministic=True)
139
+ obs, reward, done, info = eval_env.step(action)
140
+ portfolio_values.append(info[0]['portfolio_value'])
141
+
142
+ final_portfolio_value = portfolio_values[-1]
143
+ initial_portfolio_value = portfolio_values[0]
144
+ total_return = (final_portfolio_value / initial_portfolio_value) - 1
145
+
146
+ print(f"Trial {trial.number} finished. Total Return on validation: {total_return:.4f}")
147
+
148
+ except Exception as e:
149
+ print(f"Trial {trial.number} failed due to: {e}")
150
+ return float('nan') # Optuna handles NaN as a failure
151
+
152
+ finally:
153
+ train_env.close()
154
+ eval_env.close()
155
+
156
+ return total_return # Optuna aims to maximize this metric
157
+
158
+
159
+ # ==============================================================================
160
+ # 4. Run Optuna Study
161
+ # ==============================================================================
162
+
163
+ if __name__ == '__main__':
164
+ study = optuna.create_study(
165
+ direction='maximize',
166
+ sampler=optuna.samplers.TPESampler(seed=42)
167
+ )
168
+
169
+ n_trials_to_run = 50
170
+ study.optimize(objective, n_trials=n_trials_to_run, n_jobs=1) # n_jobs=1 is safer for Colab
171
+
172
+ print("\n--- Optimization finished. ---")
173
+ print("Best trial:")
174
+ trial = study.best_trial
175
+
176
+ print(f" Value: {trial.value:.4f}")
177
+ print(" Params: ")
178
+ for key, value in trial.params.items():
179
+ print(f" {key}: {value}")
180
+
181
+ # Save the best parameters to a file
182
+ best_params = trial.params
183
+ with open('checkpoints/best_sac_params.txt', 'w') as f:
184
+ f.write(str(best_params))
185
+ print(f"\n✅ Best parameters saved to checkpoints/best_sac_params.txt")
186
+
187
+ # Plotting results
188
+ try:
189
+ import plotly
190
+ from optuna.visualization import plot_optimization_history, plot_param_importances
191
+
192
+ fig1 = plot_optimization_history(study)
193
+ fig1.show()
194
+
195
+ fig2 = plot_param_importances(study)
196
+ fig2.show()
197
+ except ImportError:
198
+ print("\nInstall plotly and kaleido to visualize Optuna results: !pip install plotly kaleido")
scripts/visualize_strategy.py DELETED
@@ -1,123 +0,0 @@
1
- import argparse
2
- import os
3
- import pandas as pd
4
- import numpy as np
5
- import matplotlib.pyplot as plt
6
- from matplotlib.ticker import FuncFormatter
7
- from stable_baselines3 import PPO, SAC, TD3
8
- from environment import PortfolioEnv
9
-
10
- def visualize_strategy(agent_name, checkpoint_path, datafile_path, output_path):
11
- """
12
- Loads a trained agent, runs a simulation, and plots its portfolio allocation strategy.
13
-
14
- Args:
15
- agent_name (str): The type of agent to load ('ppo', 'sac', 'td3').
16
- checkpoint_path (str): The path to the saved model checkpoint file (.zip).
17
- datafile_path (str): The path to the CSV market data for the simulation.
18
- output_path (str): The path to save the output plot image.
19
- """
20
- print(f"--- Visualizing strategy for {agent_name.upper()} agent ---")
21
-
22
- # 1. Define a mapping from agent names to their classes
23
- AGENT_CLASSES = {
24
- "ppo": PPO,
25
- "sac": SAC,
26
- "td3": TD3
27
- }
28
- agent_class = AGENT_CLASSES[agent_name.lower()]
29
-
30
- # 2. Load Data and Model
31
- try:
32
- test_df = pd.read_csv(datafile_path, index_col='Date', parse_dates=True)
33
- model = agent_class.load(checkpoint_path)
34
- except FileNotFoundError as e:
35
- print(f"❌ Error: Could not find a required file. {e}")
36
- return
37
- except Exception as e:
38
- print(f"❌ An error occurred: {e}")
39
- return
40
-
41
- # 3. Create Environment and Run Simulation
42
- env = PortfolioEnv(test_df)
43
- obs, info = env.reset()
44
- terminated, truncated = False, False
45
-
46
- weights_history = [info['weights']]
47
- while not (terminated or truncated):
48
- action, _states = model.predict(obs, deterministic=True)
49
- obs, reward, terminated, truncated, info = env.step(action)
50
- weights_history.append(info['weights'])
51
- print("✅ Simulation complete.")
52
-
53
- # 4. Prepare Data for Plotting
54
- weights_df = pd.DataFrame(weights_history)
55
- asset_names = test_df.columns.tolist() + ['Cash']
56
- weights_df.columns = asset_names
57
- weights_df.index = test_df.index[:len(weights_df)]
58
-
59
- # 5. Plotting the Stacked Area Chart
60
- print("📊 Generating plot...")
61
- plt.style.use('seaborn-v0_8-darkgrid')
62
- fig, ax = plt.subplots(figsize=(15, 8))
63
-
64
- ax.stackplot(weights_df.index, weights_df.T, labels=weights_df.columns, alpha=0.8)
65
-
66
- ax.set_title(f'Agent Portfolio Allocation Over Time ({agent_name.upper()})', fontsize=16)
67
- ax.set_xlabel('Date', fontsize=12)
68
- ax.set_ylabel('Portfolio Allocation (%)', fontsize=12)
69
- ax.legend(loc='upper left', fontsize=10)
70
-
71
- formatter = FuncFormatter(lambda y, p: f'{y:.0%}')
72
- ax.yaxis.set_major_formatter(formatter)
73
-
74
- plt.tight_layout()
75
-
76
- # Ensure output directory exists
77
- output_dir = os.path.dirname(output_path)
78
- if output_dir and not os.path.exists(output_dir):
79
- os.makedirs(output_dir)
80
-
81
- plt.savefig(output_path)
82
- print(f"✅ Plot saved to {output_path}")
83
- plt.show()
84
-
85
-
86
- if __name__ == "__main__":
87
- # Set up command-line argument parsing
88
- parser = argparse.ArgumentParser(description="Visualize a trained RL agent's portfolio allocation strategy.")
89
-
90
- parser.add_argument(
91
- "--agent",
92
- type=str,
93
- required=True,
94
- choices=["ppo", "sac", "td3"],
95
- help="The RL algorithm of the trained agent."
96
- )
97
- parser.add_argument(
98
- "--checkpoint",
99
- type=str,
100
- required=True,
101
- help="Path to the saved model checkpoint .zip file (e.g., 'td3_portfolio_model.zip')."
102
- )
103
- parser.add_argument(
104
- "--datafile",
105
- type=str,
106
- default="data/test.csv",
107
- help="Path to the market data CSV file to run the simulation on."
108
- )
109
- parser.add_argument(
110
- "--output",
111
- type=str,
112
- default="results/agent_allocation.png",
113
- help="Path to save the output plot image."
114
- )
115
-
116
- args = parser.parse_args()
117
-
118
- visualize_strategy(
119
- agent_name=args.agent,
120
- checkpoint_path=args.checkpoint,
121
- datafile_path=args.datafile,
122
- output_path=args.output
123
- )