DanielKiani committed on
Commit
7d2e753
·
0 Parent(s):

Initial commit

.gitattributes ADDED
@@ -0,0 +1 @@
+ *.png filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,7 @@
+ __pycache__/
+ *.pyc
+ venv/
+ .venv/
+ .vscode/
+ .idea/
+
README.md ADDED
@@ -0,0 +1,215 @@
+ ![Banner](assets/banner.png)
+ [![Python](https://img.shields.io/badge/Python-3.12.11-blue?logo=python)](https://www.python.org/) [![PyTorch](https://img.shields.io/badge/PyTorch-2.8-EE4C2C?logo=pytorch)](https://pytorch.org/) ![Made with ML](https://img.shields.io/badge/Made%20with-ML-blueviolet?logo=openai) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+
+ # 🤖 Portfolio Optimization with Deep Reinforcement Learning
+
+ This project explores the use of Deep Reinforcement Learning to train autonomous agents for financial portfolio management. The goal was not just to create a single profitable agent, but to conduct a comparative study of three RL algorithms (PPO, SAC, TD3) to understand their emergent trading strategies and their robustness across varied market conditions.
+
+ **The key finding: a TD3-based agent learned a superior, risk-managed static asset allocation that consistently outperformed both active trading strategies and aggressive growth models, especially during market downturns.**
+
+ ---
+
+ ## 📜 Table of Contents
+
+ 1. [📊 The Data & Asset Selection](#-the-data--asset-selection)
+ 2. [🎯 Benchmarking Against Baselines](#-benchmarking-against-baselines)
+ 3. [🏆 Key Findings & The Champion Agent](#-key-findings--the-champion-agent)
+ 4. [🧠 Comparative Analysis of Agent Strategies](#-comparative-analysis-of-agent-strategies)
+     * [🥇 TD3: The Prudent Risk-Manager](#-td3-the-prudent-risk-manager)
+     * [🚀 SAC: The Aggressive Growth Engine](#-sac-the-aggressive-growth-engine)
+     * [📈 PPO: The Active (but Inconsistent) Trader](#-ppo-the-active-but-inconsistent-trader)
+ 5. [🌪️ Stress Testing: The Ultimate Test of Robustness](#️-stress-testing-the-ultimate-test-of-robustness)
+ 6. [🔬 The Research Journey: Why Simplicity Won](#-the-research-journey-why-simplicity-won)
+ 7. [✅ Conclusion](#-conclusion)
+ 8. [📂 Project Structure](#-project-structure)
+ 9. [🚀 How to Run](#-how-to-run)
+     * [Setup](#setup)
+     * [Data Fetching](#data-fetching)
+     * [Training](#training)
+     * [Evaluation & Visualization](#evaluation--visualization)
+
+ ---
+
+ ## 📊 The Data & Asset Selection
+
+ The foundation of any financial machine learning project is its data. This project uses daily closing prices sourced from **Yahoo Finance** via the `yfinance` library. The primary training period was **2015-2020**, with out-of-sample testing on **2021-2023** and additional periods reserved for stress testing.
+
+ Asset selection was crucial for creating a realistic decision-making environment. The portfolio consists of five assets, chosen to represent different classes and risk profiles:
+
+ * **Growth Equities (AAPL, MSFT):** Represent the high-growth, high-volatility technology sector.
+ * **Market Index (SPY):** An ETF tracking the S&P 500, representing the broader US stock market.
+ * **Safe Haven (TLT):** An ETF holding 20+ Year US Treasury Bonds, which often acts as a "risk-off" asset during stock market downturns.
+ * **Alternative Asset (BTC-USD):** A non-traditional, extremely volatile asset class with high potential returns.
+
+ This diverse mix forces the agent to learn not just about individual assets, but also about their correlations and how to balance risk across different economic regimes.
+
+ ---
+
+ ## 🎯 Benchmarking Against Baselines
+
+ To show that a reinforcement learning agent is truly adding value, its performance must be measured against simple, standard strategies; an agent is only successful if it beats a naive approach.
+
+ Our primary benchmark was the **Buy and Hold** strategy, in which an equal amount of capital is invested in each asset at the beginning of the period and never touched again. The goal for every trained RL agent was to outperform this baseline, especially on a **risk-adjusted basis** (higher Sharpe Ratio, lower Max Drawdown).
+
+ The chart below shows the performance of a simple Buy and Hold strategy during the 2021-2023 test period, setting a clear target for our agents to beat.
+
+ ![Baseline Performance Chart](results/baseline_results.png)
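The Buy and Hold baseline is simulated in `scripts/evaluate_baselines.py`; its core reduces to a few lines of pandas, shown here on a tiny made-up price table:

```python
import pandas as pd

def buy_and_hold(df: pd.DataFrame, initial_balance: float = 10000) -> pd.Series:
    """Split the capital equally across assets on day one, then never trade again."""
    shares = (initial_balance / len(df.columns)) / df.iloc[0]  # shares bought per asset
    return df.dot(shares)                                      # portfolio value per day

# Toy two-asset price table: asset A rallies, asset B fades.
prices = pd.DataFrame({"A": [10.0, 11.0, 12.0], "B": [20.0, 20.0, 18.0]})
values = buy_and_hold(prices, initial_balance=1000)
print(values.tolist())  # [1000.0, 1050.0, 1050.0]
```

Because the share counts are fixed at day one, the daily portfolio value is just a dot product of prices and shares.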
+
+ ---
+
+ ## 🏆 Key Findings & The Champion Agent
+
+ After extensive training, evaluation, and stress testing, the **TD3 agent emerged as the clear winner** on a risk-adjusted basis. While other agents achieved higher raw returns, their strategies proved brittle and dangerously volatile during market crises. The TD3 agent's strategy was the most robust and reliable.
+
+ #### Final Performance Comparison (2021-2023)
+
+ This table summarizes the performance of the top-performing static agents against the baseline.
+
+ | Metric | **TD3 Agent** | SAC Agent | Buy & Hold |
+ | :--- | :--- | :--- | :--- |
+ | **Total Return** | 47.24% | **50.89%** | 34.91% |
+ | **CAGR** | 13.76% | **14.70%** | 10.50% |
+ | **Sharpe Ratio** | **0.62** | 0.51 | 0.45 |
+ | **Max Drawdown** | **-28.41%** | -44.61% | -40.81% |
+
+ The TD3 agent delivered strong returns while significantly reducing the maximum drawdown, proving its superior capital-preservation strategy.
+
+ ![Main Performance Chart](results/final_performance_comparison_all_agents.png)
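The Sharpe Ratio and Max Drawdown in the table come from the daily portfolio value series; a condensed sketch of the calculation in `scripts/evaluate.py` (the toy numbers are illustrative only):

```python
import numpy as np
import pandas as pd

def sharpe_and_drawdown(values: pd.Series, freq: int = 252):
    """Annualized Sharpe ratio (risk-free rate assumed 0) and maximum drawdown."""
    returns = values.pct_change().dropna()
    sharpe = np.sqrt(freq) * returns.mean() / returns.std()
    drawdown = values / values.cummax() - 1.0  # distance from the running peak
    return sharpe, drawdown.min()

values = pd.Series([100.0, 110.0, 99.0, 120.0])
sharpe, mdd = sharpe_and_drawdown(values)
print(round(mdd, 2))  # -0.1 (the dip from 110 to 99)
```

Max drawdown measures the worst peak-to-trough loss, which is why it captures the risk that raw returns hide.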
+
+ ---
+
+ ## 🧠 Comparative Analysis of Agent Strategies
+
+ A fascinating outcome of this project was watching three RL algorithms independently discover three distinct, recognizable investment philosophies.
+
+ ### 🥇 TD3: The Prudent Risk-Manager
+
+ The TD3 agent concluded that the most effective strategy was not to trade frequently, but to find one **superior, risk-managed static asset allocation** and hold it.
+
+ * **Strategy:** A "smarter Buy and Hold."
+ * **Behavior:** The agent's allocation is completely static, indicating it focused on the initial strategic decision and ignored market noise to minimize transaction costs.
+ * **Result:** This approach produced the best risk-adjusted returns, showing that a robust initial setup can be more valuable than reactive trading.
+
+ ![TD3 Allocation Chart](results/td3_portfolio_alocation.png)
+
+ ### 🚀 SAC: The Aggressive Growth Engine
+
+ The SAC agent also learned a static allocation strategy, but its portfolio was geared for **maximum growth**, accepting higher risk for higher potential returns.
+
+ * **Strategy:** High-risk, high-return static allocation.
+ * **Behavior:** Like TD3, it made one initial allocation and held firm, but that allocation was far more aggressive.
+ * **Result:** It achieved the highest total return in some periods but suffered catastrophic drawdowns in stress tests, making its strategy unreliable and brittle.
+
+ ![SAC Allocation Chart](results/sac_portfolio_alocation.png)
+
+ ### 📈 PPO: The Active (but Inconsistent) Trader
+
+ Unlike the other two, the PPO agent learned an **active, dynamic trading strategy**, continually adjusting its portfolio to market conditions.
+
+ * **Strategy:** Tactical asset allocation.
+ * **Behavior:** The allocation chart clearly shows the agent rebalancing over time, for example by increasing its bond (TLT) holdings during the 2022 downturn.
+ * **Result:** While it is impressive that the agent learned this behavior at all, its performance was inconsistent: it succeeded in some periods (2018) but failed in others (2025), highlighting the immense difficulty of successful market timing.
+
+ ![PPO Allocation Chart](results/ppo_portfolio_alocation.png)
+
+ ---
+
+ ## 🌪️ Stress Testing: The Ultimate Test of Robustness
+
+ A model is only as good as its performance during a crisis. We subjected the agents to multiple out-of-sample stress tests, with the 2018 period (featuring a crypto winter and a stock market flash crash) being the most revealing.
+
+ ![2018 Stress Test Chart](results/stress_test_comparison_2018.png)
+
+ * **TD3's Triumph:** The orange line shows the TD3 agent navigating the downturn and preserving capital far better than the baseline.
+ * **SAC's Failure:** The green line shows the SAC agent's aggressive strategy failing catastrophically, resulting in a massive drawdown.
+
+ This test demonstrated that the **TD3 agent's risk-managed approach was truly robust**, while the SAC agent's strategy was fragile.
+
+ ---
+
+ ## 🔬 The Research Journey: Why Simplicity Won
+
+ This project was also an exercise in scientific methodology. We initially hypothesized that more complex models and features would yield better results.
+
+ * **Hypothesis 1: More features are better.** We added technical indicators (RSI, MACD) to the observation space. **Result:** Performance degraded; the indicators acted as noise and confused the agents.
+ * **Hypothesis 2: Models with memory are better.** We tested an LSTM-based agent (`RecurrentPPO`). **Result:** Performance degraded; the added complexity led to overfitting on the training data.
+ * **Hypothesis 3: Regularization is better.** We tested both L1 and L2 regularization. **Result:** Performance degraded.
+ * **Hypothesis 4: A longer observation window is better.** We increased the window from 30 to 60 days. **Result:** Performance degraded; a larger context window is not always beneficial and can act as additional noise for the model.
+
+ The conclusion was clear: for this problem, a simple and elegant model (a standard MLP fed with just normalized price data) was the most effective.
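The "normalized price data" here is simply each 30-day price window divided by its first row, as done in `environment.py`'s `_get_obs`; a minimal illustration on a 3-day, 2-asset window:

```python
import numpy as np

# A small window of raw prices: shape (window_size, n_assets).
window = np.array([[100.0, 50.0],
                   [110.0, 45.0],
                   [120.0, 55.0]])

normalized = window / window[0]  # each column now starts at 1.0
obs = normalized.flatten().astype(np.float32)  # the flattened vector the agent sees
print(normalized[0].tolist())  # [1.0, 1.0]
```

Normalizing this way lets the agent see relative price movement within the window rather than absolute price levels, so AAPL at $150 and BTC at $40,000 are on comparable scales.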
+
+ ---
+
+ ## ✅ Conclusion
+
+ This project demonstrates that Deep Reinforcement Learning can be a powerful tool for discovering sophisticated investment strategies. The key insight is that the most robust and successful agent did not learn to be a hyperactive trader, but rather a prudent strategic allocator, underscoring the timeless investment principle that effective risk management is the true key to long-term success.
+
+ ---
+
+ ## 📂 Project Structure
+
+ The codebase is organized into modular, reusable scripts.
+
+ ```bash
+ ├── assets/
+ ├── checkpoints/               # Holds all saved model .zip files
+ ├── results/                   # Holds all output plots and metrics
+ ├── scripts/
+ │   ├── check_env.py               # Verifies the custom env against the Gymnasium API
+ │   ├── environment.py             # The custom Gymnasium environment for the simulation
+ │   ├── fetch_market_data.py       # A flexible script to download data for any period
+ │   ├── train.py                   # The main training script with model selection
+ │   ├── evaluate.py                # The main evaluation script for generating metrics
+ │   ├── evaluate_baselines.py      # Simulates the Buy and Hold and equal-weight baselines
+ │   ├── stress_test.py             # Runs a full comparison of all agents on a given dataset
+ │   └── visualize_strategy.py      # Plots the asset allocation of a single trained agent
+ └── README.md                  # This file
+ ```
+
+ ---
+
+ ## 🚀 How to Run
+
+ ### Setup
+
+ 1. Clone the repository.
+ 2. Create and activate a Python virtual environment.
+ 3. Install the required packages:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Data Fetching
+
+ Use the flexible `fetch_market_data.py` script to download data for any period.
+
+ ```bash
+ # Fetch the default training data (2015-2020)
+ python scripts/fetch_market_data.py --start 2015-01-01 --end 2020-12-31 --filename data/train.csv
+
+ # Fetch data for a stress test (e.g., 2022)
+ python scripts/fetch_market_data.py --start 2022-01-01 --end 2022-12-31 --filename data/test_2022.csv
+ ```
+
+ ### Training
+
+ Use the `train.py` script to train any of the three main agents.
+
+ ```bash
+ # Train the champion TD3 agent (default)
+ python scripts/train.py --agent td3
+
+ # Train a SAC agent for more timesteps
+ python scripts/train.py --agent sac --timesteps 100000
+ ```
+
+ ### Evaluation & Visualization
+
+ Use the dedicated scripts to analyze the results.
+
+ ```bash
+ # Run a full stress test on the 2018 data
+ python scripts/stress_test.py --datafile data/stress_test_2018.csv
+
+ # Visualize the TD3 agent's strategy
+ python scripts/visualize_strategy.py --agent td3 --checkpoint td3_portfolio_model.zip
+ ```
assets/banner.png ADDED

Git LFS Details

  • SHA256: 6f3455d5f88a8eb82affe16263753b2ee5cfaa6c6adf2e55bf0b650e8f4701ab
  • Pointer size: 132 Bytes
  • Size of remote file: 1.67 MB
requirements.txt ADDED
@@ -0,0 +1,19 @@
+ # Core RL and Simulation
+ stable-baselines3==2.7.0
+ sb3_contrib==2.7.0
+ gymnasium==1.2.1
+
+ # Data Handling and Numerics
+ pandas==2.3.3
+ numpy==2.2.6
+ scikit-learn==1.6.1
+
+ # Data Fetching
+ yfinance==0.2.66
+
+ # Financial Indicators
+ pandas-ta==0.4.71b0
+
+ # Plotting and Visualization
+ matplotlib==3.10.0
+ seaborn==0.13.2
results/baseline_results.png ADDED

Git LFS Details

  • SHA256: 8ade77274352ad37706e9bb7076b225bc784ddbbd51ada1bb4b6b983a0eb9cf2
  • Pointer size: 131 Bytes
  • Size of remote file: 161 kB
results/final_performance_comparison_all_agents.png ADDED

Git LFS Details

  • SHA256: 75b085d93cd947906c2f6f5fdf4f1fbc0b53cdb56d2ad3a77011b6cae8c787ab
  • Pointer size: 131 Bytes
  • Size of remote file: 225 kB
results/ppo_portfolio_alocation.png ADDED

Git LFS Details

  • SHA256: 3e280bd69972c63cd3ff6cb4c3e0ee80d7e8858c7afc0f251164052e3be6323a
  • Pointer size: 130 Bytes
  • Size of remote file: 86.2 kB
results/sac_portfolio_alocation.png ADDED

Git LFS Details

  • SHA256: 8462e8dffc0a9dfe6562f89c529ea763e45779fb25c6e3bf97ac4aa39d669454
  • Pointer size: 130 Bytes
  • Size of remote file: 39.2 kB
results/stress_test_comparison_2018.png ADDED

Git LFS Details

  • SHA256: 0240c233639499468ded8d767f7ba1f9b0dea256807db8dbc36bcf1136ceb731
  • Pointer size: 131 Bytes
  • Size of remote file: 197 kB
results/td3_portfolio_alocation.png ADDED

Git LFS Details

  • SHA256: 142858e04e50d46a9eeff9a60c3fdf699c5ff337de88164b17865d1fa900fdfd
  • Pointer size: 130 Bytes
  • Size of remote file: 38.1 kB
scripts/check_env.py ADDED
@@ -0,0 +1,32 @@
+ import pandas as pd
+ from stable_baselines3.common.env_checker import check_env
+ from environment import PortfolioEnv
+
+ def main():
+     """
+     Create the custom portfolio environment and check its API compliance.
+     """
+     print("--- Loading Data and Creating Environment ---")
+     try:
+         # Load the training data
+         df = pd.read_csv('data/train.csv', index_col='Date', parse_dates=True)
+         # Create an instance of the environment
+         env = PortfolioEnv(df)
+         print("Environment created successfully.")
+     except FileNotFoundError:
+         print("❌ Error: 'data/train.csv' not found. Make sure you've run the data fetching script.")
+         return
+
+     print("\n--- Checking Environment Compatibility ---")
+     try:
+         # check_env raises an error if the environment violates the Gymnasium API.
+         check_env(env)
+         print("✅ Environment check passed!")
+     except Exception:
+         print("❌ Environment check failed:")
+         # Print the full traceback for debugging complex errors.
+         import traceback
+         traceback.print_exc()
+
+ if __name__ == "__main__":
+     main()
scripts/environment.py ADDED
@@ -0,0 +1,174 @@
+ import gymnasium as gym
+ import numpy as np
+ import pandas as pd
+ from gymnasium import spaces
+
+ class PortfolioEnv(gym.Env):
+     """
+     A custom reinforcement learning environment for portfolio management.
+
+     This environment simulates the daily trading of multiple financial assets. The agent's
+     goal is to learn a policy for allocating capital to maximize risk-adjusted returns.
+     """
+     metadata = {'render_modes': ['human']}
+
+     def __init__(self, df, window_size=30, initial_balance=10000, transaction_cost_pct=0.001):
+         """
+         Initializes the portfolio management environment.
+
+         Args:
+             df (pd.DataFrame): A DataFrame containing the daily closing prices of the assets.
+                 The index should be dates and the columns should be asset tickers.
+             window_size (int): The number of past days of price data to include in the observation.
+             initial_balance (float): The starting capital for the portfolio.
+             transaction_cost_pct (float): The percentage cost for each trade (e.g., 0.001 for 0.1%).
+         """
+         super().__init__()
+
+         # --- Basic Environment Parameters ---
+         self.df = df
+         self.window_size = window_size
+         self.initial_balance = initial_balance
+         self.transaction_cost_pct = transaction_cost_pct
+         self.n_assets = len(df.columns)
+
+         # --- Action Space ---
+         # The agent outputs a vector of continuous values, one for each asset plus one for cash.
+         # These raw outputs are converted to portfolio weights via a softmax function.
+         # The space is defined from -1 to 1 for better compatibility with standard RL algorithms.
+         # Shape: (number of assets + 1 for cash)
+         self.action_space = spaces.Box(
+             low=-1, high=1, shape=(self.n_assets + 1,), dtype=np.float32
+         )
+
+         # --- Observation Space ---
+         # The agent observes a window of past price data, flattened into a 1D vector.
+         # Shape: (window_size * number of assets)
+         self.observation_space = spaces.Box(
+             low=-np.inf, high=np.inf,
+             shape=(self.window_size * self.n_assets,),
+             dtype=np.float32
+         )
+
+         # --- Internal State Variables ---
+         # These variables track the state of the simulation over time.
+         self._current_step = 0
+         self._portfolio_value = 0.0
+         # Weights for each asset + cash, e.g., [w_aapl, w_msft, ..., w_cash]
+         self._weights = np.zeros(self.n_assets + 1)
+
+     def reset(self, seed=None, options=None):
+         """
+         Resets the environment to its initial state for a new episode.
+
+         Returns:
+             tuple: A tuple containing the initial observation and auxiliary info.
+         """
+         super().reset(seed=seed)
+
+         # Start the simulation at the first point where a full window of data is available.
+         self._current_step = self.window_size
+         self._portfolio_value = self.initial_balance
+
+         # Initialize weights to be 100% in cash.
+         self._weights = np.zeros(self.n_assets + 1)
+         self._weights[-1] = 1.0  # The last element represents cash
+
+         observation = self._get_obs()
+         info = self._get_info()
+
+         return observation, info
+
+     def step(self, action):
+         """
+         Executes one time step within the environment based on the agent's action.
+
+         Args:
+             action (np.ndarray): The raw output from the agent's policy network.
+
+         Returns:
+             tuple: A tuple containing the next observation, reward, terminated flag,
+                 truncated flag, and auxiliary info.
+         """
+         # 1. Store the portfolio value before taking the action.
+         current_portfolio_value = self._portfolio_value
+
+         # 2. Convert the raw action into portfolio weights using the softmax function.
+         #    This ensures the weights are positive and sum to 1.
+         target_weights = np.exp(action) / np.sum(np.exp(action))
+
+         # 3. Calculate the cost of rebalancing the portfolio.
+         #    The cost is based on the total value of assets bought or sold.
+         trades = (target_weights[:-1] - self._weights[:-1]) * current_portfolio_value
+         transaction_costs = np.sum(np.abs(trades)) * self.transaction_cost_pct
+
+         # 4. Update the internal state: apply costs, set new weights, and advance time.
+         self._balance = current_portfolio_value - transaction_costs
+         self._weights = target_weights
+         self._current_step += 1
+
+         # 5. Calculate the new portfolio value based on the market's price movement.
+         current_prices = self.df.iloc[self._current_step - 1].values
+         next_prices = self.df.iloc[self._current_step].values
+         price_ratio = next_prices / current_prices  # How much each asset's price changed.
+
+         # The new value of the asset holdings.
+         asset_values_after_price_change = (self._weights[:-1] * self._balance) * price_ratio
+
+         # The new total portfolio value is the sum of the updated asset values plus the cash holding.
+         new_portfolio_value = np.sum(asset_values_after_price_change) + (self._weights[-1] * self._balance)
+         self._portfolio_value = new_portfolio_value
+
+         # 6. Calculate the reward for the agent.
+         #    The reward is the log return of the portfolio value, which encourages geometric growth.
+         reward = np.log(new_portfolio_value / current_portfolio_value)
+
+         # 7. Check for termination conditions.
+         #    The episode ends early if the portfolio loses half its value, and is
+         #    truncated when the price data runs out.
+         terminated = bool(self._portfolio_value <= self.initial_balance * 0.5)
+         truncated = self._current_step >= len(self.df) - 1
+
+         observation = self._get_obs()
+         info = self._get_info()
+
+         return observation, reward, terminated, truncated, info
+
+     def _get_obs(self):
+         """
+         Constructs the observation for the agent at the current time step.
+
+         Returns:
+             np.ndarray: A flattened 1D array of the normalized price history.
+         """
+         # Get the window of historical price data.
+         price_window = self.df.iloc[self._current_step - self.window_size : self._current_step].values
+
+         # Normalize the window by dividing by the first row. This helps the agent
+         # focus on relative price changes rather than absolute values.
+         normalized_window = price_window / price_window[0]
+
+         return normalized_window.flatten().astype(np.float32)
+
+     def _get_info(self):
+         """
+         Returns a dictionary of auxiliary information about the current state.
+         """
+         return {
+             'step': self._current_step,
+             'portfolio_value': self._portfolio_value,
+             'weights': self._weights,
+         }
+
+     def render(self, mode='human'):
+         """
+         Renders the environment's state (optional).
+         """
+         if mode == 'human':
+             info = self._get_info()
+             print(f"Step: {info['step']}, Portfolio Value: {info['portfolio_value']:.2f}")
+
+     def close(self):
+         """
+         Cleans up the environment (optional).
+         """
+         pass
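A quick standalone check of the softmax conversion that `step()` applies to raw actions (toy numbers, five assets plus cash):

```python
import numpy as np

# step() maps the raw action vector (one value per asset + one for cash) to
# portfolio weights with a softmax, so every weight is positive and they sum to one.
action = np.array([0.5, -0.2, 0.1, 0.0, 0.3, -1.0])  # 5 assets + cash
weights = np.exp(action) / np.sum(np.exp(action))
print(weights.round(3))  # positive weights summing to 1
```

Because the action space is bounded to [-1, 1], the exponentials stay well-behaved and no numerical-stability shift is needed here.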
scripts/evaluate.py ADDED
@@ -0,0 +1,142 @@
+ import pandas as pd
+ import numpy as np
+ import matplotlib.pyplot as plt
+ from stable_baselines3 import PPO, SAC, TD3
+ from evaluate_baselines import buy_and_hold
+ from environment import PortfolioEnv
+ from matplotlib.ticker import FuncFormatter
+
+ # --- Helper Function to Run the RL Agent ---
+
+ def evaluate_agent(env, model):
+     """
+     Runs the trained agent on the environment and returns the daily portfolio values.
+     """
+     obs, info = env.reset()
+     terminated, truncated = False, False
+
+     portfolio_values = [env.initial_balance]
+
+     while not (terminated or truncated):
+         action, _states = model.predict(obs, deterministic=True)
+         obs, reward, terminated, truncated, info = env.step(action)
+         portfolio_values.append(info['portfolio_value'])
+
+     return pd.Series(portfolio_values, index=env.df.index[:len(portfolio_values)])
+
+
+ def calculate_metrics(portfolio_values, freq=252, rf=0.0):
+     """
+     Calculates key performance metrics from a series of portfolio values.
+     freq: number of trading periods in a year (252 for daily, 52 for weekly).
+     rf: risk-free rate (default = 0 for simplicity).
+     """
+     returns = portfolio_values.pct_change().dropna()
+
+     # Total Return
+     total_return = (portfolio_values.iloc[-1] / portfolio_values.iloc[0]) - 1
+
+     # CAGR
+     num_years = len(portfolio_values) / freq
+     cagr = (portfolio_values.iloc[-1] / portfolio_values.iloc[0]) ** (1 / num_years) - 1
+
+     # Sharpe Ratio
+     sharpe_ratio = np.sqrt(freq) * (returns.mean() - rf) / returns.std()
+
+     # Sortino Ratio (downside risk only)
+     downside_returns = returns[returns < 0]
+     downside_std = downside_returns.std()
+     sortino_ratio = np.sqrt(freq) * (returns.mean() - rf) / downside_std if downside_std > 0 else np.nan
+
+     # Volatility (annualized std)
+     volatility = returns.std() * np.sqrt(freq)
+
+     # Max Drawdown
+     rolling_max = portfolio_values.cummax()
+     drawdown = portfolio_values / rolling_max - 1.0
+     max_drawdown = drawdown.min()
+
+     # Calmar Ratio (max_drawdown is already a fraction, so no division by 100)
+     calmar_ratio = cagr / abs(max_drawdown) if max_drawdown != 0 else np.nan
+
+     return {
+         "Total Return": f"{total_return:.2%}",
+         "CAGR": f"{cagr:.2%}",
+         "Sharpe Ratio": f"{sharpe_ratio:.2f}",
+         "Sortino Ratio": f"{sortino_ratio:.2f}",
+         "Volatility": f"{volatility:.2%}",
+         "Max Drawdown": f"{max_drawdown:.2%}",
+         "Calmar Ratio": f"{calmar_ratio:.2f}"
+     }
+
+
+ def main(test_data_path='data/test.csv'):
+     """
+     Loads, evaluates, and plots the performance of the PPO, SAC, and TD3 agents
+     against a Buy and Hold baseline.
+     """
+     # --- Define Model Paths and Agent Types ---
+     models_to_evaluate = {
+         "PPO Agent": (PPO, 'checkpoints/ppo_portfolio_model'),
+         "SAC Agent": (SAC, 'checkpoints/sac_portfolio_model'),
+         "TD3 Agent": (TD3, 'checkpoints/td3_portfolio_model')
+     }
+
+     # Load test data
+     test_df = pd.read_csv(test_data_path, index_col='Date', parse_dates=True)
+
+     # Dictionaries to store results
+     portfolio_values = {}
+     metrics = {}
+
+     # --- Run Evaluations for each RL Agent ---
+     for name, (agent_type, model_path) in models_to_evaluate.items():
+         print(f"--- Evaluating {name} ---")
+         model = agent_type.load(model_path)
+         env = PortfolioEnv(test_df)
+         portfolio_values[name] = evaluate_agent(env, model)
+         metrics[name] = calculate_metrics(portfolio_values[name])
+
+     # --- Evaluate Buy and Hold Baseline ---
+     print("\n--- Evaluating Buy and Hold Baseline ---")
+     bnh_values = buy_and_hold(test_df)
+     portfolio_values["Buy and Hold"] = bnh_values
+     metrics["Buy and Hold"] = calculate_metrics(bnh_values)
+
+     # --- Combine and Print Metrics ---
+     print("\n--- Performance Metrics ---")
+     metrics_df = pd.DataFrame(metrics)
+     print(metrics_df)
+
+     # --- Plot All Strategies ---
+     plt.style.use('seaborn-v0_8-darkgrid')
+     fig, ax = plt.subplots(figsize=(14, 8))
+
+     # Define colors for clarity
+     colors = {
+         "PPO Agent": "red",
+         "SAC Agent": "green",
+         "TD3 Agent": "orange",
+         "Buy and Hold": "blue"
+     }
+
+     for name, values in portfolio_values.items():
+         ax.plot(values.index, values, label=name, color=colors[name], linewidth=2)
+
+     ax.set_title('Agent Performance Comparison', fontsize=16)
+     ax.set_xlabel('Date', fontsize=12)
+     ax.set_ylabel('Portfolio Value ($)', fontsize=12)
+     ax.legend(fontsize=12)
+
+     formatter = FuncFormatter(lambda x, p: f'${x:,.0f}')
+     ax.yaxis.set_major_formatter(formatter)
+
+     plt.tight_layout()
+     plt.savefig('results/final_performance_comparison_all_agents.png')
+     plt.show()
+
+ if __name__ == '__main__':
+     # A different test file can be passed in here if needed,
+     # e.g., main(test_data_path='data/stress_test_2018.csv')
+     main()
scripts/evaluate_baselines.py ADDED
@@ -0,0 +1,134 @@
+ # evaluate_baselines.py
+
+ import pandas as pd
+ import numpy as np
+ import matplotlib.pyplot as plt
+
+ def buy_and_hold(df, initial_balance=10000):
+     """
+     Simulates the Buy and Hold strategy.
+
+     Args:
+         df (pd.DataFrame): DataFrame with daily asset prices.
+         initial_balance (int): The starting capital.
+
+     Returns:
+         pd.Series: A Series containing the portfolio value for each day.
+     """
+     print("--- Simulating Buy and Hold ---")
+     n_assets = len(df.columns)
+
+     # Invest an equal amount in each asset at the beginning
+     initial_investment_per_asset = initial_balance / n_assets
+
+     # Get the initial prices
+     initial_prices = df.iloc[0]
+
+     # Calculate the number of shares bought for each asset
+     shares = initial_investment_per_asset / initial_prices
+
+     # Calculate the portfolio value for each day
+     portfolio_values = df.dot(shares)
+
+     print(f"Initial Investment: ${initial_balance:.2f}")
+     print(f"Final Portfolio Value: ${portfolio_values.iloc[-1]:.2f}")
+
+     return portfolio_values
+
+ def equally_weighted_rebalanced(df, initial_balance=10000, rebalance_freq='M', transaction_cost_pct=0.001):
+     """
+     Simulates an Equally Weighted Portfolio with periodic rebalancing.
+
+     Args:
+         df (pd.DataFrame): DataFrame with daily asset prices.
+         initial_balance (int): The starting capital.
+         rebalance_freq (str): The rebalancing frequency ('M' for monthly, 'Q' for quarterly).
+         transaction_cost_pct (float): The transaction cost as a percentage.
+
+     Returns:
+         pd.Series: A Series containing the portfolio value for each day.
+     """
+     print(f"\n--- Simulating Equally Weighted Portfolio (Rebalanced {rebalance_freq}) ---")
+     n_assets = len(df.columns)
+
+     # Set the initial weights to be equal
+     weights = np.full(n_assets, 1 / n_assets)
+
+     portfolio_value = initial_balance
+     portfolio_values = pd.Series(index=df.index, dtype=float)
+
+     last_rebalance_date = None
+
+     for date, prices in df.iterrows():
+         # Store the portfolio value for the day before any changes
+         portfolio_values[date] = portfolio_value
+
+         # Determine whether it's a rebalancing day:
+         # rebalance on the first trading day of each new month.
+         if last_rebalance_date is None or (rebalance_freq == 'M' and date.month != last_rebalance_date.month):
+
+             # Calculate the value of trades needed to rebalance
+             target_asset_values = portfolio_value * (1 / n_assets)
+             current_asset_values = weights * portfolio_value
+             trades = target_asset_values - current_asset_values
+
+             # Apply transaction costs
+             transaction_costs = np.sum(np.abs(trades)) * transaction_cost_pct
+             portfolio_value -= transaction_costs
+
+             # Reset weights to be equal
+             weights = np.full(n_assets, 1 / n_assets)
+             last_rebalance_date = date
+
+         # Advance the portfolio to the next trading day using the price changes
+         today_prices = df.loc[date]
+         next_day_index = df.index.get_loc(date) + 1
+         if next_day_index < len(df):
+             next_day_prices = df.iloc[next_day_index]
+             price_change_ratio = next_day_prices / today_prices
+
+             # Update portfolio value based on price changes
+             portfolio_value = np.sum((weights * portfolio_value) * price_change_ratio)
+
+             # Update weights due to market drift
+             new_asset_values = (weights * portfolio_value) * price_change_ratio
+             weights = new_asset_values / np.sum(new_asset_values)
+
+     print(f"Initial Investment: ${initial_balance:.2f}")
+     print(f"Final Portfolio Value: ${portfolio_values.iloc[-1]:.2f}")
+
+     return portfolio_values.dropna()
+
+
+ def main():
+     # Load the test data
+     test_df = pd.read_csv('data/test.csv', index_col='Date', parse_dates=True)
+
+     # --- Run Baseline Strategies ---
+     bnh_values = buy_and_hold(test_df)
+     ewp_values = equally_weighted_rebalanced(test_df)
+
+     # --- Plot the results ---
+     plt.style.use('seaborn-v0_8-darkgrid')
+     fig, ax = plt.subplots(figsize=(14, 8))
+
+     ax.plot(bnh_values.index, bnh_values, label='Buy and Hold', color='blue', linewidth=2)
+     ax.plot(ewp_values.index, ewp_values, label='Equally Weighted (Rebalanced Monthly)', color='green', linewidth=2)
+
+     ax.set_title('Baseline Strategy Performance (2021-2023)', fontsize=16)
120
+ ax.set_xlabel('Date', fontsize=12)
121
+ ax.set_ylabel('Portfolio Value ($)', fontsize=12)
122
+ ax.legend(fontsize=12)
123
+
124
+ # Format the y-axis to show currency
125
+ from matplotlib.ticker import FuncFormatter
126
+ formatter = FuncFormatter(lambda x, p: f'${x:,.0f}')
127
+ ax.yaxis.set_major_formatter(formatter)
128
+
129
+ plt.tight_layout()
130
+ plt.savefig('baseline_performance.png')
131
+ plt.show()
132
+
133
+ if __name__ == '__main__':
134
+ main()
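As a quick sanity check on the buy-and-hold arithmetic above, the same share-and-dot-product computation can be run on a tiny synthetic price frame (the tickers and prices here are invented for illustration, not real data):

```python
import pandas as pd

# Two assets over two days: A gains 10%, B loses 10%
prices = pd.DataFrame(
    {"A": [100.0, 110.0], "B": [50.0, 45.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
)

initial_balance = 10_000
# Equal stakes: $5,000 buys 50 shares of A and 100 shares of B
shares = (initial_balance / len(prices.columns)) / prices.iloc[0]
portfolio_values = prices.dot(shares)

# With equal initial stakes the +10% and -10% moves cancel,
# so the portfolio starts and ends at $10,000.
```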
scripts/fetch_data.py ADDED
@@ -0,0 +1,75 @@
+ import yfinance as yf
+ import pandas as pd
+ import os
+
+ # --- Configuration ---
+ # Asset tickers
+ TICKERS = ["AAPL", "MSFT", "SPY", "TLT", "BTC-USD"]
+ # Time periods for training and testing
+ TRAIN_START_DATE = "2015-01-01"
+ TRAIN_END_DATE = "2020-12-31"
+ TEST_START_DATE = "2021-01-01"
+ TEST_END_DATE = "2023-12-31"
+
+ # Directory to save the data
+ DATA_DIR = "data"
+ TRAIN_DATA_PATH = os.path.join(DATA_DIR, "train.csv")
+ TEST_DATA_PATH = os.path.join(DATA_DIR, "test.csv")
+
+ # --- Data Fetching and Processing ---
+
+ def fetch_and_prepare_data(start_date, end_date, tickers):
+     """
+     Fetches historical data for the given tickers and processes it.
+     Returns a DataFrame with 'Close' prices for each ticker.
+     """
+     print(f"Fetching data from {start_date} to {end_date} for {tickers}...")
+
+     data = yf.download(tickers, start=start_date, end=end_date)
+
+     # Take an explicit copy so the in-place operations below run on our own
+     # DataFrame rather than a view, avoiding SettingWithCopy warnings.
+     close_data = data['Close'].copy()
+
+     print("\nData Head:")
+     print(close_data.head())
+
+     print("\nMissing values before cleaning:")
+     print(close_data.isnull().sum())
+
+     # Forward-fill gaps (e.g. market holidays), then back-fill any leading NaNs
+     close_data.ffill(inplace=True)
+     close_data.bfill(inplace=True)
+
+     print("\nMissing values after cleaning:")
+     print(close_data.isnull().sum())
+
+     for col in close_data.columns:
+         close_data[col] = pd.to_numeric(close_data[col], errors='coerce')
+
+     close_data.dropna(inplace=True)
+
+     return close_data
+
+ def main():
+     """Main function to run the data fetching process."""
+     # Create the data directory if it doesn't exist
+     if not os.path.exists(DATA_DIR):
+         os.makedirs(DATA_DIR)
+         print(f"Created directory: {DATA_DIR}")
+
+     # Fetch, process, and save training data
+     print("--- Preparing Training Data ---")
+     train_data = fetch_and_prepare_data(TRAIN_START_DATE, TRAIN_END_DATE, TICKERS)
+     train_data.to_csv(TRAIN_DATA_PATH)
+     print(f"Training data saved to {TRAIN_DATA_PATH}")
+
+     print("\n" + "="*50 + "\n")
+
+     # Fetch, process, and save testing data
+     print("--- Preparing Testing Data ---")
+     test_data = fetch_and_prepare_data(TEST_START_DATE, TEST_END_DATE, TICKERS)
+     test_data.to_csv(TEST_DATA_PATH)
+     print(f"Testing data saved to {TEST_DATA_PATH}")
+
+ if __name__ == "__main__":
+     main()
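The two-pass gap filling used in `fetch_and_prepare_data` behaves like this on a toy series (values invented): forward-fill carries the last known price across gaps, and the subsequent back-fill closes any NaNs left at the very start, which forward-fill cannot reach.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 10.0, np.nan, np.nan, 12.0])
cleaned = s.ffill().bfill()  # ffill leaves the leading NaN; bfill closes it
```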
scripts/fetch_market_data.py ADDED
@@ -0,0 +1,78 @@
+ import argparse
+ import os
+ import pandas as pd
+ import yfinance as yf
+ from datetime import date
+
+ def fetch_data(start_date, end_date, output_filename):
+     """
+     Fetches, cleans, and saves historical market data for a given date range.
+
+     Args:
+         start_date (str): The start date for the data in 'YYYY-MM-DD' format.
+         end_date (str): The end date for the data in 'YYYY-MM-DD' format.
+         output_filename (str): The path and name of the file to save the data.
+     """
+     print(f"--- Fetching data from {start_date} to {end_date} ---")
+
+     # Define the base list of tickers
+     tickers = ["AAPL", "MSFT", "SPY", "TLT", "BTC-USD"]
+
+     # Exclude Bitcoin for periods before reliable BTC-USD price history exists
+     if pd.to_datetime(start_date).year < 2013:
+         print("Note: Bitcoin (BTC-USD) has no data for the requested period and will be excluded.")
+         tickers.remove("BTC-USD")
+
+     # Download data from Yahoo Finance
+     data = yf.download(tickers, start=start_date, end=end_date)
+     close_data = data['Close'].copy()
+
+     # Data cleaning
+     print(f"\nMissing values before cleaning:\n{close_data.isnull().sum()}")
+     close_data.ffill(inplace=True)
+     close_data.bfill(inplace=True)
+
+     # Drop any columns that are still entirely NaN (e.g. BTC in the 2008 data)
+     close_data.dropna(axis=1, how='all', inplace=True)
+
+     print(f"\nMissing values after cleaning:\n{close_data.isnull().sum()}")
+
+     # Ensure the output directory exists
+     output_dir = os.path.dirname(output_filename)
+     if output_dir and not os.path.exists(output_dir):
+         os.makedirs(output_dir)
+
+     # Save to CSV
+     close_data.to_csv(output_filename)
+     print(f"\nβœ… Data successfully saved to {output_filename}")
+
+
+ if __name__ == "__main__":
+     # Set up command-line argument parsing
+     parser = argparse.ArgumentParser(description="Fetch historical market data for specified periods.")
+
+     parser.add_argument(
+         "--start",
+         type=str,
+         default="2018-01-01",
+         help="Start date in YYYY-MM-DD format. Default is for the 2018 stress test."
+     )
+     parser.add_argument(
+         "--end",
+         type=str,
+         default="2019-12-31",
+         help="End date in YYYY-MM-DD format, or 'today'. Default is for the 2018 stress test."
+     )
+     parser.add_argument(
+         "--filename",
+         type=str,
+         default="data/stress_test_2018.csv",
+         help="Output file name (e.g., 'data/my_data.csv')."
+     )
+
+     args = parser.parse_args()
+
+     # Resolve 'today' to the current date if requested
+     end_date = date.today().strftime('%Y-%m-%d') if args.end.lower() == 'today' else args.end
+
+     fetch_data(start_date=args.start, end_date=end_date, output_filename=args.filename)
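The `--end today` convenience above can be factored into a small testable helper; `resolve_end` is a name introduced here for illustration, not a function in the script.

```python
from datetime import date

def resolve_end(end_arg: str) -> str:
    """Return today's date in YYYY-MM-DD form when asked for 'today', else pass through."""
    return date.today().strftime('%Y-%m-%d') if end_arg.lower() == 'today' else end_arg
```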
scripts/stress_test.py ADDED
@@ -0,0 +1,142 @@
+ import argparse
+ import os
+ import pandas as pd
+ import numpy as np
+ import matplotlib.pyplot as plt
+ from matplotlib.ticker import FuncFormatter
+
+ # Import all agent classes and the environment
+ from stable_baselines3 import PPO, SAC, TD3
+ from src.environment import PortfolioEnv
+
+ # --- Helper Functions ---
+ def evaluate_agent(env, model):
+     """Runs a trained agent on a given environment."""
+     obs, info = env.reset()
+     terminated, truncated = False, False
+     portfolio_values = [env.initial_balance]
+     while not (terminated or truncated):
+         action, _states = model.predict(obs, deterministic=True)
+         obs, reward, terminated, truncated, info = env.step(action)
+         portfolio_values.append(info['portfolio_value'])
+     return pd.Series(portfolio_values, index=env.df.index[:len(portfolio_values)])
+
+ def buy_and_hold(df, initial_balance=10000):
+     """Simulates the Buy and Hold strategy."""
+     n_assets = len(df.columns)
+     initial_investment_per_asset = initial_balance / n_assets
+     initial_prices = df.iloc[0]
+     shares = initial_investment_per_asset / initial_prices
+     portfolio_values = df.dot(shares)
+     return portfolio_values
+
+ def calculate_metrics(portfolio_values):
+     """Calculates performance metrics from a portfolio value series."""
+     total_return = (portfolio_values.iloc[-1] / portfolio_values.iloc[0]) - 1
+     num_years = (portfolio_values.index[-1] - portfolio_values.index[0]).days / 365.25
+     cagr = (portfolio_values.iloc[-1] / portfolio_values.iloc[0]) ** (1 / num_years) - 1 if num_years > 0 else 0
+     daily_returns = portfolio_values.pct_change().dropna()
+     sharpe_ratio = np.sqrt(252) * (daily_returns.mean() / daily_returns.std()) if daily_returns.std() != 0 else 0
+     rolling_max = portfolio_values.cummax()
+     daily_drawdown = portfolio_values / rolling_max - 1.0
+     max_drawdown = daily_drawdown.min()
+     return {
+         "Total Return": f"{total_return:.2%}", "CAGR": f"{cagr:.2%}",
+         "Sharpe Ratio": f"{sharpe_ratio:.2f}", "Max Drawdown": f"{max_drawdown:.2%}"
+     }
+
+ # --- Main Stress Test Function ---
+ def run_stress_test(datafile_path, ppo_path, sac_path, td3_path, output_path):
+     """
+     Loads data and models, runs evaluations, and plots the comparison.
+     """
+     print(f"--- Running Stress Test on {datafile_path} ---")
+
+     # 1. Load Data
+     try:
+         test_df = pd.read_csv(datafile_path, index_col='Date', parse_dates=True)
+     except FileNotFoundError:
+         print(f"❌ Error: Data file not found at {datafile_path}")
+         return
+
+     # Check for an asset mismatch (e.g., 4 assets in the 2008 data vs. 5-asset models).
+     # The standard models were trained on 5 assets (e.g., observation shape = 30 * 5 = 150).
+     expected_assets = 5
+     if test_df.shape[1] != expected_assets:
+         print(f"⚠️ Warning: Models were trained on {expected_assets} assets, but this dataset has {test_df.shape[1]}.")
+         print("Skipping agent evaluation for this dataset.")
+         return
+
+     # 2. Define Models to Evaluate
+     models_to_evaluate = {
+         "PPO Agent": (PPO, ppo_path),
+         "SAC Agent": (SAC, sac_path),
+         "TD3 Agent": (TD3, td3_path)
+     }
+
+     portfolio_values = {}
+     metrics = {}
+
+     # 3. Run Evaluations
+     for name, (agent_type, model_path) in models_to_evaluate.items():
+         if os.path.exists(model_path):
+             print(f"--- Evaluating {name} ---")
+             model = agent_type.load(model_path)
+             env = PortfolioEnv(test_df)
+             portfolio_values[name] = evaluate_agent(env, model)
+             metrics[name] = calculate_metrics(portfolio_values[name])
+         else:
+             print(f"⚠️ Warning: Model file not found at {model_path}. Skipping.")
+
+     # Evaluate the Buy and Hold baseline
+     print("\n--- Evaluating Buy and Hold Baseline ---")
+     bnh_values = buy_and_hold(test_df)
+     portfolio_values["Buy and Hold"] = bnh_values
+     metrics["Buy and Hold"] = calculate_metrics(bnh_values)
+
+     # 4. Display Results
+     print("\n--- Stress Test Performance Metrics ---")
+     metrics_df = pd.DataFrame(metrics)
+     print(metrics_df)
+
+     # 5. Plotting
+     plt.style.use('seaborn-v0_8-darkgrid')
+     fig, ax = plt.subplots(figsize=(14, 8))
+
+     colors = {"PPO Agent": "red", "SAC Agent": "green", "TD3 Agent": "orange", "Buy and Hold": "blue"}
+     for name, values in portfolio_values.items():
+         ax.plot(values.index, values, label=name, color=colors.get(name, 'black'), linewidth=2)
+
+     plot_title = f"Agent Stress Test: {os.path.basename(datafile_path).replace('.csv', '')}"
+     ax.set_title(plot_title, fontsize=16)
+     ax.set_xlabel('Date', fontsize=12)
+     ax.set_ylabel('Portfolio Value ($)', fontsize=12)
+     ax.legend(fontsize=12)
+
+     formatter = FuncFormatter(lambda x, p: f'${x:,.0f}')
+     ax.yaxis.set_major_formatter(formatter)
+
+     plt.tight_layout()
+
+     # Ensure the output directory exists before saving
+     output_dir = os.path.dirname(output_path)
+     if output_dir and not os.path.exists(output_dir):
+         os.makedirs(output_dir)
+
+     plt.savefig(output_path)
+     print(f"\nβœ… Plot saved to {output_path}")
+     plt.show()
+
+
+ if __name__ == '__main__':
+     parser = argparse.ArgumentParser(description="Run a stress test on trained RL portfolio agents.")
+
+     parser.add_argument("--datafile", type=str, default="data/stress_test_2018.csv", help="Path to the market data CSV file for the test.")
+     parser.add_argument("--ppopath", type=str, default="checkpoints/ppo_portfolio_model.zip", help="Path to the trained PPO model.")
+     parser.add_argument("--sacpath", type=str, default="checkpoints/sac_portfolio_model.zip", help="Path to the trained SAC model.")
+     parser.add_argument("--td3path", type=str, default="checkpoints/td3_portfolio_model.zip", help="Path to the trained TD3 model.")
+     parser.add_argument("--output", type=str, default="results/stress_test_comparison.png", help="Path to save the output plot.")
+
+     args = parser.parse_args()
+
+     run_stress_test(
+         datafile_path=args.datafile,
+         ppo_path=args.ppopath,
+         sac_path=args.sacpath,
+         td3_path=args.td3path,
+         output_path=args.output
+     )
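The max-drawdown computation in `calculate_metrics` can be verified by hand on a short synthetic equity curve (numbers chosen so the arithmetic is easy to follow): the curve peaks at 120, falls to 90, and the worst peak-to-trough loss is 90/120 - 1 = -25%.

```python
import pandas as pd

values = pd.Series([100.0, 120.0, 90.0, 110.0])
rolling_max = values.cummax()                 # running peak: 100, 120, 120, 120
daily_drawdown = values / rolling_max - 1.0   # fall from the prior peak each day
max_drawdown = daily_drawdown.min()           # -0.25 at the trough
```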
scripts/train.py ADDED
@@ -0,0 +1,77 @@
+ import argparse
+ import pandas as pd
+ from stable_baselines3 import PPO, SAC, TD3
+ from environment import PortfolioEnv
+
+ def train_agent(agent_name="td3", timesteps=100000):
+     """
+     Main function to train a specified RL agent.
+
+     Args:
+         agent_name (str): The RL algorithm to use ('ppo', 'sac', or 'td3').
+         timesteps (int): The total number of timesteps for training.
+     """
+     # 1. Map agent names to their corresponding classes
+     AGENT_CLASSES = {
+         "ppo": PPO,
+         "sac": SAC,
+         "td3": TD3
+     }
+     agent_class = AGENT_CLASSES.get(agent_name.lower())
+     if agent_class is None:
+         raise ValueError(f"Unknown agent: {agent_name}. Choose from {list(AGENT_CLASSES.keys())}")
+
+     model_name = agent_name.lower()
+
+     # 2. Load data and create the environment
+     print("--- Loading Data and Creating Environment ---")
+     try:
+         df = pd.read_csv('data/train.csv', index_col='Date', parse_dates=True)
+         env = PortfolioEnv(df)
+         print("Environment created successfully.")
+     except FileNotFoundError:
+         print("❌ Error: 'data/train.csv' not found. Make sure to run a data fetching script first.")
+         return
+
+     # 3. Create the RL Agent
+     print(f"--- Creating {agent_name.upper()} Agent ---")
+     model = agent_class(
+         "MlpPolicy",
+         env,
+         verbose=1,
+         tensorboard_log="./tensorboard_logs/"
+     )
+
+     # 4. Train the Agent
+     print(f"--- Starting Agent Training for {timesteps} timesteps ---")
+     model.learn(total_timesteps=timesteps)
+     print("--- Agent Training Complete ---")
+
+     # 5. Save the Trained Model (stable-baselines3 appends the .zip extension)
+     save_path = f"checkpoints/{model_name}_portfolio_model"
+     model.save(save_path)
+     print(f"βœ… Model saved to {save_path}.zip")
+
+
+ if __name__ == "__main__":
+     # 6. Set up command-line argument parsing
+     parser = argparse.ArgumentParser(description="Train a Reinforcement Learning agent for portfolio management.")
+
+     parser.add_argument(
+         "--agent",
+         type=str,
+         default="td3",
+         choices=["ppo", "sac", "td3"],
+         help="The RL algorithm to use for training (default: td3)."
+     )
+     parser.add_argument(
+         "--timesteps",
+         type=int,
+         default=100000,
+         help="The total number of timesteps for training (default: 100000)."
+     )
+
+     args = parser.parse_args()
+
+     # Call the main training function with the parsed arguments
+     train_agent(agent_name=args.agent, timesteps=args.timesteps)
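The name-to-class dispatch at the top of `train_agent` can be exercised in isolation with stand-in classes (`PPO`, `SAC`, and `TD3` below are empty placeholders, not the stable-baselines3 imports, and `resolve_agent` is a name introduced here for illustration):

```python
class PPO: pass
class SAC: pass
class TD3: pass

AGENT_CLASSES = {"ppo": PPO, "sac": SAC, "td3": TD3}

def resolve_agent(agent_name: str):
    """Case-insensitive lookup that fails loudly on an unknown algorithm name."""
    agent_class = AGENT_CLASSES.get(agent_name.lower())
    if agent_class is None:
        raise ValueError(f"Unknown agent: {agent_name}. Choose from {list(AGENT_CLASSES)}")
    return agent_class
```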
scripts/visualize_strategy.py ADDED
@@ -0,0 +1,123 @@
+ import argparse
+ import os
+ import pandas as pd
+ import numpy as np
+ import matplotlib.pyplot as plt
+ from matplotlib.ticker import FuncFormatter
+ from stable_baselines3 import PPO, SAC, TD3
+ from environment import PortfolioEnv
+
+ def visualize_strategy(agent_name, checkpoint_path, datafile_path, output_path):
+     """
+     Loads a trained agent, runs a simulation, and plots its portfolio allocation strategy.
+
+     Args:
+         agent_name (str): The type of agent to load ('ppo', 'sac', 'td3').
+         checkpoint_path (str): The path to the saved model checkpoint file (.zip).
+         datafile_path (str): The path to the CSV market data for the simulation.
+         output_path (str): The path to save the output plot image.
+     """
+     print(f"--- Visualizing strategy for {agent_name.upper()} agent ---")
+
+     # 1. Define a mapping from agent names to their classes
+     AGENT_CLASSES = {
+         "ppo": PPO,
+         "sac": SAC,
+         "td3": TD3
+     }
+     agent_class = AGENT_CLASSES[agent_name.lower()]
+
+     # 2. Load Data and Model
+     try:
+         test_df = pd.read_csv(datafile_path, index_col='Date', parse_dates=True)
+         model = agent_class.load(checkpoint_path)
+     except FileNotFoundError as e:
+         print(f"❌ Error: Could not find a required file. {e}")
+         return
+     except Exception as e:
+         print(f"❌ An error occurred: {e}")
+         return
+
+     # 3. Create Environment and Run Simulation
+     env = PortfolioEnv(test_df)
+     obs, info = env.reset()
+     terminated, truncated = False, False
+
+     weights_history = [info['weights']]
+     while not (terminated or truncated):
+         action, _states = model.predict(obs, deterministic=True)
+         obs, reward, terminated, truncated, info = env.step(action)
+         weights_history.append(info['weights'])
+     print("βœ… Simulation complete.")
+
+     # 4. Prepare Data for Plotting
+     weights_df = pd.DataFrame(weights_history)
+     asset_names = test_df.columns.tolist() + ['Cash']
+     weights_df.columns = asset_names
+     weights_df.index = test_df.index[:len(weights_df)]
+
+     # 5. Plot the Stacked Area Chart
+     print("πŸ“Š Generating plot...")
+     plt.style.use('seaborn-v0_8-darkgrid')
+     fig, ax = plt.subplots(figsize=(15, 8))
+
+     ax.stackplot(weights_df.index, weights_df.T, labels=weights_df.columns, alpha=0.8)
+
+     ax.set_title(f'Agent Portfolio Allocation Over Time ({agent_name.upper()})', fontsize=16)
+     ax.set_xlabel('Date', fontsize=12)
+     ax.set_ylabel('Portfolio Allocation (%)', fontsize=12)
+     ax.legend(loc='upper left', fontsize=10)
+
+     formatter = FuncFormatter(lambda y, p: f'{y:.0%}')
+     ax.yaxis.set_major_formatter(formatter)
+
+     plt.tight_layout()
+
+     # Ensure the output directory exists
+     output_dir = os.path.dirname(output_path)
+     if output_dir and not os.path.exists(output_dir):
+         os.makedirs(output_dir)
+
+     plt.savefig(output_path)
+     print(f"βœ… Plot saved to {output_path}")
+     plt.show()
+
+
+ if __name__ == "__main__":
+     # Set up command-line argument parsing
+     parser = argparse.ArgumentParser(description="Visualize a trained RL agent's portfolio allocation strategy.")
+
+     parser.add_argument(
+         "--agent",
+         type=str,
+         required=True,
+         choices=["ppo", "sac", "td3"],
+         help="The RL algorithm of the trained agent."
+     )
+     parser.add_argument(
+         "--checkpoint",
+         type=str,
+         required=True,
+         help="Path to the saved model checkpoint .zip file (e.g., 'td3_portfolio_model.zip')."
+     )
+     parser.add_argument(
+         "--datafile",
+         type=str,
+         default="data/test.csv",
+         help="Path to the market data CSV file to run the simulation on."
+     )
+     parser.add_argument(
+         "--output",
+         type=str,
+         default="results/agent_allocation.png",
+         help="Path to save the output plot image."
+     )
+
+     args = parser.parse_args()
+
+     visualize_strategy(
+         agent_name=args.agent,
+         checkpoint_path=args.checkpoint,
+         datafile_path=args.datafile,
+         output_path=args.output
+     )
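The reshaping step in `visualize_strategy` (weight history into a labelled DataFrame) looks like this on synthetic allocations; every row, assets plus cash, should sum to one, which is what makes the stacked area chart fill the full vertical axis. Ticker names and numbers below are invented for illustration.

```python
import pandas as pd

weights_history = [
    [0.20, 0.30, 0.10, 0.40],   # AAPL, MSFT, SPY, Cash at step 0
    [0.25, 0.25, 0.10, 0.40],   # allocation one step later
]
weights_df = pd.DataFrame(weights_history, columns=["AAPL", "MSFT", "SPY", "Cash"])
row_sums = weights_df.sum(axis=1)  # each row is a full allocation, so sums to 1.0
```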