---
library_name: stable-baselines3
tags:
- reinforcement-learning
- finance
- stock-trading
- deep-reinforcement-learning
- dqn
- ppo
- a2c
model-index:
- name: RL-Trading-Agents
  results:
  - task:
      type: reinforcement-learning
      name: Stock Trading
    metrics:
    - type: sharpe_ratio
      value: Variable
    - type: total_return
      value: Variable
---

# 🤖 Multi-Agent Reinforcement Learning Trading System

This repository contains trained Deep Reinforcement Learning agents for automated stock trading. The agents were trained using `stable-baselines3` on a custom OpenAI Gym environment simulating the US Stock Market (AAPL, MSFT, GOOGL).

## 🧠 Models

The following algorithms were used:

1. **DQN (Deep Q-Network)**: Off-policy, value-based RL algorithm suited to discrete action spaces.
2. **PPO (Proximal Policy Optimization)**: On-policy policy-gradient method known for its training stability.
3. **A2C (Advantage Actor-Critic)**: Synchronous, deterministic variant of the A3C actor-critic method.
4. **Ensemble**: A meta-voter that takes the majority decision of the three agents above.

## 🏋️ Training Data

The models were trained on technical indicators derived from historical daily price data (2018-2024):

* **Returns**: Daily percentage change.
* **RSI (14)**: Relative Strength Index over a 14-day window.
* **MACD**: Moving Average Convergence Divergence.
* **Bollinger Bands**: Volatility measure.
* **Volume Ratio**: Relative volume intensity.
* **Market Regime**: Bull/Bear trend classification.

## 🔗 Related Data

* **Dataset Repository**: [AdityaaXD/Multi-Agent_Reinforcement_Learning_Trading_System_Data](https://huggingface.co/AdityaaXD/Multi-Agent_Reinforcement_Learning_Trading_System_Data)
* **GitHub Repository**: [ADITYA-tp01/Multi-Agent-Reinforcement-Learning-Trading-System-Data](https://github.com/ADITYA-tp01/Multi-Agent-Reinforcement-Learning-Trading-System-Data)

## 🎮 Environment (`TradingEnv`)

* **Action Space**: Discrete(3) - `0: HOLD`, `1: BUY`, `2: SELL`.
* **Observation Space**: Box(10,) - Normalized technical features + portfolio state.
* **Reward**: Profit & Loss (PnL) minus transaction costs and drawdown penalties.

## 🚀 Usage

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Load the environment (custom wrapper required). TradingEnv is not part of
# stable-baselines3; import it from this project's code before uncommenting.
# env = TradingEnv(df)
# obs, _ = env.reset()  # assumes the gymnasium-style (obs, info) return

# Load a trained model from disk
model = PPO.load("ppo_AAPL.zip")

# Predict the next action (0: HOLD, 1: BUY, 2: SELL) for the current observation
action, _ = model.predict(obs, deterministic=True)
```

## 📈 Performance

Performance varies by ticker and market condition. See the generated `results/` CSVs for the Sharpe ratio and maximum drawdown of each agent.

## 🛠️ Credits

Developed by **Adityaraj Suman** as part of the Multi-Agent RL Trading System project.
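
## 🗳️ Ensemble Voting Sketch

The Ensemble agent is described above as a meta-voter that takes the majority decision of DQN, PPO, and A2C. The snippet below is a minimal sketch of that idea, not the project's actual implementation. The function name `majority_vote` and the tie-break rule (fall back to `HOLD` when no action has a strict majority over the runner-up) are assumptions made for illustration; the action encoding comes from the Environment section.

```python
from collections import Counter

# Action encoding from the Environment section.
HOLD, BUY, SELL = 0, 1, 2


def majority_vote(actions, tie_action=HOLD):
    """Return the most common action among the agents' votes.

    `actions` is an iterable of integer action ids, one per agent.
    Falling back to `tie_action` on a tie is an assumption of this sketch.
    """
    counts = Counter(int(a) for a in actions)
    if not counts:
        raise ValueError("at least one action is required")
    ranked = counts.most_common()
    # A tie exists when the two most frequent actions have equal vote counts.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_action
    return ranked[0][0]
```

In practice the votes would be the actions returned by each loaded model's `predict` call for the same observation, for example `majority_vote([dqn_action, ppo_action, a2c_action])`, where the three variable names are hypothetical. With three agents and three actions a three-way split is possible, which is why the sketch needs an explicit tie-break.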