---
library_name: stable-baselines3
tags:
- reinforcement-learning
- finance
- stock-trading
- deep-reinforcement-learning
- dqn
- ppo
- a2c
model-index:
- name: RL-Trading-Agents
  results:
  - task:
      type: reinforcement-learning
      name: Stock Trading
    metrics:
    - type: sharpe_ratio
      value: Variable
    - type: total_return
      value: Variable
---

# 🤖 Multi-Agent Reinforcement Learning Trading System

This repository contains trained deep reinforcement learning agents for automated stock trading. The agents were trained with `stable-baselines3` on a custom Gym-style environment (the usage example imports `gymnasium`) that simulates daily trading of three US stocks: AAPL, MSFT, and GOOGL.

## 🧠 Models

The following algorithms were used:
1.  **DQN (Deep Q-Network)**: Off-policy RL algorithm suitable for discrete action spaces.
2.  **PPO (Proximal Policy Optimization)**: On-policy policy-gradient method known for stable updates.
3.  **A2C (Advantage Actor-Critic)**: On-policy actor-critic method; a synchronous, deterministic variant of A3C.
4.  **Ensemble**: A meta-voter that takes the majority decision from the above three.
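
The ensemble's majority vote can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `majority_vote` helper and its tie-breaking rule (fall back to `HOLD` when no action has a strict plurality) are assumptions, since the card does not say how ties are resolved.

```python
from collections import Counter

HOLD, BUY, SELL = 0, 1, 2  # action encoding used by TradingEnv


def majority_vote(actions, tie_break=HOLD):
    """Return the action chosen by most agents.

    `tie_break` is a hypothetical fallback for when the top actions
    receive the same number of votes.
    """
    counts = Counter(int(a) for a in actions)
    ranked = counts.most_common()
    # A tie exists when the two most common actions have equal vote counts.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]


# DQN and PPO vote BUY, A2C votes SELL: the ensemble buys.
print(majority_vote([BUY, BUY, SELL]))
# All three disagree: the ensemble falls back to HOLD.
print(majority_vote([HOLD, BUY, SELL]))
```

Falling back to `HOLD` on a three-way split is a conservative choice, because it avoids paying transaction costs on a trade that no two agents agree on.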

## 🏋️ Training Data

The models were trained on technical indicators derived from historical daily price data (2018-2024):
*   **Returns**: Daily percentage change.
*   **RSI (14)**: Relative Strength Index.
*   **MACD**: Moving Average Convergence Divergence.
*   **Bollinger Bands**: Volatility measure.
*   **Volume Ratio**: Relative volume intensity.
*   **Market Regime**: Bull/Bear trend classification.
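
To make a few of these features concrete, here is a rough standard-library sketch of daily returns, RSI, and Bollinger Bands. The function names, the simple-average form of RSI (Wilder's original uses exponential smoothing), and the 20-day, 2-standard-deviation band settings are assumptions for illustration; the card does not specify the exact formulas used in training.

```python
from statistics import mean, pstdev


def daily_returns(closes):
    """Change between consecutive closes, as fractions (0.01 means 1%)."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


def rsi(closes, period=14):
    """RSI over the last `period` price changes, using simple averages."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closes")
    window = closes[-(period + 1):]
    changes = [curr - prev for prev, curr in zip(window, window[1:])]
    avg_gain = mean(max(c, 0.0) for c in changes)
    avg_loss = mean(max(-c, 0.0) for c in changes)
    if avg_loss == 0:
        return 100.0  # no losses in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def bollinger_bands(closes, period=20, num_std=2.0):
    """Return (lower, middle, upper) bands from the last `period` closes."""
    window = closes[-period:]
    middle = mean(window)
    width = num_std * pstdev(window)
    return middle - width, middle, middle + width


closes = [100.0, 102.0, 101.0, 104.0, 103.0]
print(daily_returns(closes))
print(rsi(closes, period=4))
print(bollinger_bands(closes, period=5))
```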

## 🔗 Related Data

*   **Dataset Repository**: [AdityaaXD/Multi-Agent_Reinforcement_Learning_Trading_System_Data](https://huggingface.co/AdityaaXD/Multi-Agent_Reinforcement_Learning_Trading_System_Data)
*   **GitHub Repository**: [ADITYA-tp01/Multi-Agent-Reinforcement-Learning-Trading-System-Data](https://github.com/ADITYA-tp01/Multi-Agent-Reinforcement-Learning-Trading-System-Data)


## 🎮 Environment (`TradingEnv`)

*   **Action Space**: Discrete(3) - `0: HOLD`, `1: BUY`, `2: SELL`.
*   **Observation Space**: Box(10,) - Normalized technical features + portfolio state.
*   **Reward**: Profit & Loss (PnL) minus transaction costs and drawdown penalties.
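
One way the reward described above could be composed is sketched below. Everything specific here is an assumption: the `step_reward` name, the proportional cost rate, the drawdown weight, and the choice to scale the drawdown penalty by the previous portfolio value are placeholders, not values taken from `TradingEnv`.

```python
def step_reward(prev_value, new_value, peak_value, traded_notional,
                cost_rate=0.001, drawdown_weight=0.1):
    """One-step reward: PnL minus transaction costs and a drawdown penalty.

    cost_rate and drawdown_weight are hypothetical coefficients.
    """
    pnl = new_value - prev_value
    # Proportional cost on the absolute value traded this step.
    cost = cost_rate * abs(traded_notional)
    # Drawdown is the fractional drop from the running peak.
    peak = max(peak_value, new_value)
    drawdown = (peak - new_value) / peak if peak > 0 else 0.0
    # Scale by portfolio value so all three terms are in currency units.
    penalty = drawdown_weight * drawdown * prev_value
    return pnl - cost - penalty


# Portfolio falls from 1000 to 990 after trading 500 of stock.
print(step_reward(1000.0, 990.0, 1000.0, 500.0))
```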

## 🚀 Usage

```python
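from stable_baselines3 import PPO

# TradingEnv is the project's custom environment and is not part of
# stable-baselines3. Import it from the project source and build it from
# your feature DataFrame `df` before running the lines below.
env = TradingEnv(df)
obs, info = env.reset()  # assumes the Gymnasium API: reset() returns (obs, info)

# Load a trained model
model = PPO.load("ppo_AAPL.zip")

# Predict an action for the current observation: 0 = HOLD, 1 = BUY, 2 = SELL
action, _ = model.predict(obs, deterministic=True)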
```

## 📈 Performance

Performance varies by ticker and market condition. See the generated `results/` CSVs for detailed Sharpe Ratios and Max Drawdown stats per agent.
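
For reference, the two headline metrics can be computed along these lines. This is a sketch under stated assumptions: 252 trading days per year for annualization, a zero risk-free rate by default, and function names chosen for illustration. The repository's own evaluation code may differ.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=252, risk_free_rate=0.0):
    """Annualized Sharpe ratio from per-period returns."""
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    sd = stdev(excess)  # sample standard deviation
    if sd == 0:
        return 0.0  # undefined for constant returns; report 0 by convention here
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


equity = [100.0, 120.0, 90.0, 110.0]
print(max_drawdown(equity))  # 0.25: the fall from 120 to 90
print(sharpe_ratio([0.01, -0.005, 0.02, 0.0]))
```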

## 🛠️ Credits

Developed by **Adityaraj Suman** as part of the Multi-Agent RL Trading System project.