---
library_name: stable-baselines3
tags:
- reinforcement-learning
- finance
- stock-trading
- deep-reinforcement-learning
- dqn
- ppo
- a2c
model-index:
- name: RL-Trading-Agents
  results:
  - task:
      type: reinforcement-learning
      name: Stock Trading
    metrics:
    - type: sharpe_ratio
      value: Variable
    - type: total_return
      value: Variable
---

# 🤖 Multi-Agent Reinforcement Learning Trading System

This repository contains trained Deep Reinforcement Learning agents for automated stock trading. The agents were trained with `stable-baselines3` on a custom Gymnasium (the maintained successor of OpenAI Gym) environment that simulates daily trading of three US equities: AAPL, MSFT, and GOOGL.

## 🧠 Models

The following algorithms were used:
1. **DQN (Deep Q-Network)**: Off-policy, value-based algorithm suited to discrete action spaces.
2. **PPO (Proximal Policy Optimization)**: On-policy policy-gradient method known for its training stability.
3. **A2C (Advantage Actor-Critic)**: Synchronous, on-policy actor-critic method (the synchronous variant of A3C).
4. **Ensemble**: A meta-voter that takes the majority decision of the three agents above (see the sketch below).
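A minimal sketch of the majority vote, assuming each agent exposes the standard `stable-baselines3` `predict()` API; the helper name and the HOLD fallback on a three-way tie are illustrative assumptions, not necessarily this repository's exact implementation:

```python
from collections import Counter

def ensemble_action(models, obs) -> int:
    """Majority vote over the DQN, PPO and A2C agents (hypothetical helper)."""
    votes = [int(m.predict(obs, deterministic=True)[0]) for m in models]
    action, count = Counter(votes).most_common(1)[0]
    return action if count >= 2 else 0  # assumption: no majority -> HOLD (0)
```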
## 🏋️ Training Data

The models were trained on technical indicators derived from historical daily price data (2018-2024); representative computations are sketched after the list:
* **Returns**: Daily percentage change.
* **RSI (14)**: Relative Strength Index over a 14-day window.
* **MACD**: Moving Average Convergence Divergence.
* **Bollinger Bands**: Volatility measure.
* **Volume Ratio**: Relative volume intensity.
* **Market Regime**: Bull/Bear trend classification.
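The following sketch computes these features from a daily OHLCV DataFrame; the column names (`Close`, `Volume`) and every window size except the RSI's 14 are assumptions, not necessarily the repository's exact pipeline:

```python
import pandas as pd

def make_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative indicator computations; windows other than RSI-14 are assumed."""
    out = pd.DataFrame(index=df.index)
    out["returns"] = df["Close"].pct_change()

    # RSI(14): average gains vs. average losses over 14 days
    delta = df["Close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi"] = 100 - 100 / (1 + gain / loss)

    # MACD: 12-day EMA minus 26-day EMA
    out["macd"] = (df["Close"].ewm(span=12, adjust=False).mean()
                   - df["Close"].ewm(span=26, adjust=False).mean())

    # Bollinger %B: price position inside the 20-day, 2-sigma bands
    mid = df["Close"].rolling(20).mean()
    std = df["Close"].rolling(20).std()
    out["bollinger_b"] = (df["Close"] - (mid - 2 * std)) / (4 * std)

    # Volume ratio: today's volume vs. its 20-day average
    out["volume_ratio"] = df["Volume"] / df["Volume"].rolling(20).mean()

    # Market regime: 1 (bull) if price is above its 50-day moving average
    out["regime"] = (df["Close"] > df["Close"].rolling(50).mean()).astype(int)
    return out.dropna()
```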
## 🎮 Environment (`TradingEnv`)

* **Action Space**: `Discrete(3)` - `0: HOLD`, `1: BUY`, `2: SELL`.
* **Observation Space**: `Box(10,)` - normalized technical features plus portfolio state.
* **Reward**: Profit & Loss (PnL) minus transaction costs and drawdown penalties. A minimal skeleton of this interface follows the list.
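For orientation, here is a minimal Gymnasium skeleton matching the spaces above; the reward terms, the assumed 8-market-feature/2-portfolio-entry observation split, and the cost constant are placeholders rather than the repository's actual implementation:

```python
import gymnasium as gym
import numpy as np

class TradingEnv(gym.Env):
    """Skeleton matching the documented spaces; internals are illustrative."""

    def __init__(self, features: np.ndarray, prices: np.ndarray, cost: float = 1e-3):
        self.features, self.prices, self.cost = features, prices, cost
        self.action_space = gym.spaces.Discrete(3)  # 0: HOLD, 1: BUY, 2: SELL
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(10,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.position = 0, 0
        return self._obs(), {}

    def step(self, action):
        if action == 1:    # BUY -> long
            self.position = 1
        elif action == 2:  # SELL -> flat
            self.position = 0
        # PnL of holding the position over the next day, minus a trade cost;
        # the drawdown penalty from the reward description is omitted here
        pnl = self.position * (self.prices[self.t + 1] / self.prices[self.t] - 1)
        reward = pnl - self.cost * (action != 0)
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        # assumed split: 8 market features + 2 portfolio entries = Box(10,)
        return np.append(self.features[self.t], [self.position, 0.0]).astype(np.float32)
```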
## 🚀 Usage

```python
from stable_baselines3 import PPO

# Build the environment (TradingEnv and the indicator DataFrame `df`
# are provided by this repository's source code)
env = TradingEnv(df)

# Load a trained model
model = PPO.load("ppo_AAPL.zip")

# Run one greedy evaluation episode
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)  # 0: HOLD, 1: BUY, 2: SELL
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```
## 📈 Performance

Performance varies by ticker and market condition. See the generated `results/` CSVs for detailed Sharpe ratio and maximum drawdown statistics per agent; the snippet below shows one standard way to compute both metrics from a daily returns series.
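The exact evaluation code lives in the repository; as a reference, the standard definitions of both metrics are sketched here (annualizing with 252 trading days is an assumption, not necessarily the convention used in `results/`):

```python
import numpy as np

def sharpe_ratio(daily_returns: np.ndarray, risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio (assumes 252 trading days per year)."""
    excess = daily_returns - risk_free / 252
    return float(np.sqrt(252) * excess.mean() / excess.std())

def max_drawdown(daily_returns: np.ndarray) -> float:
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity = np.cumprod(1 + daily_returns)
    peak = np.maximum.accumulate(equity)
    return float(((equity - peak) / peak).min())  # e.g. -0.23 = 23% drawdown
```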
## 🛠️ Credits

Developed by **Adityaraj Suman** as part of the Multi-Agent RL Trading System project.