---
license: mit
language: en
library_name: stable-baselines3
repo_url: https://github.com/JonusNattapong/Reinforcement-Learning-for-Gold-Trading
tags:
  - reinforcement-learning
  - finance
  - gold-trading
  - xauusd
  - ppo
metrics:
  - sharpe_ratio
  - win_rate
pipeline_tag: reinforcement-learning
datasets:
  - ZombitX64/xauusd-gold-price-historical-data-2004-2025
---

# PPO Model for XAUUSD Gold Trading

This repository contains a reinforcement learning model trained with Proximal Policy Optimization (PPO) to trade XAUUSD (Gold vs. US Dollar) on the 15-minute timeframe.

## Model Details

- **Model Type**: PPO (Proximal Policy Optimization)
- **Framework**: Stable-Baselines3
- **Environment**: Custom Gym environment for XAUUSD trading
- **Training Data**: Historical XAUUSD data from 2004 to 2025 (resampled to 15-min bars)
- **Total Timesteps**: 1,000,000
- **Position Sizing**: Base 5.0 oz, max 7.5 oz
- **Initial Capital**: 200 USD
- **Transaction Cost**: 0.65 USD per oz

## Performance Metrics (Test Set)

- **Average Daily Profit**: 51.46 USD
- **Win Rate**: 69.0%
- **Max Drawdown**: 12.0%
- **Sharpe Ratio**: 7.56
- **Average Trades per Day**: 2.66

## Features Used

- Log return
- RSI (14-period)
- Moving averages (short/long)
- Bollinger Bands
- MACD
- Volume indicators

## Source Code

- GitHub: https://github.com/JonusNattapong/Reinforcement-Learning-for-Gold-Trading

## Usage

### Loading the Model

Below are two ways to load the trained policy, depending on which artifacts you have available.

Option A: Load the full Stable-Baselines3 model (.zip)

```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecNormalize
import os

# Create or reconstruct an environment similar to the one used for training,
# e.g. `env = make_your_env(...)`; replace with your env factory
env = ...

# If you saved VecNormalize separately, load it and wrap your env first
if os.path.exists("models/vecnormalize.pkl"):
    vec = VecNormalize.load("models/vecnormalize.pkl", env)
    vec.training = False
    vec.norm_reward = False
    env = vec

# Load the full model (policy + optimizer state)
model = PPO.load("models/ppo_xauusd.zip", env=env)
```

Option B: Load weights saved as SafeTensors into a fresh PPO policy

```python
from safetensors.torch import load_file
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecNormalize
import os

# Create or reconstruct the same environment used for training
env = ...

# If you have VecNormalize statistics, load them and wrap the env
if os.path.exists("models/vecnormalize.pkl"):
    vec = VecNormalize.load("models/vecnormalize.pkl", env)
    vec.training = False
    vec.norm_reward = False
    env = vec

# Instantiate a PPO model with the same policy architecture
model = PPO("MlpPolicy", env)

# safetensors.torch.load_file already returns a dict of torch.Tensor
state_dict = load_file("models/ppo_xauusd.safetensors")

# Load the weights into the policy
model.policy.load_state_dict(state_dict)
```

Notes:

- Option A is preferred when `ppo_xauusd.zip` is available, since it contains the entire SB3 model.
- Option B is useful when only the policy weights were exported as SafeTensors. Ensure the policy architecture and observation/action spaces match the original training setup.
- Always set `vec.training = False` and `vec.norm_reward = False` when running inference.

### For Full Inference

To use the model for trading, you'll need to:

1. Set up the trading environment (`XAUUSDTradingEnv`)
2. Load the VecNormalize statistics
3. Run predictions (see the sketch after this list)

Note: This is a simulation model. Use it with caution in real trading.
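Putting those steps together, here is a minimal end-to-end sketch. It assumes `XAUUSDTradingEnv` (from the GitHub repo) can be constructed from a DataFrame of 15-minute bars; the module path `trading_env` and the data path in `df_test` are illustrative placeholders, not part of this repo's published API.

```python
import pandas as pd
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Hypothetical import path; use the env class from the GitHub repo
from trading_env import XAUUSDTradingEnv

df_test = pd.read_csv("data/xauusd_15min_test.csv")  # placeholder path

# Wrap the env in a DummyVecEnv so VecNormalize can be applied
env = DummyVecEnv([lambda: XAUUSDTradingEnv(df_test)])

# Load the saved normalization statistics and freeze them for inference
env = VecNormalize.load("models/vecnormalize.pkl", env)
env.training = False
env.norm_reward = False

model = PPO.load("models/ppo_xauusd.zip", env=env)

# Run one episode with the greedy (deterministic) policy
obs = env.reset()
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, rewards, dones, infos = env.step(action)
    done = bool(dones[0])
```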
## Training Configuration

- Learning Rate: 0.0003
- Batch Size: 256
- Gamma: 0.99
- GAE Lambda: 0.95
- Clip Range: 0.2
- Entropy Coefficient: 0.01

## Files

- `ppo_xauusd.safetensors`: Model weights in SafeTensors format
- `vecnormalize.pkl`: VecNormalize statistics for observation normalization

## License

MIT License

## Disclaimer

This model is for educational and research purposes only. Trading involves risk, and past performance does not guarantee future results. Always backtest and validate before using this model in live trading; a minimal evaluation sketch follows below.
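As a starting point for that validation, here is a rough evaluation sketch. It assumes the per-step reward returned by the environment is realized PnL in USD (an assumption about the repo's env, not a documented contract); adapt it to whatever your environment actually reports. The Sharpe ratio computed here is per-step and unannualized, so it is not directly comparable to the figure reported above.

```python
import numpy as np

def evaluate(model, env, max_steps=10_000):
    """Roll out the policy and compute rough trade statistics.

    Assumes each step's reward is realized PnL in USD (hypothetical);
    with `norm_reward = False`, VecNormalize passes rewards through unscaled.
    """
    pnl = []
    obs = env.reset()
    for _ in range(max_steps):
        action, _ = model.predict(obs, deterministic=True)
        obs, rewards, dones, infos = env.step(action)
        pnl.append(float(rewards[0]))
        if dones[0]:
            break

    pnl = np.asarray(pnl)
    equity = pnl.cumsum()

    closed = pnl[pnl != 0]  # steps where PnL was actually realized
    win_rate = float((closed > 0).mean()) if closed.size else float("nan")

    peak = np.maximum.accumulate(equity)
    max_drawdown = float((peak - equity).max())  # in USD, not percent

    sharpe = float(pnl.mean() / (pnl.std() + 1e-9))  # per-step, unannualized

    return {"win_rate": win_rate, "max_drawdown": max_drawdown, "sharpe": sharpe}
```

With `model` and `env` set up as in the inference sketch above, `evaluate(model, env)` returns statistics for one test episode; compare them against the reported metrics before risking any capital.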