---
license: mit
language: en
library_name: stable-baselines3
repo_url: https://github.com/JonusNattapong/Reinforcement-Learning-for-Gold-Trading
tags:
- reinforcement-learning
- finance
- gold-trading
- xauusd
- ppo
metrics:
- sharpe_ratio
- win_rate
pipeline_tag: reinforcement-learning
datasets:
- ZombitX64/xauusd-gold-price-historical-data-2004-2025
---

# PPO Model for XAUUSD Gold Trading

This repository contains a reinforcement learning model trained with Proximal Policy Optimization (PPO) for trading XAUUSD (Gold vs US Dollar) on the 15-minute timeframe.

## Model Details

- **Model Type**: PPO (Proximal Policy Optimization)
- **Framework**: Stable-Baselines3
- **Environment**: Custom Gym environment for XAUUSD trading
- **Training Data**: Historical XAUUSD data from 2004 to 2025 (resampled to 15-min bars)
- **Total Timesteps**: 1,000,000
- **Position Sizing**: Base 5.0 oz, Max 7.5 oz
- **Initial Capital**: 200 USD
- **Transaction Cost**: 0.65 USD per oz
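
A sketch of how these parameters might be wired into the custom environment is below. The `XAUUSDTradingEnv` class name comes from this card's inference section, but the module path, file path, and keyword argument names here are assumptions, not the repository's verified API:

```python
import pandas as pd

# Hypothetical module path; adapt to the repository's actual layout
from trading_env import XAUUSDTradingEnv

train_df = pd.read_csv("data/xauusd_15min_train.csv")  # assumed file layout

# Keyword names are illustrative, taken from the details listed above
env = XAUUSDTradingEnv(
    df=train_df,            # 15-minute XAUUSD bars, 2004-2025
    initial_capital=200.0,  # USD
    base_position=5.0,      # oz
    max_position=7.5,       # oz
    transaction_cost=0.65,  # USD per oz
)
```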

## Performance Metrics (Test Set)

- **Average Daily Profit**: 51.46 USD
- **Win Rate**: 69.0%
- **Max Drawdown**: 12.0%
- **Sharpe Ratio**: 7.56
- **Average Trades per Day**: 2.66
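
For reference, a minimal sketch of how the two headline metrics can be computed from a daily-return series and a per-trade profit array; the repository's exact computation (e.g. its annualization convention) may differ:

```python
import numpy as np

def sharpe_ratio(daily_returns: np.ndarray, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of a daily return series (risk-free rate assumed 0)."""
    return float(np.sqrt(periods_per_year) * daily_returns.mean() / daily_returns.std())

def win_rate(trade_profits: np.ndarray) -> float:
    """Fraction of closed trades that ended with a positive profit."""
    return float((trade_profits > 0).mean())
```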

## Features Used

- Log Return
- RSI (14-period)
- Moving Averages (short/long)
- Bollinger Bands
- MACD
- Volume indicators
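
All of these can be derived from raw OHLCV bars. The sketch below uses plain pandas; the window lengths (other than the 14-period RSI) and the specific volume indicator are assumptions, not the repository's verified feature pipeline:

```python
import numpy as np
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Compute the listed indicators; assumes 'close' and 'volume' columns."""
    close = df["close"]

    # Log return of the close price
    df["log_return"] = np.log(close / close.shift(1))

    # 14-period RSI (Wilder's smoothing via exponential weighting)
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False).mean()
    df["rsi_14"] = 100 - 100 / (1 + gain / loss)

    # Short/long moving averages (window lengths are assumptions)
    df["ma_short"] = close.rolling(10).mean()
    df["ma_long"] = close.rolling(50).mean()

    # Bollinger Bands (20-period, 2 standard deviations)
    ma20 = close.rolling(20).mean()
    sd20 = close.rolling(20).std()
    df["bb_upper"] = ma20 + 2 * sd20
    df["bb_lower"] = ma20 - 2 * sd20

    # MACD (12/26 EMA difference with a 9-period signal line)
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    df["macd"] = ema12 - ema26
    df["macd_signal"] = df["macd"].ewm(span=9, adjust=False).mean()

    # Simple volume indicator: volume relative to its 20-bar average
    df["volume_ratio"] = df["volume"] / df["volume"].rolling(20).mean()

    return df
```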

## Source Code

- GitHub: https://github.com/JonusNattapong/Reinforcement-Learning-for-Gold-Trading

## Usage

### Loading the Model

Below are two safe ways to load the trained policy, depending on which artifacts you have available.

Option A — Load the full Stable-Baselines3 model (`.zip`)

```python
import os

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecNormalize

# Create or reconstruct an environment similar to the one used for training,
# e.g. `env = make_your_env(...)` — replace with your env factory
env = ...

# If you saved VecNormalize statistics separately, load them and wrap the env first
if os.path.exists("models/vecnormalize.pkl"):
    vec = VecNormalize.load("models/vecnormalize.pkl", env)
    vec.training = False      # do not update normalization stats at inference time
    vec.norm_reward = False   # return raw (unnormalized) rewards
    env = vec

# Load the full model (policy + optimizer state)
model = PPO.load("models/ppo_xauusd.zip", env=env)
```

Option B — Load weights saved as SafeTensors into a fresh PPO policy

```python
import os

from safetensors.torch import load_file
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecNormalize

# Create or reconstruct the same environment used for training
env = ...

# If you have VecNormalize statistics, load them and wrap the env
if os.path.exists("models/vecnormalize.pkl"):
    vec = VecNormalize.load("models/vecnormalize.pkl", env)
    vec.training = False
    vec.norm_reward = False
    env = vec

# Instantiate a PPO model with the same policy architecture and spaces
model = PPO("MlpPolicy", env)

# safetensors.torch.load_file already returns a dict of torch.Tensor,
# so it can be passed straight to load_state_dict
state_dict = load_file("models/ppo_xauusd.safetensors")
model.policy.load_state_dict(state_dict)
```

Notes:

- Option A is preferred when `ppo_xauusd.zip` is available (it contains the entire SB3 model).
- Option B is useful when only the policy weights were exported as SafeTensors. Ensure the policy architecture and observation/action spaces match the original training setup.
- Always set `vec.training = False` and `vec.norm_reward = False` when running inference.

### For Full Inference

To use the model for trading, you'll need to:

1. Set up the trading environment (`XAUUSDTradingEnv`)
2. Load the VecNormalize stats
3. Run predictions (a sketch of this loop follows below)
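
A minimal sketch of those three steps, assuming `XAUUSDTradingEnv` can be imported from the repository and accepts a price DataFrame (the actual module path, data path, and constructor signature may differ):

```python
import pandas as pd

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Hypothetical import and constructor arguments; adapt to the repository's actual API
from trading_env import XAUUSDTradingEnv

test_df = pd.read_csv("data/xauusd_15min_test.csv")  # assumed file layout

# 1. Set up the trading environment
env = DummyVecEnv([lambda: XAUUSDTradingEnv(df=test_df)])

# 2. Load VecNormalize stats and freeze them for inference
env = VecNormalize.load("models/vecnormalize.pkl", env)
env.training = False
env.norm_reward = False

# 3. Run predictions step by step
model = PPO.load("models/ppo_xauusd.zip", env=env)
obs = env.reset()
done = [False]
while not done[0]:
    action, _states = model.predict(obs, deterministic=True)
    obs, rewards, done, infos = env.step(action)
```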

Note: This is a simulation model. Use with caution in real trading.

## Training Configuration

- Learning Rate: 0.0003
- Batch Size: 256
- Gamma: 0.99
- GAE Lambda: 0.95
- Clip Range: 0.2
- Entropy Coefficient: 0.01
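
These map directly onto Stable-Baselines3's PPO constructor. A minimal sketch, assuming `MlpPolicy` (as in Option B above) and leaving other arguments such as `n_steps` at their defaults:

```python
from stable_baselines3 import PPO

env = ...  # your VecNormalize-wrapped training environment, as in the sections above

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=0.0003,
    batch_size=256,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01,
)
model.learn(total_timesteps=1_000_000)
model.save("models/ppo_xauusd.zip")
```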

## Files

- `ppo_xauusd.safetensors`: Model weights in SafeTensors format
- `vecnormalize.pkl`: VecNormalize statistics for observation normalization

## License

MIT License

## Disclaimer

This model is for educational and research purposes only. Trading involves risk, and past performance does not guarantee future results. Always backtest and validate before using in live trading.