---
license: mit
language: en
library_name: stable-baselines3
repo_url: https://github.com/JonusNattapong/Reinforcement-Learning-for-Gold-Trading
tags:
- reinforcement-learning
- finance
- gold-trading
- xauusd
- ppo
metrics:
- sharpe_ratio
- win_rate
pipeline_tag: reinforcement-learning
datasets:
- ZombitX64/xauusd-gold-price-historical-data-2004-2025
---

# PPO Model for XAUUSD Gold Trading

This repository contains a reinforcement learning model trained with Proximal Policy Optimization (PPO) to trade XAUUSD (gold vs. US dollar) on 15-minute bars.

## Model Details

- **Model Type**: PPO (Proximal Policy Optimization)
- **Framework**: Stable-Baselines3
- **Environment**: Custom Gym environment for XAUUSD trading
- **Training Data**: Historical XAUUSD data from 2004 to 2025 (resampled to 15-min bars)
- **Total Timesteps**: 1,000,000
- **Position Sizing**: Base 5.0 oz, Max 7.5 oz
- **Initial Capital**: 200 USD
- **Transaction Cost**: 0.65 USD per oz
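
To give a sense of scale for the cost figure above, the sketch below computes the fee for one trade at the base and maximum position sizes. It assumes the 0.65 USD per oz cost is charged once per trade; the card does not say whether it applies per side or per round trip. `trade_cost` is a hypothetical helper, not a function from the repository.

```python
COST_PER_OZ = 0.65   # USD per oz, from the list above
BASE_SIZE_OZ = 5.0   # base position size
MAX_SIZE_OZ = 7.5    # maximum position size
INITIAL_CAPITAL = 200.0


def trade_cost(size_oz: float, cost_per_oz: float = COST_PER_OZ) -> float:
    """Cost in USD of one trade of `size_oz` ounces (assumed: charged once per trade)."""
    return size_oz * cost_per_oz


base_cost = trade_cost(BASE_SIZE_OZ)
max_cost = trade_cost(MAX_SIZE_OZ)
print(f"base size: {base_cost:.3f} USD ({base_cost / INITIAL_CAPITAL:.2%} of capital)")
print(f"max size:  {max_cost:.3f} USD ({max_cost / INITIAL_CAPITAL:.2%} of capital)")
```

Under this assumption, a base-size trade costs about 3.25 USD, which is a noticeable fraction of the 200 USD starting capital.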

## Performance Metrics (Test Set)

- **Average Daily Profit**: 51.46 USD
- **Win Rate**: 69.0%
- **Max Drawdown**: 12.0%
- **Sharpe Ratio**: 7.56
- **Average Trades per Day**: 2.66
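
The card does not state how these metrics were computed, so the following is only a minimal sketch of common definitions. It assumes win rate is measured per day, drawdown is relative to the running equity peak, and the Sharpe ratio is annualized over 252 periods with a zero risk-free rate. The P&L series is made up for illustration and is not the model's test data.

```python
import math
from statistics import mean, stdev


def win_rate(pnl):
    """Fraction of periods with strictly positive P&L."""
    return sum(1 for p in pnl if p > 0) / len(pnl)


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    sd = stdev(returns)
    if sd == 0:
        return float("nan")
    return mean(returns) / sd * math.sqrt(periods_per_year)


# Made-up daily P&L in USD, starting from the 200 USD initial capital
daily_pnl = [10.0, -5.0, 20.0, -10.0, 15.0]
equity = [200.0]
for p in daily_pnl:
    equity.append(equity[-1] + p)

# Return of each day relative to the equity at the start of that day
returns = [p / e for p, e in zip(daily_pnl, equity)]

print(f"win rate:     {win_rate(daily_pnl):.1%}")
print(f"max drawdown: {max_drawdown(equity):.1%}")
print(f"sharpe ratio: {sharpe_ratio(returns):.2f}")
```

Different conventions (per-trade win rate, another annualization factor) would give different numbers, so these definitions should be checked against the source repository before comparing results.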

## Features Used

- Log Return
- RSI (14-period)
- Moving Averages (short/long)
- Bollinger Bands
- MACD
- Volume indicators
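
As an illustration of what some of these features look like, here is a minimal standard-library sketch of log returns, a simple moving average, Bollinger Bands, and RSI. The window lengths other than the 14-period RSI, the use of simple averages in the RSI, and the 2-standard-deviation band width are assumptions; the repository may compute these features differently, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_returns(closes):
    """Log return between consecutive closes."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def sma(values, window):
    """Simple moving average; one value per full window."""
    return [mean(values[i - window:i]) for i in range(window, len(values) + 1)]


def bollinger(values, window=20, k=2.0):
    """(lower, middle, upper) bands for the most recent `window` values."""
    recent = values[-window:]
    mid = mean(recent)
    width = k * pstdev(recent)  # population std is an assumption
    return mid - width, mid, mid + width


def rsi(closes, period=14):
    """RSI from simple averages of gains and losses over the last `period` changes."""
    changes = [b - a for a, b in zip(closes, closes[1:])][-period:]
    if len(changes) < period:
        raise ValueError("need at least period + 1 closes")
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


# Made-up closing prices that alternate up and down by 1 USD
closes = [2000.0 + (i % 2) for i in range(21)]
print(log_returns(closes)[:2])
print(sma(closes, 5)[-1])
print(bollinger(closes))
print(rsi(closes))
```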

## Source Code

- GitHub: https://github.com/JonusNattapong/Reinforcement-Learning-for-Gold-Trading

## Usage

### Loading the Model

Below are two ways to load the trained policy, depending on which files you have available.

Option A: Load the full Stable-Baselines3 model (.zip)

```python
import os

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecNormalize

# Create or reconstruct an environment similar to the one used for training,
# e.g. `env = make_your_env(...)` (replace with your own env factory).
# VecNormalize.load expects a vectorized env, e.g. one wrapped in DummyVecEnv.
env = ...

# If you saved VecNormalize separately, load and wrap your env first
if os.path.exists("models/vecnormalize.pkl"):
    vec = VecNormalize.load("models/vecnormalize.pkl", env)
    vec.training = False     # freeze the running normalization statistics
    vec.norm_reward = False  # return raw rewards at inference time
    env = vec

# Load the full model (policy + optimizer state)
model = PPO.load("models/ppo_xauusd.zip", env=env)
```

Option B: Load weights saved as SafeTensors into a fresh PPO policy

```python
import os

from safetensors.torch import load_file
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecNormalize

# Create or reconstruct the same environment used for training
# (a vectorized env, since VecNormalize.load expects a VecEnv)
env = ...

# If you have VecNormalize statistics, load them and wrap the env
if os.path.exists("models/vecnormalize.pkl"):
    vec = VecNormalize.load("models/vecnormalize.pkl", env)
    vec.training = False     # freeze the running normalization statistics
    vec.norm_reward = False  # return raw rewards at inference time
    env = vec

# Instantiate a PPO model with the same policy architecture;
# passing `env` here already attaches it to the model
model = PPO("MlpPolicy", env)

# safetensors.torch.load_file already returns a dict of torch.Tensor values,
# so no conversion is needed
state_dict = load_file("models/ppo_xauusd.safetensors")

# Load weights into the policy
model.policy.load_state_dict(state_dict)
```

Notes:
- Option A is preferred when `ppo_xauusd.zip` is available (it contains the entire SB3 model).
- Option B is useful when only the policy weights were exported as SafeTensors. The policy architecture and the observation/action spaces must match the original training setup; otherwise `load_state_dict` raises an error on missing keys or mismatched tensor shapes.
- Always set `vec.training = False` and `vec.norm_reward = False` when running inference.


### For Full Inference

To use the model for trading, you'll need to:
1. Set up the trading environment (`XAUUSDTradingEnv`)
2. Load VecNormalize stats
3. Run predictions

Note: This is a simulation model. Use with caution in real trading.
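
The prediction step can be sketched as a single rollout loop. This is a minimal sketch, assuming `env` follows the Stable-Baselines3 `VecEnv` API with one sub-environment (as it does once wrapped in `VecNormalize`) and `model` exposes `predict` as SB3 models do. `run_episode` and `max_steps` are hypothetical names, not part of the repository.

```python
def run_episode(model, env, max_steps=10_000):
    """Roll out one episode with a deterministic policy and return the summed reward.

    Assumes a VecEnv-style interface with a single sub-environment:
    reset() returns a batch of observations and step() returns
    (obs, rewards, dones, infos), each holding one entry.
    """
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # deterministic=True selects the policy's most likely action
        action, _state = model.predict(obs, deterministic=True)
        obs, rewards, dones, _infos = env.step(action)
        total_reward += float(rewards[0])
        if dones[0]:
            break  # VecEnvs reset automatically, so stop after one episode
    return total_reward
```

With `model` and `env` prepared as in Option A or Option B, `run_episode(model, env)` returns the total reward of one simulated episode. Because `norm_reward` is set to `False`, that total is in the environment's raw reward units.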

## Training Configuration

- Learning Rate: 0.0003
- Batch Size: 256
- Gamma: 0.99
- GAE Lambda: 0.95
- Clip Range: 0.2
- Entropy Coefficient: 0.01
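
For reference, these values map onto keyword arguments of the Stable-Baselines3 `PPO` constructor roughly as follows. This is a sketch, not the repository's training script: the `"MlpPolicy"` choice is taken from the loading example above, `train` is a hypothetical helper, and any setting not listed here (for example `n_steps` or `n_epochs`) is left at the SB3 default, which may differ from what was actually used.

```python
# Hyperparameters from the list above, keyed by PPO constructor argument names
PPO_KWARGS = {
    "learning_rate": 3e-4,
    "batch_size": 256,
    "gamma": 0.99,
    "gae_lambda": 0.95,
    "clip_range": 0.2,
    "ent_coef": 0.01,
}
TOTAL_TIMESTEPS = 1_000_000  # from Model Details


def train(env):
    """Train a PPO agent on `env` with the configuration above (hypothetical helper)."""
    # Imported lazily so the configuration can be inspected without SB3 installed
    from stable_baselines3 import PPO

    model = PPO("MlpPolicy", env, **PPO_KWARGS)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    return model
```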

## Files

- `ppo_xauusd.safetensors`: Model weights in SafeTensors format
- `vecnormalize.pkl`: VecNormalize statistics for observation normalization

## License

MIT License

## Disclaimer

This model is for educational and research purposes only. Trading involves risk, and past performance does not guarantee future results. Always backtest and validate before using in live trading.