JonusNattapong/Reinforcement-Learning-for-Gold-Trading-Model

@@ -50,20 +50,69 @@ This repository contains a Reinforcement Learning model trained using Proximal P
 ### Loading the Model
 ```python
-from safetensors.torch import load_file
 from stable_baselines3 import PPO
 import torch
-# Load state dict from safetensors
-state_dict = load_file("ppo_xauusd.safetensors")
-policy = PPO.policy_class(observation_space, action_space)  # Define spaces accordingly
-policy.load_state_dict(state_dict)
-# Create model
-model = PPO(policy=policy, env=env)  # Or load full model if available
 ```
 ### For Full Inference
 To use the model for trading, you'll need to:

 ### Loading the Model
+Below are two ways to load the trained policy, depending on which files you have available.
+Option A — Load the full Stable-Baselines3 model (.zip)
 ```python
 from stable_baselines3 import PPO
+from stable_baselines3.common.vec_env import VecNormalize
+import os
+# Create or reconstruct an environment similar to the one used for training,
+# e.g. `env = make_your_env(...)` (replace with your env factory).
+# Note: VecNormalize.load expects a vectorized env (VecEnv).
+env = ...
+# If you saved VecNormalize separately, load and wrap your env first
+if os.path.exists("models/vecnormalize.pkl"):
+    vec = VecNormalize.load("models/vecnormalize.pkl", env)
+    vec.training = False     # freeze the running normalization statistics
+    vec.norm_reward = False  # return unnormalized rewards
+    env = vec
+# Load the full model (policy + optimizer state)
+model = PPO.load("models/ppo_xauusd.zip", env=env)
+```
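The two flags set above matter because `VecNormalize` keeps running mean and variance estimates of the observations. With `training = False` those statistics stay frozen, and with `norm_reward = False` rewards come back unscaled. As a rough illustration only (plain Python, not SB3's actual code; the helper name and the sample numbers are hypothetical, and the `clip_obs=10.0` and `epsilon=1e-8` defaults are assumed to mirror SB3's), normalizing with frozen statistics looks roughly like this:

```python
import math


def normalize_obs(obs, mean, var, clip_obs=10.0, epsilon=1e-8):
    """Normalize one observation vector with frozen (saved) statistics.

    Hypothetical helper: standardizes each feature with a saved mean and
    variance, then clips the result to [-clip_obs, clip_obs].
    """
    out = []
    for x, m, v in zip(obs, mean, var):
        z = (x - m) / math.sqrt(v + epsilon)
        out.append(max(-clip_obs, min(clip_obs, z)))
    return out


# Made-up statistics for two features (e.g. a price and an indicator).
saved_mean = [1900.0, 50.0]
saved_var = [2500.0, 100.0]

normalized = normalize_obs([1950.0, 40.0], saved_mean, saved_var)
print(normalized)
```

Because the statistics never change, the same raw observation always maps to the same normalized value, which is what the policy saw during training.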
+Option B — Load weights saved as SafeTensors into a fresh PPO policy
+```python
+from safetensors.torch import load_file
 import torch
+from stable_baselines3 import PPO
+from stable_baselines3.common.vec_env import VecNormalize
+import os
+# Create or reconstruct the same environment used for training.
+# Note: VecNormalize.load expects a vectorized env (VecEnv).
+env = ...
+# If you have VecNormalize statistics, load them and wrap the env
+if os.path.exists("models/vecnormalize.pkl"):
+    vec = VecNormalize.load("models/vecnormalize.pkl", env)
+    vec.training = False     # freeze the running normalization statistics
+    vec.norm_reward = False  # return unnormalized rewards
+    env = vec
+# Instantiate a PPO model with the same policy architecture
+model = PPO("MlpPolicy", env)
+# Load the SafeTensors state dict; safetensors.torch.load_file already
+# returns torch.Tensor values, so no conversion is needed
+state_dict = load_file("models/ppo_xauusd.safetensors")
+# Load weights into the policy
+model.policy.load_state_dict(state_dict)
+# Optional: the env was already passed to the PPO constructor;
+# set_env is only needed when attaching a different env later
+model.set_env(env)
 ```
+Notes:
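+- Option A is preferred when `ppo_xauusd.zip` is available, since the archive contains the entire SB3 model (policy weights, optimizer state, and hyperparameters).
+- Option B is useful when only the policy weights were exported as SafeTensors. The policy architecture and observation/action spaces must match the original training setup; otherwise `load_state_dict` raises an error on missing keys or mismatched shapes.
+- Always set `vec.training = False` and `vec.norm_reward = False` when running inference, so the saved normalization statistics are not updated and rewards are reported unscaled.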
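Before calling `load_state_dict` in Option B, it can help to confirm that the exported weights actually fit the freshly built policy. The sketch below is a hypothetical helper (not part of SB3 or safetensors) that compares parameter names and shapes; the parameter names and shapes in the example are illustrative only.

```python
def compare_state_dicts(expected, loaded):
    """Compare two mappings of parameter name -> shape.

    Returns (missing, unexpected, mismatched):
      missing    - names the policy expects but the file lacks
      unexpected - names in the file that the policy does not have
      mismatched - names present in both whose shapes differ
    """
    missing = sorted(set(expected) - set(loaded))
    unexpected = sorted(set(loaded) - set(expected))
    mismatched = sorted(
        name
        for name in set(expected) & set(loaded)
        if tuple(expected[name]) != tuple(loaded[name])
    )
    return missing, unexpected, mismatched


# Illustrative shapes only; real names come from model.policy.state_dict().
expected_shapes = {"layer.weight": (64, 10), "layer.bias": (64,)}
loaded_shapes = {"layer.weight": (64, 12), "extra.bias": (3,)}

missing, unexpected, mismatched = compare_state_dicts(expected_shapes, loaded_shapes)
print(missing, unexpected, mismatched)
```

With real objects, the shape mappings could be built as `{k: tuple(v.shape) for k, v in model.policy.state_dict().items()}` and likewise for the dict returned by `load_file`. A shape mismatch usually points to a different observation/action space or network architecture than the one used in training.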
 ### For Full Inference
 To use the model for trading, you'll need to: