Numbers on the X axis are averages over 40 episodes, each lasting about 500 timesteps on average. In total, the agent was trained for roughly 5e6 timesteps.
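As a quick sanity check on the arithmetic above (the number of plotted points is an inference from these figures, not stated explicitly):

```python
# Figures from the text above.
episodes_per_point = 40          # each plotted value averages 40 episodes
avg_timesteps_per_episode = 500  # approximate episode length
total_timesteps = 5e6            # stated total training experience

# One plotted point corresponds to 40 * 500 = 20,000 timesteps,
# so ~5e6 timesteps implies about 250 points on the X axis.
timesteps_per_point = episodes_per_point * avg_timesteps_per_episode
points = total_timesteps / timesteps_per_point
print(timesteps_per_point, points)  # 20000 250.0
```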
Learning rate decay schedule: <code>torch.optim.lr_scheduler.StepLR(opt, step_size=4000, gamma=0.7)</code>
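To make the schedule concrete, here is a small standalone sketch (the dummy parameter and SGD optimizer are placeholders for illustration, not taken from this repo): with <code>step_size=4000</code> and <code>gamma=0.7</code>, the learning rate is multiplied by 0.7 every 4000 scheduler steps.

```python
import torch

# Placeholder parameter and optimizer, just to drive the scheduler.
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.SGD([param], lr=1e-3)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=4000, gamma=0.7)

# After 12000 scheduler steps the LR has decayed three times
# (at steps 4000, 8000, and 12000): 1e-3 -> 1e-3 * 0.7**3.
for _ in range(12000):
    opt.step()
    sched.step()

print(opt.param_groups[0]['lr'])  # ~3.43e-4
```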
Minimal code to use the agent:

```python
import gym
import torch

env_name = 'LunarLanderContinuous-v2'
env = gym.make(env_name)
agent = torch.load('best_models/best_reinforce_lunar_lander_cont_model_269.402.pt')

render = True
observation = env.reset()
while True:
    if render:
        env.render()
    action = agent.act(observation)
    observation, reward, done, info = env.step(action)
    if done:
        break
env.close()
```