JoshuaFreeman committed on
Commit 2b3a44d · verified · 1 Parent(s): 522b3bc

Upload training_log.json with huggingface_hub
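
For readers who want to inspect the uploaded log, here is a minimal sketch of how its records might be summarized. It assumes `training_log.json` is a top-level JSON array of per-update objects with the keys visible in the diff (`update`, `global_step`, `mean_reward`, and so on). The helper names `load_log`, `steps_per_update`, and `best_record` are hypothetical and not part of the repository. The two sample records are copied from the diff.

```python
import json
from pathlib import Path


def load_log(path):
    """Read a training log, assumed to be a JSON array of per-update records."""
    return json.loads(Path(path).read_text())


def steps_per_update(records):
    """Return the set of global_step / update ratios seen in the records."""
    return {r["global_step"] // r["update"] for r in records if r["update"]}


def best_record(records, key="mean_reward"):
    """Return the record with the highest value for `key`."""
    return max(records, key=lambda r: r[key])


# Two records copied from the diff, used here as sample data.
sample = [
    {"update": 335, "global_step": 1372160, "num_episodes": 378,
     "mean_reward": 357.36821466445923, "mean_length": 2510.76,
     "loss": 33.58185958862305, "sps": 451.94218989282257},
    {"update": 340, "global_step": 1392640, "num_episodes": 388,
     "mean_reward": 455.7884948730469, "mean_length": 2739.63,
     "loss": 16.64685821533203, "sps": 273.0263210661101},
]

print(steps_per_update(sample))       # ratio of environment steps to updates
print(best_record(sample)["update"])  # update index with the highest mean reward
```

On the real file, `load_log("training_log.json")` would replace `sample`. A single value in the `steps_per_update` result would suggest a fixed number of environment steps per update.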

Files changed (1)
  1. training_log.json +3870 -0
training_log.json CHANGED
@@ -592,5 +592,3875 @@
592
  "mean_length": 3142.3,
593
  "loss": 1779.034423828125,
594
  "sps": 167.8109154720703
595
  }
596
  ]
 
592
  "mean_length": 3142.3,
593
  "loss": 1779.034423828125,
594
  "sps": 167.8109154720703
595
+ },
596
+ {
597
+ "update": 335,
598
+ "global_step": 1372160,
599
+ "num_episodes": 378,
600
+ "mean_reward": 357.36821466445923,
601
+ "mean_length": 2510.76,
602
+ "loss": 33.58185958862305,
603
+ "sps": 451.94218989282257
604
+ },
605
+ {
606
+ "update": 340,
607
+ "global_step": 1392640,
608
+ "num_episodes": 388,
609
+ "mean_reward": 455.7884948730469,
610
+ "mean_length": 2739.63,
611
+ "loss": 16.64685821533203,
612
+ "sps": 273.0263210661101
613
+ },
614
+ {
615
+ "update": 345,
616
+ "global_step": 1413120,
617
+ "num_episodes": 390,
618
+ "mean_reward": 456.36600038528445,
619
+ "mean_length": 2831.63,
620
+ "loss": 53.800376892089844,
621
+ "sps": 262.8572795282316
622
+ },
623
+ {
624
+ "update": 350,
625
+ "global_step": 1433600,
626
+ "num_episodes": 392,
627
+ "mean_reward": 358.3022264003754,
628
+ "mean_length": 2855.26,
629
+ "loss": 275.63385009765625,
630
+ "sps": 446.63750460322206
631
+ },
632
+ {
633
+ "update": 355,
634
+ "global_step": 1454080,
635
+ "num_episodes": 407,
636
+ "mean_reward": 375.7198329591751,
637
+ "mean_length": 2911.33,
638
+ "loss": 8.481382369995117,
639
+ "sps": 367.7815116282032
640
+ },
641
+ {
642
+ "update": 360,
643
+ "global_step": 1474560,
644
+ "num_episodes": 417,
645
+ "mean_reward": 297.66134167194366,
646
+ "mean_length": 2867.69,
647
+ "loss": 23.24544334411621,
648
+ "sps": 538.1701556626739
649
+ },
650
+ {
651
+ "update": 365,
652
+ "global_step": 1495040,
653
+ "num_episodes": 423,
654
+ "mean_reward": 297.34756595134735,
655
+ "mean_length": 2864.35,
656
+ "loss": 4.300067901611328,
657
+ "sps": 967.5617761077667
658
+ },
659
+ {
660
+ "update": 370,
661
+ "global_step": 1515520,
662
+ "num_episodes": 426,
663
+ "mean_reward": 420.4760777759552,
664
+ "mean_length": 2769.38,
665
+ "loss": 0.5866292715072632,
666
+ "sps": 666.1817947182782
667
+ },
668
+ {
669
+ "update": 375,
670
+ "global_step": 1536000,
671
+ "num_episodes": 429,
672
+ "mean_reward": 420.90687354564665,
673
+ "mean_length": 2720.24,
674
+ "loss": 1.5303698778152466,
675
+ "sps": 387.94660115994657
676
+ },
677
+ {
678
+ "update": 380,
679
+ "global_step": 1556480,
680
+ "num_episodes": 447,
681
+ "mean_reward": 356.93565448284147,
682
+ "mean_length": 2600.92,
683
+ "loss": 1.487197995185852,
684
+ "sps": 563.1387661188227
685
+ },
686
+ {
687
+ "update": 385,
688
+ "global_step": 1576960,
689
+ "num_episodes": 452,
690
+ "mean_reward": 357.39739652633665,
691
+ "mean_length": 2840.12,
692
+ "loss": 139.934326171875,
693
+ "sps": 419.07382914990956
694
+ },
695
+ {
696
+ "update": 390,
697
+ "global_step": 1597440,
698
+ "num_episodes": 456,
699
+ "mean_reward": 320.4454140949249,
700
+ "mean_length": 2739.7,
701
+ "loss": 35.25250244140625,
702
+ "sps": 1220.6982535284524
703
+ },
704
+ {
705
+ "update": 395,
706
+ "global_step": 1617920,
707
+ "num_episodes": 456,
708
+ "mean_reward": 320.4454140949249,
709
+ "mean_length": 2739.7,
710
+ "loss": 3.3828349113464355,
711
+ "sps": 1342.5750178490377
712
+ },
713
+ {
714
+ "update": 400,
715
+ "global_step": 1638400,
716
+ "num_episodes": 464,
717
+ "mean_reward": 328.27054366588595,
718
+ "mean_length": 2746.2,
719
+ "loss": 9.8130521774292,
720
+ "sps": 1796.7926258567122
721
+ },
722
+ {
723
+ "update": 405,
724
+ "global_step": 1658880,
725
+ "num_episodes": 470,
726
+ "mean_reward": 469.22999305725097,
727
+ "mean_length": 2992.85,
728
+ "loss": 1.3559972047805786,
729
+ "sps": 692.6458346671639
730
+ },
731
+ {
732
+ "update": 410,
733
+ "global_step": 1679360,
734
+ "num_episodes": 490,
735
+ "mean_reward": 372.7216357469559,
736
+ "mean_length": 2776.16,
737
+ "loss": 1.9204107522964478,
738
+ "sps": 654.3363015594651
739
+ },
740
+ {
741
+ "update": 415,
742
+ "global_step": 1699840,
743
+ "num_episodes": 498,
744
+ "mean_reward": 355.0806073760986,
745
+ "mean_length": 2364.43,
746
+ "loss": 0.20851367712020874,
747
+ "sps": 1574.1015970289384
748
+ },
749
+ {
750
+ "update": 420,
751
+ "global_step": 1720320,
752
+ "num_episodes": 505,
753
+ "mean_reward": 357.1250058794022,
754
+ "mean_length": 2659.22,
755
+ "loss": 0.8760831356048584,
756
+ "sps": 725.5072374754453
757
+ },
758
+ {
759
+ "update": 425,
760
+ "global_step": 1740800,
761
+ "num_episodes": 519,
762
+ "mean_reward": 338.12661921024323,
763
+ "mean_length": 2572.09,
764
+ "loss": 0.7201396822929382,
765
+ "sps": 774.136034580983
766
+ },
767
+ {
768
+ "update": 430,
769
+ "global_step": 1761280,
770
+ "num_episodes": 520,
771
+ "mean_reward": 338.7818141031265,
772
+ "mean_length": 2670.69,
773
+ "loss": 0.2984998822212219,
774
+ "sps": 593.9586130082629
775
+ },
776
+ {
777
+ "update": 435,
778
+ "global_step": 1781760,
779
+ "num_episodes": 524,
780
+ "mean_reward": 200.51355738162994,
781
+ "mean_length": 2677.33,
782
+ "loss": 0.6558288931846619,
783
+ "sps": 563.7237889903446
784
+ },
785
+ {
786
+ "update": 440,
787
+ "global_step": 1802240,
788
+ "num_episodes": 531,
789
+ "mean_reward": 195.75862418174745,
790
+ "mean_length": 2684.96,
791
+ "loss": 0.6485836505889893,
792
+ "sps": 801.5054039510692
793
+ },
794
+ {
795
+ "update": 445,
796
+ "global_step": 1822720,
797
+ "num_episodes": 541,
798
+ "mean_reward": 182.37921036720275,
799
+ "mean_length": 2787.68,
800
+ "loss": 0.9738303422927856,
801
+ "sps": 552.0177086572941
802
+ },
803
+ {
804
+ "update": 450,
805
+ "global_step": 1843200,
806
+ "num_episodes": 544,
807
+ "mean_reward": 182.38307575702666,
808
+ "mean_length": 2886.96,
809
+ "loss": 0.35677605867385864,
810
+ "sps": 1078.6474451651457
811
+ },
812
+ {
813
+ "update": 455,
814
+ "global_step": 1863680,
815
+ "num_episodes": 545,
816
+ "mean_reward": 183.31554008960723,
817
+ "mean_length": 2983.98,
818
+ "loss": 2.5785653591156006,
819
+ "sps": 2238.2674979327135
820
+ },
821
+ {
822
+ "update": 460,
823
+ "global_step": 1884160,
824
+ "num_episodes": 552,
825
+ "mean_reward": 184.02062775611878,
826
+ "mean_length": 2848.2,
827
+ "loss": 2.8595657348632812,
828
+ "sps": 788.1771239252804
829
+ },
830
+ {
831
+ "update": 465,
832
+ "global_step": 1904640,
833
+ "num_episodes": 556,
834
+ "mean_reward": 185.80799278259278,
835
+ "mean_length": 3043.7,
836
+ "loss": 1.9609827995300293,
837
+ "sps": 577.1914698783215
838
+ },
839
+ {
840
+ "update": 470,
841
+ "global_step": 1925120,
842
+ "num_episodes": 562,
843
+ "mean_reward": 187.01453431129457,
844
+ "mean_length": 3143.24,
845
+ "loss": 3.339183807373047,
846
+ "sps": 1007.594504286708
847
+ },
848
+ {
849
+ "update": 475,
850
+ "global_step": 1945600,
851
+ "num_episodes": 566,
852
+ "mean_reward": 173.54088757038116,
853
+ "mean_length": 2963.92,
854
+ "loss": 0.6618616580963135,
855
+ "sps": 901.0594653949358
856
+ },
857
+ {
858
+ "update": 480,
859
+ "global_step": 1966080,
860
+ "num_episodes": 575,
861
+ "mean_reward": 33.9250515460968,
862
+ "mean_length": 2871.77,
863
+ "loss": 2.1673176288604736,
864
+ "sps": 637.9211744730668
865
+ },
866
+ {
867
+ "update": 485,
868
+ "global_step": 1986560,
869
+ "num_episodes": 586,
870
+ "mean_reward": 34.479522528648374,
871
+ "mean_length": 3064.33,
872
+ "loss": 0.5433505773544312,
873
+ "sps": 434.7028394813384
874
+ },
875
+ {
876
+ "update": 490,
877
+ "global_step": 2007040,
878
+ "num_episodes": 596,
879
+ "mean_reward": 35.79800989627838,
880
+ "mean_length": 3282.82,
881
+ "loss": 0.78948974609375,
882
+ "sps": 336.9612684589691
883
+ },
884
+ {
885
+ "update": 495,
886
+ "global_step": 2027520,
887
+ "num_episodes": 604,
888
+ "mean_reward": 33.232103657722476,
889
+ "mean_length": 2993.3,
890
+ "loss": 1.0133846998214722,
891
+ "sps": 825.0153267212829
892
+ },
893
+ {
894
+ "update": 500,
895
+ "global_step": 2048000,
896
+ "num_episodes": 608,
897
+ "mean_reward": 35.19134169101715,
898
+ "mean_length": 3091.99,
899
+ "loss": 2.315326690673828,
900
+ "sps": 1734.6708087716852
901
+ },
902
+ {
903
+ "update": 505,
904
+ "global_step": 2068480,
905
+ "num_episodes": 626,
906
+ "mean_reward": 34.19606147289276,
907
+ "mean_length": 2992.5,
908
+ "loss": 0.5885242223739624,
909
+ "sps": 326.9154194294925
910
+ },
911
+ {
912
+ "update": 510,
913
+ "global_step": 2088960,
914
+ "num_episodes": 635,
915
+ "mean_reward": 34.653258776664735,
916
+ "mean_length": 2903.6,
917
+ "loss": 0.5012519359588623,
918
+ "sps": 618.1195825840496
919
+ },
920
+ {
921
+ "update": 515,
922
+ "global_step": 2109440,
923
+ "num_episodes": 648,
924
+ "mean_reward": 31.289368829727174,
925
+ "mean_length": 2418.77,
926
+ "loss": 3.98177433013916,
927
+ "sps": 646.2346813472644
928
+ },
929
+ {
930
+ "update": 520,
931
+ "global_step": 2129920,
932
+ "num_episodes": 655,
933
+ "mean_reward": 31.192284369468688,
934
+ "mean_length": 2226.18,
935
+ "loss": 0.6265271306037903,
936
+ "sps": 1099.3023206141552
937
+ },
938
+ {
939
+ "update": 525,
940
+ "global_step": 2150400,
941
+ "num_episodes": 660,
942
+ "mean_reward": 31.34406463623047,
943
+ "mean_length": 2219.04,
944
+ "loss": 0.10834737122058868,
945
+ "sps": 1013.9364910908357
946
+ },
947
+ {
948
+ "update": 530,
949
+ "global_step": 2170880,
950
+ "num_episodes": 663,
951
+ "mean_reward": 32.43585729598999,
952
+ "mean_length": 2310.25,
953
+ "loss": 0.16828738152980804,
954
+ "sps": 2044.258556649656
955
+ },
956
+ {
957
+ "update": 535,
958
+ "global_step": 2191360,
959
+ "num_episodes": 673,
960
+ "mean_reward": 31.12276816368103,
961
+ "mean_length": 2188.44,
962
+ "loss": 0.30583497881889343,
963
+ "sps": 669.970940042682
964
+ },
965
+ {
966
+ "update": 540,
967
+ "global_step": 2211840,
968
+ "num_episodes": 688,
969
+ "mean_reward": 31.249693727493288,
970
+ "mean_length": 2081.05,
971
+ "loss": 0.17176635563373566,
972
+ "sps": 590.5779813457752
973
+ },
974
+ {
975
+ "update": 545,
976
+ "global_step": 2232320,
977
+ "num_episodes": 694,
978
+ "mean_reward": 33.27913472175598,
979
+ "mean_length": 2258.75,
980
+ "loss": 0.7694998979568481,
981
+ "sps": 738.4276136206116
982
+ },
983
+ {
984
+ "update": 550,
985
+ "global_step": 2252800,
986
+ "num_episodes": 698,
987
+ "mean_reward": 33.9422331905365,
988
+ "mean_length": 2351.64,
989
+ "loss": 0.15182125568389893,
990
+ "sps": 762.723220730318
991
+ },
992
+ {
993
+ "update": 555,
994
+ "global_step": 2273280,
995
+ "num_episodes": 700,
996
+ "mean_reward": 34.04119193077087,
997
+ "mean_length": 2350.76,
998
+ "loss": 0.032472074031829834,
999
+ "sps": 1245.5515732240913
1000
+ },
1001
+ {
1002
+ "update": 560,
1003
+ "global_step": 2293760,
1004
+ "num_episodes": 712,
1005
+ "mean_reward": 32.31116131782532,
1006
+ "mean_length": 2355.04,
1007
+ "loss": 0.5084341168403625,
1008
+ "sps": 356.9310128125497
1009
+ },
1010
+ {
1011
+ "update": 565,
1012
+ "global_step": 2314240,
1013
+ "num_episodes": 719,
1014
+ "mean_reward": 32.47976410865784,
1015
+ "mean_length": 2441.54,
1016
+ "loss": 0.21268926560878754,
1017
+ "sps": 649.2244348171059
1018
+ },
1019
+ {
1020
+ "update": 570,
1021
+ "global_step": 2334720,
1022
+ "num_episodes": 728,
1023
+ "mean_reward": 33.27659282684326,
1024
+ "mean_length": 2530.94,
1025
+ "loss": 0.9577451348304749,
1026
+ "sps": 768.1072804561215
1027
+ },
1028
+ {
1029
+ "update": 575,
1030
+ "global_step": 2355200,
1031
+ "num_episodes": 734,
1032
+ "mean_reward": 31.69405174255371,
1033
+ "mean_length": 2412.76,
1034
+ "loss": 1.0152896642684937,
1035
+ "sps": 1202.6180581266553
1036
+ },
1037
+ {
1038
+ "update": 580,
1039
+ "global_step": 2375680,
1040
+ "num_episodes": 751,
1041
+ "mean_reward": 32.80648866415024,
1042
+ "mean_length": 2678.86,
1043
+ "loss": 1.551302433013916,
1044
+ "sps": 620.7046830826843
1045
+ },
1046
+ {
1047
+ "update": 585,
1048
+ "global_step": 2396160,
1049
+ "num_episodes": 757,
1050
+ "mean_reward": 29.564244215488433,
1051
+ "mean_length": 2585.36,
1052
+ "loss": 3.1353652477264404,
1053
+ "sps": 488.49247923421996
1054
+ },
1055
+ {
1056
+ "update": 590,
1057
+ "global_step": 2416640,
1058
+ "num_episodes": 758,
1059
+ "mean_reward": 30.12968365430832,
1060
+ "mean_length": 2685.11,
1061
+ "loss": 1.9931682348251343,
1062
+ "sps": 1344.82690419507
1063
+ },
1064
+ {
1065
+ "update": 595,
1066
+ "global_step": 2437120,
1067
+ "num_episodes": 765,
1068
+ "mean_reward": 28.753595340251923,
1069
+ "mean_length": 2505.15,
1070
+ "loss": 1.9217519760131836,
1071
+ "sps": 447.9409443511792
1072
+ },
1073
+ {
1074
+ "update": 600,
1075
+ "global_step": 2457600,
1076
+ "num_episodes": 777,
1077
+ "mean_reward": 30.216613895893097,
1078
+ "mean_length": 2711.46,
1079
+ "loss": 18.632978439331055,
1080
+ "sps": 404.03595919196334
1081
+ },
1082
+ {
1083
+ "update": 605,
1084
+ "global_step": 2478080,
1085
+ "num_episodes": 795,
1086
+ "mean_reward": 25.763007862567903,
1087
+ "mean_length": 2472.96,
1088
+ "loss": 0.08484360575675964,
1089
+ "sps": 273.80019869630905
1090
+ },
1091
+ {
1092
+ "update": 610,
1093
+ "global_step": 2498560,
1094
+ "num_episodes": 800,
1095
+ "mean_reward": 25.233678991794587,
1096
+ "mean_length": 2472.55,
1097
+ "loss": 1.4983996152877808,
1098
+ "sps": 842.5169450671826
1099
+ },
1100
+ {
1101
+ "update": 615,
1102
+ "global_step": 2519040,
1103
+ "num_episodes": 800,
1104
+ "mean_reward": 25.233678991794587,
1105
+ "mean_length": 2472.55,
1106
+ "loss": 2.3043322563171387,
1107
+ "sps": 1633.9246020797414
1108
+ },
1109
+ {
1110
+ "update": 620,
1111
+ "global_step": 2539520,
1112
+ "num_episodes": 810,
1113
+ "mean_reward": 25.826587975025177,
1114
+ "mean_length": 2510.48,
1115
+ "loss": 0.21852731704711914,
1116
+ "sps": 518.4307357532686
1117
+ },
1118
+ {
1119
+ "update": 625,
1120
+ "global_step": 2560000,
1121
+ "num_episodes": 814,
1122
+ "mean_reward": 25.645928237438202,
1123
+ "mean_length": 2512.63,
1124
+ "loss": 1.0583765506744385,
1125
+ "sps": 581.0047205132955
1126
+ },
1127
+ {
1128
+ "update": 630,
1129
+ "global_step": 2580480,
1130
+ "num_episodes": 823,
1131
+ "mean_reward": 25.49915248632431,
1132
+ "mean_length": 2527.36,
1133
+ "loss": 1.795029878616333,
1134
+ "sps": 537.504310438823
1135
+ },
1136
+ {
1137
+ "update": 635,
1138
+ "global_step": 2600960,
1139
+ "num_episodes": 825,
1140
+ "mean_reward": 26.937079560756683,
1141
+ "mean_length": 2620.74,
1142
+ "loss": 0.2923164963722229,
1143
+ "sps": 1350.1934205312384
1144
+ },
1145
+ {
1146
+ "update": 640,
1147
+ "global_step": 2621440,
1148
+ "num_episodes": 834,
1149
+ "mean_reward": 29.924575612545013,
1150
+ "mean_length": 2905.58,
1151
+ "loss": 0.43789708614349365,
1152
+ "sps": 280.4967413720513
1153
+ },
1154
+ {
1155
+ "update": 645,
1156
+ "global_step": 2641920,
1157
+ "num_episodes": 836,
1158
+ "mean_reward": 28.948164422512054,
1159
+ "mean_length": 2820.95,
1160
+ "loss": 1.5149576663970947,
1161
+ "sps": 640.5808040483822
1162
+ },
1163
+ {
1164
+ "update": 650,
1165
+ "global_step": 2662400,
1166
+ "num_episodes": 843,
1167
+ "mean_reward": 35.76819623708725,
1168
+ "mean_length": 3020.26,
1169
+ "loss": 1.5319448709487915,
1170
+ "sps": 573.556787061315
1171
+ },
1172
+ {
1173
+ "update": 655,
1174
+ "global_step": 2682880,
1175
+ "num_episodes": 851,
1176
+ "mean_reward": 35.12200962543488,
1177
+ "mean_length": 2929.69,
1178
+ "loss": 0.4334838390350342,
1179
+ "sps": 721.898659363473
1180
+ },
1181
+ {
1182
+ "update": 660,
1183
+ "global_step": 2703360,
1184
+ "num_episodes": 863,
1185
+ "mean_reward": 34.17141466140747,
1186
+ "mean_length": 2837.57,
1187
+ "loss": 15.400986671447754,
1188
+ "sps": 531.226152703821
1189
+ },
1190
+ {
1191
+ "update": 665,
1192
+ "global_step": 2723840,
1193
+ "num_episodes": 871,
1194
+ "mean_reward": 34.406737427711484,
1195
+ "mean_length": 2847.12,
1196
+ "loss": -0.022274762392044067,
1197
+ "sps": 1102.2418013565505
1198
+ },
1199
+ {
1200
+ "update": 670,
1201
+ "global_step": 2744320,
1202
+ "num_episodes": 880,
1203
+ "mean_reward": 33.42318947315216,
1204
+ "mean_length": 2744.42,
1205
+ "loss": 0.21278248727321625,
1206
+ "sps": 932.5471353242045
1207
+ },
1208
+ {
1209
+ "update": 675,
1210
+ "global_step": 2764800,
1211
+ "num_episodes": 881,
1212
+ "mean_reward": 35.02722130298614,
1213
+ "mean_length": 2793.45,
1214
+ "loss": 0.48318442702293396,
1215
+ "sps": 849.2902333548969
1216
+ },
1217
+ {
1218
+ "update": 680,
1219
+ "global_step": 2785280,
1220
+ "num_episodes": 894,
1221
+ "mean_reward": 42.5996656703949,
1222
+ "mean_length": 2993.97,
1223
+ "loss": 0.9185795187950134,
1224
+ "sps": 476.82647813725623
1225
+ },
1226
+ {
1227
+ "update": 685,
1228
+ "global_step": 2805760,
1229
+ "num_episodes": 914,
1230
+ "mean_reward": 40.827002143859865,
1231
+ "mean_length": 2561.69,
1232
+ "loss": 1.2765486240386963,
1233
+ "sps": 255.71578525021312
1234
+ },
1235
+ {
1236
+ "update": 690,
1237
+ "global_step": 2826240,
1238
+ "num_episodes": 925,
1239
+ "mean_reward": 45.62547200202942,
1240
+ "mean_length": 2469.27,
1241
+ "loss": 2.3010942935943604,
1242
+ "sps": 570.2368795975491
1243
+ },
1244
+ {
1245
+ "update": 695,
1246
+ "global_step": 2846720,
1247
+ "num_episodes": 931,
1248
+ "mean_reward": 43.9872697687149,
1249
+ "mean_length": 2287.32,
1250
+ "loss": 1.1779371500015259,
1251
+ "sps": 532.4176047731756
1252
+ },
1253
+ {
1254
+ "update": 700,
1255
+ "global_step": 2867200,
1256
+ "num_episodes": 943,
1257
+ "mean_reward": 36.38923056125641,
1258
+ "mean_length": 1897.85,
1259
+ "loss": 0.6398267149925232,
1260
+ "sps": 501.06106420522286
1261
+ },
1262
+ {
1263
+ "update": 705,
1264
+ "global_step": 2887680,
1265
+ "num_episodes": 952,
1266
+ "mean_reward": 38.85928546905517,
1267
+ "mean_length": 2189.91,
1268
+ "loss": 0.7464686632156372,
1269
+ "sps": 358.482733937848
1270
+ },
1271
+ {
1272
+ "update": 710,
1273
+ "global_step": 2908160,
1274
+ "num_episodes": 960,
1275
+ "mean_reward": 38.50984364032745,
1276
+ "mean_length": 2181.81,
1277
+ "loss": 0.819063663482666,
1278
+ "sps": 729.0695475685854
1279
+ },
1280
+ {
1281
+ "update": 715,
1282
+ "global_step": 2928640,
1283
+ "num_episodes": 960,
1284
+ "mean_reward": 38.50984364032745,
1285
+ "mean_length": 2181.81,
1286
+ "loss": 1.3829450607299805,
1287
+ "sps": 2103.44152639025
1288
+ },
1289
+ {
1290
+ "update": 720,
1291
+ "global_step": 2949120,
1292
+ "num_episodes": 961,
1293
+ "mean_reward": 38.30655426502228,
1294
+ "mean_length": 2181.81,
1295
+ "loss": 1.3242486715316772,
1296
+ "sps": 2180.924028748301
1297
+ },
1298
+ {
1299
+ "update": 725,
1300
+ "global_step": 2969600,
1301
+ "num_episodes": 979,
1302
+ "mean_reward": 37.64626069068909,
1303
+ "mean_length": 2258.86,
1304
+ "loss": 0.6890235543251038,
1305
+ "sps": 512.7806225711571
1306
+ },
1307
+ {
1308
+ "update": 730,
1309
+ "global_step": 2990080,
1310
+ "num_episodes": 993,
1311
+ "mean_reward": 29.446645002365113,
1312
+ "mean_length": 2178.52,
1313
+ "loss": 1.424858570098877,
1314
+ "sps": 397.19464657996275
1315
+ },
1316
+ {
1317
+ "update": 735,
1318
+ "global_step": 3010560,
1319
+ "num_episodes": 994,
1320
+ "mean_reward": 29.469437551498412,
1321
+ "mean_length": 2182.57,
1322
+ "loss": 1.9823501110076904,
1323
+ "sps": 742.698176618862
1324
+ },
1325
+ {
1326
+ "update": 740,
1327
+ "global_step": 3031040,
1328
+ "num_episodes": 995,
1329
+ "mean_reward": 29.75782982826233,
1330
+ "mean_length": 2282.33,
1331
+ "loss": 0.30816513299942017,
1332
+ "sps": 1825.6853767545124
1333
+ },
1334
+ {
1335
+ "update": 745,
1336
+ "global_step": 3051520,
1337
+ "num_episodes": 998,
1338
+ "mean_reward": 30.453793020248412,
1339
+ "mean_length": 2473.99,
1340
+ "loss": 3.092510938644409,
1341
+ "sps": 1555.5380368163708
1342
+ },
1343
+ {
1344
+ "update": 750,
1345
+ "global_step": 3072000,
1346
+ "num_episodes": 1010,
1347
+ "mean_reward": 31.862056040763854,
1348
+ "mean_length": 2872.71,
1349
+ "loss": 3.4220612049102783,
1350
+ "sps": 266.50791667174065
1351
+ },
1352
+ {
1353
+ "update": 755,
1354
+ "global_step": 3092480,
1355
+ "num_episodes": 1019,
1356
+ "mean_reward": 28.9614812707901,
1357
+ "mean_length": 2699.59,
1358
+ "loss": 1.2745823860168457,
1359
+ "sps": 753.5067688712668
1360
+ },
1361
+ {
1362
+ "update": 760,
1363
+ "global_step": 3112960,
1364
+ "num_episodes": 1020,
1365
+ "mean_reward": 23.99162916660309,
1366
+ "mean_length": 2699.59,
1367
+ "loss": 0.8926928043365479,
1368
+ "sps": 1444.3258733361963
1369
+ },
1370
+ {
1371
+ "update": 765,
1372
+ "global_step": 3133440,
1373
+ "num_episodes": 1024,
1374
+ "mean_reward": 30.025504064559936,
1375
+ "mean_length": 2889.38,
1376
+ "loss": 1.112790822982788,
1377
+ "sps": 756.883715254125
1378
+ },
1379
+ {
1380
+ "update": 770,
1381
+ "global_step": 3153920,
1382
+ "num_episodes": 1034,
1383
+ "mean_reward": 36.13991012096405,
1384
+ "mean_length": 3187.15,
1385
+ "loss": 0.43345746397972107,
1386
+ "sps": 419.40821109391015
1387
+ },
1388
+ {
1389
+ "update": 775,
1390
+ "global_step": 3174400,
1391
+ "num_episodes": 1050,
1392
+ "mean_reward": 32.24160755157471,
1393
+ "mean_length": 2985.91,
1394
+ "loss": 0.13569332659244537,
1395
+ "sps": 317.4046328348688
1396
+ },
1397
+ {
1398
+ "update": 780,
1399
+ "global_step": 3194880,
1400
+ "num_episodes": 1061,
1401
+ "mean_reward": 30.67223885536194,
1402
+ "mean_length": 2711.19,
1403
+ "loss": 0.2064545899629593,
1404
+ "sps": 303.1813456850613
1405
+ },
1406
+ {
1407
+ "update": 785,
1408
+ "global_step": 3215360,
1409
+ "num_episodes": 1072,
1410
+ "mean_reward": 29.705427560806275,
1411
+ "mean_length": 2524.86,
1412
+ "loss": 4.488283634185791,
1413
+ "sps": 913.592632749929
1414
+ },
1415
+ {
1416
+ "update": 790,
1417
+ "global_step": 3235840,
1418
+ "num_episodes": 1085,
1419
+ "mean_reward": 32.955479860305786,
1420
+ "mean_length": 2618.62,
1421
+ "loss": 0.05505555868148804,
1422
+ "sps": 607.269974208857
1423
+ },
1424
+ {
1425
+ "update": 795,
1426
+ "global_step": 3256320,
1427
+ "num_episodes": 1097,
1428
+ "mean_reward": 31.909876976013184,
1429
+ "mean_length": 2214.73,
1430
+ "loss": 1.9556808471679688,
1431
+ "sps": 747.2261381867389
1432
+ },
1433
+ {
1434
+ "update": 800,
1435
+ "global_step": 3276800,
1436
+ "num_episodes": 1106,
1437
+ "mean_reward": 29.67423951148987,
1438
+ "mean_length": 1915.87,
1439
+ "loss": 2.1206791400909424,
1440
+ "sps": 701.3988516159834
1441
+ },
1442
+ {
1443
+ "update": 805,
1444
+ "global_step": 3297280,
1445
+ "num_episodes": 1110,
1446
+ "mean_reward": 30.086654043197633,
1447
+ "mean_length": 2005.75,
1448
+ "loss": 0.5833985805511475,
1449
+ "sps": 660.5544733393301
1450
+ },
1451
+ {
1452
+ "update": 810,
1453
+ "global_step": 3317760,
1454
+ "num_episodes": 1115,
1455
+ "mean_reward": 32.238635430336,
1456
+ "mean_length": 2171.27,
1457
+ "loss": 0.4146590232849121,
1458
+ "sps": 1094.4092641494085
1459
+ },
1460
+ {
1461
+ "update": 815,
1462
+ "global_step": 3338240,
1463
+ "num_episodes": 1120,
1464
+ "mean_reward": 33.369399318695066,
1465
+ "mean_length": 2262.48,
1466
+ "loss": 2.585177183151245,
1467
+ "sps": 901.8559204378539
1468
+ },
1469
+ {
1470
+ "update": 820,
1471
+ "global_step": 3358720,
1472
+ "num_episodes": 1130,
1473
+ "mean_reward": 27.537890434265137,
1474
+ "mean_length": 2050.2,
1475
+ "loss": 0.5106171369552612,
1476
+ "sps": 838.9275291477041
1477
+ },
1478
+ {
1479
+ "update": 825,
1480
+ "global_step": 3379200,
1481
+ "num_episodes": 1136,
1482
+ "mean_reward": 23.029513311386108,
1483
+ "mean_length": 2161.34,
1484
+ "loss": 4.071342468261719,
1485
+ "sps": 1617.4028966230667
1486
+ },
1487
+ {
1488
+ "update": 830,
1489
+ "global_step": 3399680,
1490
+ "num_episodes": 1138,
1491
+ "mean_reward": 23.361328144073486,
1492
+ "mean_length": 2259.74,
1493
+ "loss": 0.30649739503860474,
1494
+ "sps": 489.67958732294835
1495
+ },
1496
+ {
1497
+ "update": 835,
1498
+ "global_step": 3420160,
1499
+ "num_episodes": 1146,
1500
+ "mean_reward": 29.256564235687257,
1501
+ "mean_length": 2379.17,
1502
+ "loss": 0.2716136574745178,
1503
+ "sps": 789.1092351378779
1504
+ },
1505
+ {
1506
+ "update": 840,
1507
+ "global_step": 3440640,
1508
+ "num_episodes": 1152,
1509
+ "mean_reward": 30.681161608695984,
1510
+ "mean_length": 2564.53,
1511
+ "loss": 3.510643720626831,
1512
+ "sps": 465.91684356996774
1513
+ },
1514
+ {
1515
+ "update": 845,
1516
+ "global_step": 3461120,
1517
+ "num_episodes": 1157,
1518
+ "mean_reward": 31.814847540855407,
1519
+ "mean_length": 2661.46,
1520
+ "loss": 0.2368174046278,
1521
+ "sps": 970.5356676833453
1522
+ },
1523
+ {
1524
+ "update": 850,
1525
+ "global_step": 3481600,
1526
+ "num_episodes": 1158,
1527
+ "mean_reward": 32.19375528335571,
1528
+ "mean_length": 2760.37,
1529
+ "loss": 1.0439393520355225,
1530
+ "sps": 2118.0253570743016
1531
+ },
1532
+ {
1533
+ "update": 855,
1534
+ "global_step": 3502080,
1535
+ "num_episodes": 1167,
1536
+ "mean_reward": 32.78835594654083,
1537
+ "mean_length": 2934.75,
1538
+ "loss": 0.11038707196712494,
1539
+ "sps": 373.62636209376416
1540
+ },
1541
+ {
1542
+ "update": 860,
1543
+ "global_step": 3522560,
1544
+ "num_episodes": 1171,
1545
+ "mean_reward": 38.54492960453033,
1546
+ "mean_length": 3055.33,
1547
+ "loss": 0.27012330293655396,
1548
+ "sps": 811.5381951612115
1549
+ },
1550
+ {
1551
+ "update": 865,
1552
+ "global_step": 3543040,
1553
+ "num_episodes": 1182,
1554
+ "mean_reward": 38.79421305179596,
1555
+ "mean_length": 3135.05,
1556
+ "loss": 0.8694961071014404,
1557
+ "sps": 643.9358039841488
1558
+ },
1559
+ {
1560
+ "update": 870,
1561
+ "global_step": 3563520,
1562
+ "num_episodes": 1187,
1563
+ "mean_reward": 37.14653766155243,
1564
+ "mean_length": 3053.45,
1565
+ "loss": 1.2393195629119873,
1566
+ "sps": 871.9844580529647
1567
+ },
1568
+ {
1569
+ "update": 875,
1570
+ "global_step": 3584000,
1571
+ "num_episodes": 1194,
1572
+ "mean_reward": 40.78861089706421,
1573
+ "mean_length": 3345.54,
1574
+ "loss": 0.6096416115760803,
1575
+ "sps": 304.93330977862365
1576
+ },
1577
+ {
1578
+ "update": 880,
1579
+ "global_step": 3604480,
1580
+ "num_episodes": 1201,
1581
+ "mean_reward": 39.81052246570587,
1582
+ "mean_length": 3252.28,
1583
+ "loss": 0.6608580946922302,
1584
+ "sps": 677.495925390962
1585
+ },
1586
+ {
1587
+ "update": 885,
1588
+ "global_step": 3624960,
1589
+ "num_episodes": 1215,
1590
+ "mean_reward": 38.70529013156891,
1591
+ "mean_length": 3165.08,
1592
+ "loss": 1.2741384506225586,
1593
+ "sps": 854.5639976310788
1594
+ },
1595
+ {
1596
+ "update": 890,
1597
+ "global_step": 3645440,
1598
+ "num_episodes": 1215,
1599
+ "mean_reward": 38.70529013156891,
1600
+ "mean_length": 3165.08,
1601
+ "loss": 0.5457536578178406,
1602
+ "sps": 1511.837849491964
1603
+ },
1604
+ {
1605
+ "update": 895,
1606
+ "global_step": 3665920,
1607
+ "num_episodes": 1220,
1608
+ "mean_reward": 39.244035544395445,
1609
+ "mean_length": 3264.42,
1610
+ "loss": 0.6924870014190674,
1611
+ "sps": 556.2715786447075
1612
+ },
1613
+ {
1614
+ "update": 900,
1615
+ "global_step": 3686400,
1616
+ "num_episodes": 1232,
1617
+ "mean_reward": 46.50887234210968,
1618
+ "mean_length": 3173.4,
1619
+ "loss": 1.7522594928741455,
1620
+ "sps": 699.9425408800316
1621
+ },
1622
+ {
1623
+ "update": 905,
1624
+ "global_step": 3706880,
1625
+ "num_episodes": 1245,
1626
+ "mean_reward": 41.68342576980591,
1627
+ "mean_length": 2950.87,
1628
+ "loss": 1.1043925285339355,
1629
+ "sps": 375.42107758469984
1630
+ },
1631
+ {
1632
+ "update": 910,
1633
+ "global_step": 3727360,
1634
+ "num_episodes": 1257,
1635
+ "mean_reward": 38.028645734786984,
1636
+ "mean_length": 2550.61,
1637
+ "loss": 0.11873626708984375,
1638
+ "sps": 856.8550488972196
1639
+ },
1640
+ {
1641
+ "update": 915,
1642
+ "global_step": 3747840,
1643
+ "num_episodes": 1265,
1644
+ "mean_reward": 38.954245281219485,
1645
+ "mean_length": 2549.56,
1646
+ "loss": 0.31198206543922424,
1647
+ "sps": 690.3971885269259
1648
+ },
1649
+ {
1650
+ "update": 920,
1651
+ "global_step": 3768320,
1652
+ "num_episodes": 1269,
1653
+ "mean_reward": 34.42387727260589,
1654
+ "mean_length": 2518.34,
1655
+ "loss": 1.7575799226760864,
1656
+ "sps": 1152.958968508612
1657
+ },
1658
+ {
1659
+ "update": 925,
1660
+ "global_step": 3788800,
1661
+ "num_episodes": 1274,
1662
+ "mean_reward": 34.58799042224884,
1663
+ "mean_length": 2520.98,
1664
+ "loss": 0.3646126985549927,
1665
+ "sps": 1100.5795244909316
1666
+ },
1667
+ {
1668
+ "update": 930,
1669
+ "global_step": 3809280,
1670
+ "num_episodes": 1276,
1671
+ "mean_reward": 36.83985797405243,
1672
+ "mean_length": 2620.63,
1673
+ "loss": 0.5593718886375427,
1674
+ "sps": 999.9135216052517
1675
+ },
1676
+ {
1677
+ "update": 935,
1678
+ "global_step": 3829760,
1679
+ "num_episodes": 1292,
1680
+ "mean_reward": 36.57479173660278,
1681
+ "mean_length": 2520.6,
1682
+ "loss": 0.009059503674507141,
1683
+ "sps": 1097.323441932561
1684
+ },
1685
+ {
1686
+ "update": 940,
1687
+ "global_step": 3850240,
1688
+ "num_episodes": 1297,
1689
+ "mean_reward": 37.501826615333556,
1690
+ "mean_length": 2614.74,
1691
+ "loss": 0.686941385269165,
1692
+ "sps": 1048.1224362748299
1693
+ },
1694
+ {
1695
+ "update": 945,
1696
+ "global_step": 3870720,
1697
+ "num_episodes": 1302,
1698
+ "mean_reward": 38.2632510137558,
1699
+ "mean_length": 2706.87,
1700
+ "loss": 1.56045663356781,
1701
+ "sps": 984.6772159375175
1702
+ },
1703
+ {
1704
+ "update": 950,
1705
+ "global_step": 3891200,
1706
+ "num_episodes": 1303,
1707
+ "mean_reward": 37.39876995563507,
1708
+ "mean_length": 2611.05,
1709
+ "loss": 1.817370891571045,
1710
+ "sps": 662.4908033005096
1711
+ },
1712
+ {
1713
+ "update": 955,
1714
+ "global_step": 3911680,
1715
+ "num_episodes": 1312,
1716
+ "mean_reward": 38.347609777450565,
1717
+ "mean_length": 2799.94,
1718
+ "loss": 1.7936173677444458,
1719
+ "sps": 498.5900701113524
1720
+ },
1721
+ {
1722
+ "update": 960,
1723
+ "global_step": 3932160,
1724
+ "num_episodes": 1319,
1725
+ "mean_reward": 38.531728854179384,
1726
+ "mean_length": 2800.38,
1727
+ "loss": 0.8617715835571289,
1728
+ "sps": 288.6257663929259
1729
+ },
1730
+ {
1731
+ "update": 965,
1732
+ "global_step": 3952640,
1733
+ "num_episodes": 1325,
1734
+ "mean_reward": 32.38980568408966,
1735
+ "mean_length": 2706.94,
1736
+ "loss": 0.410921573638916,
1737
+ "sps": 729.5250188646075
1738
+ },
1739
+ {
1740
+ "update": 970,
1741
+ "global_step": 3973120,
1742
+ "num_episodes": 1330,
1743
+ "mean_reward": 33.02072194576263,
1744
+ "mean_length": 2800.87,
1745
+ "loss": 0.6215076446533203,
1746
+ "sps": 913.5447323136685
1747
+ },
1748
+ {
1749
+ "update": 975,
1750
+ "global_step": 3993600,
1751
+ "num_episodes": 1333,
1752
+ "mean_reward": 34.48237750530243,
1753
+ "mean_length": 2899.46,
1754
+ "loss": 1.4754104614257812,
1755
+ "sps": 857.6512350171981
1756
+ },
1757
+ {
1758
+ "update": 980,
1759
+ "global_step": 4014080,
1760
+ "num_episodes": 1346,
1761
+ "mean_reward": 35.15436544418335,
1762
+ "mean_length": 3006.69,
1763
+ "loss": 1.3220990896224976,
1764
+ "sps": 275.0666378661058
1765
+ },
1766
+ {
1767
+ "update": 985,
1768
+ "global_step": 4034560,
1769
+ "num_episodes": 1359,
1770
+ "mean_reward": 36.966177105903625,
1771
+ "mean_length": 3140.04,
1772
+ "loss": 0.31860384345054626,
1773
+ "sps": 723.735690218882
1774
+ },
1775
+ {
1776
+ "update": 990,
1777
+ "global_step": 4055040,
1778
+ "num_episodes": 1368,
1779
+ "mean_reward": 36.728344497680666,
1780
+ "mean_length": 2959.73,
1781
+ "loss": 5.1887078285217285,
1782
+ "sps": 773.332744431613
1783
+ },
1784
+ {
1785
+ "update": 995,
1786
+ "global_step": 4075520,
1787
+ "num_episodes": 1369,
1788
+ "mean_reward": 35.78023895263672,
1789
+ "mean_length": 2959.73,
1790
+ "loss": 1.6374891996383667,
1791
+ "sps": 2350.737845032927
1792
+ },
1793
+ {
1794
+ "update": 1000,
1795
+ "global_step": 4096000,
1796
+ "num_episodes": 1370,
1797
+ "mean_reward": 35.719558029174806,
1798
+ "mean_length": 2959.73,
1799
+ "loss": 1.2606910467147827,
1800
+ "sps": 2287.2771680470537
1801
+ },
1802
+ {
1803
+ "update": 1005,
1804
+ "global_step": 4116480,
1805
+ "num_episodes": 1381,
1806
+ "mean_reward": 37.67828846931457,
1807
+ "mean_length": 2963.83,
1808
+ "loss": 0.20636317133903503,
1809
+ "sps": 1606.0294791853535
1810
+ },
1811
+ {
1812
+ "update": 1010,
1813
+ "global_step": 4136960,
1814
+ "num_episodes": 1391,
1815
+ "mean_reward": 39.33717452049255,
1816
+ "mean_length": 3249.05,
1817
+ "loss": 3.9951727390289307,
1818
+ "sps": 418.63976772772594
1819
+ },
1820
+ {
1821
+ "update": 1015,
1822
+ "global_step": 4157440,
1823
+ "num_episodes": 1399,
1824
+ "mean_reward": 36.405463542938236,
1825
+ "mean_length": 2962.77,
1826
+ "loss": 1.837266445159912,
1827
+ "sps": 683.5481201646958
1828
+ },
1829
+ {
1830
+ "update": 1020,
1831
+ "global_step": 4177920,
1832
+ "num_episodes": 1401,
1833
+ "mean_reward": 35.86304131507873,
1834
+ "mean_length": 2879.03,
1835
+ "loss": 2.9102792739868164,
1836
+ "sps": 1089.8691629385837
1837
+ },
1838
+ {
1839
+ "update": 1025,
1840
+ "global_step": 4198400,
1841
+ "num_episodes": 1416,
1842
+ "mean_reward": 34.70472603797913,
1843
+ "mean_length": 2801.68,
1844
+ "loss": 0.9925887584686279,
1845
+ "sps": 255.81352094643265
1846
+ },
1847
+ {
1848
+ "update": 1030,
1849
+ "global_step": 4218880,
1850
+ "num_episodes": 1425,
1851
+ "mean_reward": 33.617551488876344,
1852
+ "mean_length": 2725.68,
1853
+ "loss": 1.9214203357696533,
1854
+ "sps": 737.2169737476238
1855
+ },
1856
+ {
1857
+ "update": 1035,
1858
+ "global_step": 4239360,
1859
+ "num_episodes": 1434,
1860
+ "mean_reward": 34.879857649803164,
1861
+ "mean_length": 2533.35,
1862
+ "loss": 0.2052566111087799,
1863
+ "sps": 579.7787814604097
1864
+ },
1865
+ {
1866
+ "update": 1040,
1867
+ "global_step": 4259840,
1868
+ "num_episodes": 1434,
1869
+ "mean_reward": 34.879857649803164,
1870
+ "mean_length": 2533.35,
1871
+ "loss": 1.6612746715545654,
1872
+ "sps": 1248.33251762393
1873
+ },
1874
+ {
1875
+ "update": 1045,
1876
+ "global_step": 4280320,
1877
+ "num_episodes": 1445,
1878
+ "mean_reward": 39.72180326938629,
1879
+ "mean_length": 2610.11,
1880
+ "loss": 4.8466691970825195,
1881
+ "sps": 656.9837380754918
1882
+ },
1883
+ {
1884
+ "update": 1050,
1885
+ "global_step": 4300800,
1886
+ "num_episodes": 1450,
1887
+ "mean_reward": 42.0785810136795,
1888
+ "mean_length": 2811.63,
1889
+ "loss": 2.377190351486206,
1890
+ "sps": 683.0332370956824
1891
+ },
1892
+ {
1893
+ "update": 1055,
1894
+ "global_step": 4321280,
1895
+ "num_episodes": 1461,
1896
+ "mean_reward": 39.593364839553836,
1897
+ "mean_length": 2808.07,
1898
+ "loss": 0.6159077882766724,
1899
+ "sps": 526.5738589213908
1900
+ },
1901
+ {
1902
+ "update": 1060,
1903
+ "global_step": 4341760,
1904
+ "num_episodes": 1461,
1905
+ "mean_reward": 39.593364839553836,
1906
+ "mean_length": 2808.07,
1907
+ "loss": 0.8465878963470459,
1908
+ "sps": 2184.9711988588206
1909
+ },
1910
+ {
1911
+ "update": 1065,
1912
+ "global_step": 4362240,
1913
+ "num_episodes": 1468,
1914
+ "mean_reward": 41.406082754135134,
1915
+ "mean_length": 2993.5,
1916
+ "loss": 2.0758755207061768,
1917
+ "sps": 756.2821057624998
1918
+ },
1919
+ {
1920
+ "update": 1070,
1921
+ "global_step": 4382720,
1922
+ "num_episodes": 1475,
1923
+ "mean_reward": 35.40320841789246,
1924
+ "mean_length": 2801.81,
1925
+ "loss": 9.254124641418457,
1926
+ "sps": 1308.8266389540704
1927
+ },
1928
+ {
1929
+ "update": 1075,
1930
+ "global_step": 4403200,
1931
+ "num_episodes": 1485,
1932
+ "mean_reward": 40.63698256492615,
1933
+ "mean_length": 2897.46,
1934
+ "loss": 0.5495067834854126,
1935
+ "sps": 799.4087672915922
1936
+ },
1937
+ {
1938
+ "update": 1080,
1939
+ "global_step": 4423680,
1940
+ "num_episodes": 1485,
1941
+ "mean_reward": 40.63698256492615,
1942
+ "mean_length": 2897.46,
1943
+ "loss": 0.5127925872802734,
1944
+ "sps": 1340.6154629856217
1945
+ },
1946
+ {
1947
+ "update": 1085,
1948
+ "global_step": 4444160,
1949
+ "num_episodes": 1486,
1950
+ "mean_reward": 41.25436541080475,
1951
+ "mean_length": 2996.57,
1952
+ "loss": 0.2694500684738159,
1953
+ "sps": 1837.4922707085245
1954
+ },
1955
+ {
1956
+ "update": 1090,
1957
+ "global_step": 4464640,
1958
+ "num_episodes": 1495,
1959
+ "mean_reward": 46.52874216079712,
1960
+ "mean_length": 3122.1,
1961
+ "loss": 10.304019927978516,
1962
+ "sps": 1046.1438942302716
1963
+ },
1964
+ {
1965
+ "update": 1095,
1966
+ "global_step": 4485120,
1967
+ "num_episodes": 1509,
1968
+ "mean_reward": 47.41412097454071,
1969
+ "mean_length": 3088.91,
1970
+ "loss": 4.403233051300049,
1971
+ "sps": 705.1397561929946
1972
+ },
1973
+ {
1974
+ "update": 1100,
1975
+ "global_step": 4505600,
1976
+ "num_episodes": 1509,
1977
+ "mean_reward": 47.41412097454071,
1978
+ "mean_length": 3088.91,
1979
+ "loss": 5.204225540161133,
1980
+ "sps": 956.5984516573334
1981
+ },
1982
+ {
1983
+ "update": 1105,
1984
+ "global_step": 4526080,
1985
+ "num_episodes": 1514,
1986
+ "mean_reward": 46.740014991760255,
1987
+ "mean_length": 3048.45,
1988
+ "loss": 1.3485753536224365,
1989
+ "sps": 2885.439969795025
1990
+ },
1991
+ {
1992
+ "update": 1110,
1993
+ "global_step": 4546560,
1994
+ "num_episodes": 1520,
1995
+ "mean_reward": 55.2595415019989,
1996
+ "mean_length": 3313.8,
1997
+ "loss": 2.248853921890259,
1998
+ "sps": 307.51773062645026
1999
+ },
2000
+ {
2001
+ "update": 1115,
2002
+ "global_step": 4567040,
2003
+ "num_episodes": 1535,
2004
+ "mean_reward": 45.75344041824341,
2005
+ "mean_length": 3241.81,
2006
+ "loss": 1.3050477504730225,
2007
+ "sps": 349.05706197896666
2008
+ },
2009
+ {
2010
+ "update": 1120,
2011
+ "global_step": 4587520,
2012
+ "num_episodes": 1538,
2013
+ "mean_reward": 48.896285104751584,
2014
+ "mean_length": 3154.78,
2015
+ "loss": 1.0057706832885742,
2016
+ "sps": 741.8276433683515
2017
+ },
2018
+ {
2019
+ "update": 1125,
2020
+ "global_step": 4608000,
2021
+ "num_episodes": 1543,
2022
+ "mean_reward": 49.45138314723968,
2023
+ "mean_length": 3312.76,
2024
+ "loss": 3.0420098304748535,
2025
+ "sps": 267.5648126417643
2026
+ },
2027
+ {
2028
+ "update": 1130,
2029
+ "global_step": 4628480,
2030
+ "num_episodes": 1554,
2031
+ "mean_reward": 45.64340697288513,
2032
+ "mean_length": 2967.91,
2033
+ "loss": 0.534530758857727,
2034
+ "sps": 595.2909007117904
2035
+ },
2036
+ {
2037
+ "update": 1135,
2038
+ "global_step": 4648960,
2039
+ "num_episodes": 1566,
2040
+ "mean_reward": 46.33692901611328,
2041
+ "mean_length": 3175.93,
2042
+ "loss": 12.374032974243164,
2043
+ "sps": 273.2022330683734
2044
+ },
2045
+ {
2046
+ "update": 1140,
2047
+ "global_step": 4669440,
2048
+ "num_episodes": 1574,
2049
+ "mean_reward": 45.48817714929581,
2050
+ "mean_length": 3007.35,
2051
+ "loss": 9.789308547973633,
2052
+ "sps": 468.42362264683504
2053
+ },
2054
+ {
2055
+ "update": 1145,
2056
+ "global_step": 4689920,
2057
+ "num_episodes": 1579,
2058
+ "mean_reward": 39.85454882383347,
2059
+ "mean_length": 2862.76,
2060
+ "loss": 14.043472290039062,
2061
+ "sps": 685.1921972366682
2062
+ },
2063
+ {
2064
+ "update": 1150,
2065
+ "global_step": 4710400,
2066
+ "num_episodes": 1592,
2067
+ "mean_reward": 33.700903475284576,
2068
+ "mean_length": 2480.53,
2069
+ "loss": 8.177810668945312,
2070
+ "sps": 333.4487379606718
2071
+ },
2072
+ {
2073
+ "update": 1155,
2074
+ "global_step": 4730880,
2075
+ "num_episodes": 1607,
2076
+ "mean_reward": 32.822717969417575,
2077
+ "mean_length": 2485.88,
2078
+ "loss": 58.6836051940918,
2079
+ "sps": 320.2091534142309
2080
+ },
2081
+ {
2082
+ "update": 1160,
2083
+ "global_step": 4751360,
2084
+ "num_episodes": 1613,
2085
+ "mean_reward": 32.850986263751984,
2086
+ "mean_length": 2471.11,
2087
+ "loss": 4.779659748077393,
2088
+ "sps": 886.848895232356
2089
+ },
2090
+ {
2091
+ "update": 1165,
2092
+ "global_step": 4771840,
2093
+ "num_episodes": 1616,
2094
+ "mean_reward": 30.757194340229034,
2095
+ "mean_length": 2379.02,
2096
+ "loss": 0.5000092387199402,
2097
+ "sps": 1201.649782869044
2098
+ },
2099
+ {
2100
+ "update": 1170,
2101
+ "global_step": 4792320,
2102
+ "num_episodes": 1618,
2103
+ "mean_reward": 31.608184831142424,
2104
+ "mean_length": 2481.05,
2105
+ "loss": 0.5737665891647339,
2106
+ "sps": 999.5698665674614
2107
+ },
2108
+ {
2109
+ "update": 1175,
2110
+ "global_step": 4812800,
2111
+ "num_episodes": 1623,
2112
+ "mean_reward": 32.650640833377835,
2113
+ "mean_length": 2658.87,
2114
+ "loss": 3.0942459106445312,
2115
+ "sps": 616.1316245054412
2116
+ },
2117
+ {
2118
+ "update": 1180,
2119
+ "global_step": 4833280,
2120
+ "num_episodes": 1625,
2121
+ "mean_reward": 37.96194513559342,
2122
+ "mean_length": 2682.02,
2123
+ "loss": 9.840081214904785,
2124
+ "sps": 698.9043221988104
2125
+ },
2126
+ {
2127
+ "update": 1185,
2128
+ "global_step": 4853760,
2129
+ "num_episodes": 1630,
2130
+ "mean_reward": 42.92790410280227,
2131
+ "mean_length": 2817.92,
2132
+ "loss": 1.0771808624267578,
2133
+ "sps": 677.5959966787289
2134
+ },
2135
+ {
2136
+ "update": 1190,
2137
+ "global_step": 4874240,
2138
+ "num_episodes": 1631,
2139
+ "mean_reward": 43.29853859186173,
2140
+ "mean_length": 2913.27,
2141
+ "loss": 0.15924929082393646,
2142
+ "sps": 2128.8886043740195
2143
+ },
2144
+ {
2145
+ "update": 1195,
2146
+ "global_step": 4894720,
2147
+ "num_episodes": 1635,
2148
+ "mean_reward": 44.563561956882474,
2149
+ "mean_length": 3101.73,
2150
+ "loss": 24.4128475189209,
2151
+ "sps": 768.3773024267982
2152
+ },
2153
+ {
2154
+ "update": 1200,
2155
+ "global_step": 4915200,
2156
+ "num_episodes": 1642,
2157
+ "mean_reward": 45.34813198566437,
2158
+ "mean_length": 3112.72,
2159
+ "loss": 8.374368667602539,
2160
+ "sps": 976.3571543939305
2161
+ },
2162
+ {
2163
+ "update": 1205,
2164
+ "global_step": 4935680,
2165
+ "num_episodes": 1649,
2166
+ "mean_reward": 46.91295046806336,
2167
+ "mean_length": 3360.16,
2168
+ "loss": 13.855979919433594,
2169
+ "sps": 264.6689562194928
2170
+ },
2171
+ {
2172
+ "update": 1210,
2173
+ "global_step": 4956160,
2174
+ "num_episodes": 1661,
2175
+ "mean_reward": 46.23843379974365,
2176
+ "mean_length": 3281.89,
2177
+ "loss": 95.5245590209961,
2178
+ "sps": 248.04865845009095
2179
+ },
2180
+ {
2181
+ "update": 1215,
2182
+ "global_step": 4976640,
2183
+ "num_episodes": 1669,
2184
+ "mean_reward": 49.53000379562378,
2185
+ "mean_length": 3152.88,
2186
+ "loss": 92.37181091308594,
2187
+ "sps": 241.45699781157901
2188
+ },
2189
+ {
2190
+ "update": 1220,
2191
+ "global_step": 4997120,
2192
+ "num_episodes": 1675,
2193
+ "mean_reward": 49.94117986917496,
2194
+ "mean_length": 3244.89,
2195
+ "loss": 36.136539459228516,
2196
+ "sps": 332.50957910682797
2197
+ },
2198
+ {
2199
+ "update": 1225,
2200
+ "global_step": 5017600,
2201
+ "num_episodes": 1681,
2202
+ "mean_reward": 49.826350367069246,
2203
+ "mean_length": 3322.41,
2204
+ "loss": 4.504380702972412,
2205
+ "sps": 317.38761005771676
2206
+ },
2207
+ {
2208
+ "update": 1230,
2209
+ "global_step": 5038080,
2210
+ "num_episodes": 1686,
2211
+ "mean_reward": 50.16007830858231,
2212
+ "mean_length": 3383.77,
2213
+ "loss": 11.077418327331543,
2214
+ "sps": 614.5178546338719
2215
+ },
2216
+ {
2217
+ "update": 1235,
2218
+ "global_step": 5058560,
2219
+ "num_episodes": 1689,
2220
+ "mean_reward": 50.77651453256607,
2221
+ "mean_length": 3495.66,
2222
+ "loss": 12.558845520019531,
2223
+ "sps": 574.6067562429008
2224
+ },
2225
+ {
2226
+ "update": 1240,
2227
+ "global_step": 5079040,
2228
+ "num_episodes": 1694,
2229
+ "mean_reward": 51.09682610750198,
2230
+ "mean_length": 3719.18,
2231
+ "loss": 21.6842098236084,
2232
+ "sps": 1195.0937330699562
2233
+ },
2234
+ {
2235
+ "update": 1245,
2236
+ "global_step": 5099520,
2237
+ "num_episodes": 1698,
2238
+ "mean_reward": 50.89644577741623,
2239
+ "mean_length": 3737.48,
2240
+ "loss": 65.59449768066406,
2241
+ "sps": 836.6498465651391
2242
+ },
2243
+ {
2244
+ "update": 1250,
2245
+ "global_step": 5120000,
2246
+ "num_episodes": 1701,
2247
+ "mean_reward": 51.103903777599335,
2248
+ "mean_length": 3865.3,
2249
+ "loss": 20.571155548095703,
2250
+ "sps": 835.7550802547468
2251
+ },
2252
+ {
2253
+ "update": 1255,
2254
+ "global_step": 5140480,
2255
+ "num_episodes": 1705,
2256
+ "mean_reward": 56.42267068624496,
2257
+ "mean_length": 4036.09,
2258
+ "loss": 65.08052825927734,
2259
+ "sps": 391.16626869711587
2260
+ },
2261
+ {
2262
+ "update": 1260,
2263
+ "global_step": 5160960,
2264
+ "num_episodes": 1709,
2265
+ "mean_reward": 57.39474864244461,
2266
+ "mean_length": 4054.32,
2267
+ "loss": 9.735504150390625,
2268
+ "sps": 279.94177596903813
2269
+ },
2270
+ {
2271
+ "update": 1265,
2272
+ "global_step": 5181440,
2273
+ "num_episodes": 1715,
2274
+ "mean_reward": 63.1245819735527,
2275
+ "mean_length": 4262.64,
2276
+ "loss": 473.1329040527344,
2277
+ "sps": 195.8880726253604
2278
+ },
2279
+ {
2280
+ "update": 1270,
2281
+ "global_step": 5201920,
2282
+ "num_episodes": 1719,
2283
+ "mean_reward": 56.77260554075241,
2284
+ "mean_length": 4121.23,
2285
+ "loss": 263.35589599609375,
2286
+ "sps": 350.0223876268303
2287
+ },
2288
+ {
2289
+ "update": 1275,
2290
+ "global_step": 5222400,
2291
+ "num_episodes": 1726,
2292
+ "mean_reward": 57.57951702833176,
2293
+ "mean_length": 3960.67,
2294
+ "loss": 59.782310485839844,
2295
+ "sps": 412.90096002367255
2296
+ },
2297
+ {
2298
+ "update": 1280,
2299
+ "global_step": 5242880,
2300
+ "num_episodes": 1736,
2301
+ "mean_reward": 56.08834368228912,
2302
+ "mean_length": 3644.63,
2303
+ "loss": 52.62528991699219,
2304
+ "sps": 343.349341248895
2305
+ },
2306
+ {
2307
+ "update": 1285,
2308
+ "global_step": 5263360,
2309
+ "num_episodes": 1737,
2310
+ "mean_reward": 55.905996737480166,
2311
+ "mean_length": 3640.62,
2312
+ "loss": 104.78556060791016,
2313
+ "sps": 397.8813187415849
2314
+ },
2315
+ {
2316
+ "update": 1290,
2317
+ "global_step": 5283840,
2318
+ "num_episodes": 1746,
2319
+ "mean_reward": 59.82821818828583,
2320
+ "mean_length": 3347.95,
2321
+ "loss": 935.470947265625,
2322
+ "sps": 355.2664513297179
2323
+ },
2324
+ {
2325
+ "update": 1295,
2326
+ "global_step": 5304320,
2327
+ "num_episodes": 1753,
2328
+ "mean_reward": 86.4339063167572,
2329
+ "mean_length": 3459.02,
2330
+ "loss": 795.3289794921875,
2331
+ "sps": 309.51331334453033
2332
+ },
2333
+ {
2334
+ "update": 1300,
2335
+ "global_step": 5324800,
2336
+ "num_episodes": 1762,
2337
+ "mean_reward": 306.1682329654694,
2338
+ "mean_length": 3702.99,
2339
+ "loss": 621.1657104492188,
2340
+ "sps": 312.8076116030221
2341
+ },
2342
+ {
2343
+ "update": 1305,
2344
+ "global_step": 5345280,
2345
+ "num_episodes": 1763,
2346
+ "mean_reward": 306.26519484996794,
2347
+ "mean_length": 3714.28,
2348
+ "loss": 445.75,
2349
+ "sps": 390.84962605893924
2350
+ },
2351
+ {
2352
+ "update": 1310,
2353
+ "global_step": 5365760,
2354
+ "num_episodes": 1772,
2355
+ "mean_reward": 357.3265011548996,
2356
+ "mean_length": 3776.52,
2357
+ "loss": 1374.097412109375,
2358
+ "sps": 385.8458185201923
2359
+ },
2360
+ {
2361
+ "update": 1315,
2362
+ "global_step": 5386240,
2363
+ "num_episodes": 1773,
2364
+ "mean_reward": 357.70143792629244,
2365
+ "mean_length": 3875.1,
2366
+ "loss": 208.59083557128906,
2367
+ "sps": 555.0694538042676
2368
+ },
2369
+ {
2370
+ "update": 1320,
2371
+ "global_step": 5406720,
2372
+ "num_episodes": 1786,
2373
+ "mean_reward": 362.7652148580551,
2374
+ "mean_length": 3706.89,
2375
+ "loss": 1506.414306640625,
2376
+ "sps": 246.7002255816469
2377
+ },
2378
+ {
2379
+ "update": 1325,
2380
+ "global_step": 5427200,
2381
+ "num_episodes": 1786,
2382
+ "mean_reward": 362.7652148580551,
2383
+ "mean_length": 3706.89,
2384
+ "loss": 46.25688934326172,
2385
+ "sps": 409.5117865802824
2386
+ },
2387
+ {
2388
+ "update": 1330,
2389
+ "global_step": 5447680,
2390
+ "num_episodes": 1791,
2391
+ "mean_reward": 501.35476023197174,
2392
+ "mean_length": 3744.98,
2393
+ "loss": 371.16973876953125,
2394
+ "sps": 278.5499936271881
2395
+ },
2396
+ {
2397
+ "update": 1335,
2398
+ "global_step": 5468160,
2399
+ "num_episodes": 1793,
2400
+ "mean_reward": 607.4379647493363,
2401
+ "mean_length": 3921.02,
2402
+ "loss": 73.33538818359375,
2403
+ "sps": 388.619707385681
2404
+ },
2405
+ {
2406
+ "update": 1340,
2407
+ "global_step": 5488640,
2408
+ "num_episodes": 1796,
2409
+ "mean_reward": 756.3367283582687,
2410
+ "mean_length": 4020.31,
2411
+ "loss": 462.8284606933594,
2412
+ "sps": 342.5094025627594
2413
+ },
2414
+ {
2415
+ "update": 1345,
2416
+ "global_step": 5509120,
2417
+ "num_episodes": 1796,
2418
+ "mean_reward": 756.3367283582687,
2419
+ "mean_length": 4020.31,
2420
+ "loss": 67.39126586914062,
2421
+ "sps": 385.48904917796193
2422
+ },
2423
+ {
2424
+ "update": 1350,
2425
+ "global_step": 5529600,
2426
+ "num_episodes": 1802,
2427
+ "mean_reward": 869.4895800352097,
2428
+ "mean_length": 3967.21,
2429
+ "loss": 12.0632905960083,
2430
+ "sps": 668.8793003173347
2431
+ },
2432
+ {
2433
+ "update": 1355,
2434
+ "global_step": 5550080,
2435
+ "num_episodes": 1804,
2436
+ "mean_reward": 989.9085026121139,
2437
+ "mean_length": 4063.23,
2438
+ "loss": 86.83934783935547,
2439
+ "sps": 418.3353031686608
2440
+ },
2441
+ {
2442
+ "update": 1360,
2443
+ "global_step": 5570560,
2444
+ "num_episodes": 1815,
2445
+ "mean_reward": 1078.798788728714,
2446
+ "mean_length": 3886.21,
2447
+ "loss": 666.8345336914062,
2448
+ "sps": 256.51104316818095
2449
+ },
2450
+ {
2451
+ "update": 1365,
2452
+ "global_step": 5591040,
2453
+ "num_episodes": 1815,
2454
+ "mean_reward": 1078.798788728714,
2455
+ "mean_length": 3886.21,
2456
+ "loss": 900.6356811523438,
2457
+ "sps": 495.47242571130397
2458
+ },
2459
+ {
2460
+ "update": 1370,
2461
+ "global_step": 5611520,
2462
+ "num_episodes": 1824,
2463
+ "mean_reward": 1200.6963546466827,
2464
+ "mean_length": 4027.44,
2465
+ "loss": 20.821474075317383,
2466
+ "sps": 506.7840907522114
2467
+ },
2468
+ {
2469
+ "update": 1375,
2470
+ "global_step": 5632000,
2471
+ "num_episodes": 1826,
2472
+ "mean_reward": 1261.5812614822387,
2473
+ "mean_length": 4015.04,
2474
+ "loss": 4563.7763671875,
2475
+ "sps": 675.537157736375
2476
+ },
2477
+ {
2478
+ "update": 1380,
2479
+ "global_step": 5652480,
2480
+ "num_episodes": 1836,
2481
+ "mean_reward": 1445.384720082283,
2482
+ "mean_length": 4027.89,
2483
+ "loss": 459.9952697753906,
2484
+ "sps": 341.6139254031597
2485
+ },
2486
+ {
2487
+ "update": 1385,
2488
+ "global_step": 5672960,
2489
+ "num_episodes": 1839,
2490
+ "mean_reward": 1445.7170259952545,
2491
+ "mean_length": 4039.25,
2492
+ "loss": 1.4308533668518066,
2493
+ "sps": 1488.9461962855103
2494
+ },
2495
+ {
2496
+ "update": 1390,
2497
+ "global_step": 5693440,
2498
+ "num_episodes": 1856,
2499
+ "mean_reward": 1429.1698562908173,
2500
+ "mean_length": 3901.28,
2501
+ "loss": 13.810539245605469,
2502
+ "sps": 583.7912153704766
2503
+ },
2504
+ {
2505
+ "update": 1395,
2506
+ "global_step": 5713920,
2507
+ "num_episodes": 1863,
2508
+ "mean_reward": 1209.2014095544814,
2509
+ "mean_length": 3639.94,
2510
+ "loss": 273.29278564453125,
2511
+ "sps": 645.3489718832685
2512
+ },
2513
+ {
2514
+ "update": 1400,
2515
+ "global_step": 5734400,
2516
+ "num_episodes": 1871,
2517
+ "mean_reward": 1319.8262105464935,
2518
+ "mean_length": 3745.57,
2519
+ "loss": 16.14458656311035,
2520
+ "sps": 324.31654457005965
2521
+ },
2522
+ {
2523
+ "update": 1405,
2524
+ "global_step": 5754880,
2525
+ "num_episodes": 1887,
2526
+ "mean_reward": 1180.7784528923034,
2527
+ "mean_length": 3295.57,
2528
+ "loss": 3.5345799922943115,
2529
+ "sps": 635.0555390978727
2530
+ },
2531
+ {
2532
+ "update": 1410,
2533
+ "global_step": 5775360,
2534
+ "num_episodes": 1897,
2535
+ "mean_reward": 920.7941575908661,
2536
+ "mean_length": 2829.74,
2537
+ "loss": 29.106164932250977,
2538
+ "sps": 283.5497895518039
2539
+ },
2540
+ {
2541
+ "update": 1415,
2542
+ "global_step": 5795840,
2543
+ "num_episodes": 1899,
2544
+ "mean_reward": 1015.2013060569764,
2545
+ "mean_length": 2877.24,
2546
+ "loss": 134.17953491210938,
2547
+ "sps": 514.4558919437959
2548
+ },
2549
+ {
2550
+ "update": 1420,
2551
+ "global_step": 5816320,
2552
+ "num_episodes": 1905,
2553
+ "mean_reward": 759.3930412435532,
2554
+ "mean_length": 2493.3,
2555
+ "loss": 71.0991439819336,
2556
+ "sps": 407.36350916126725
2557
+ },
2558
+ {
2559
+ "update": 1425,
2560
+ "global_step": 5836800,
2561
+ "num_episodes": 1923,
2562
+ "mean_reward": 591.994899687767,
2563
+ "mean_length": 2306.11,
2564
+ "loss": 665.6765747070312,
2565
+ "sps": 292.25754672826326
2566
+ },
2567
+ {
2568
+ "update": 1430,
2569
+ "global_step": 5857280,
2570
+ "num_episodes": 1926,
2571
+ "mean_reward": 491.9243581056595,
2572
+ "mean_length": 2308.91,
2573
+ "loss": 97.95196533203125,
2574
+ "sps": 398.43841528471495
2575
+ },
2576
+ {
2577
+ "update": 1435,
2578
+ "global_step": 5877760,
2579
+ "num_episodes": 1934,
2580
+ "mean_reward": 396.85663065433505,
2581
+ "mean_length": 2257.74,
2582
+ "loss": 1123.9271240234375,
2583
+ "sps": 467.5370774012981
2584
+ },
2585
+ {
2586
+ "update": 1440,
2587
+ "global_step": 5898240,
2588
+ "num_episodes": 1939,
2589
+ "mean_reward": 444.68491759777066,
2590
+ "mean_length": 2252.07,
2591
+ "loss": 149.9423065185547,
2592
+ "sps": 397.96294200167154
2593
+ },
2594
+ {
2595
+ "update": 1445,
2596
+ "global_step": 5918720,
2597
+ "num_episodes": 1952,
2598
+ "mean_reward": 490.6093361234665,
2599
+ "mean_length": 2347.41,
2600
+ "loss": 188.90318298339844,
2601
+ "sps": 433.4020614000179
2602
+ },
2603
+ {
2604
+ "update": 1450,
2605
+ "global_step": 5939200,
2606
+ "num_episodes": 1968,
2607
+ "mean_reward": 341.27103984832763,
2608
+ "mean_length": 2088.38,
2609
+ "loss": 49.23589324951172,
2610
+ "sps": 192.3194868778508
2611
+ },
2612
+ {
2613
+ "update": 1455,
2614
+ "global_step": 5959680,
2615
+ "num_episodes": 1980,
2616
+ "mean_reward": 340.5469175004959,
2617
+ "mean_length": 2102.8,
2618
+ "loss": 53.03819274902344,
2619
+ "sps": 401.45828106312945
2620
+ },
2621
+ {
2622
+ "update": 1460,
2623
+ "global_step": 5980160,
2624
+ "num_episodes": 1987,
2625
+ "mean_reward": 341.23766746520994,
2626
+ "mean_length": 2290.56,
2627
+ "loss": 12.17508316040039,
2628
+ "sps": 321.48619942614465
2629
+ },
2630
+ {
2631
+ "update": 1465,
2632
+ "global_step": 6000640,
2633
+ "num_episodes": 1993,
2634
+ "mean_reward": 340.6004219532013,
2635
+ "mean_length": 2289.4,
2636
+ "loss": 4.770975589752197,
2637
+ "sps": 400.6137968711218
2638
+ },
2639
+ {
2640
+ "update": 1470,
2641
+ "global_step": 6021120,
2642
+ "num_episodes": 2004,
2643
+ "mean_reward": 321.17223945140836,
2644
+ "mean_length": 2148.56,
2645
+ "loss": 29.761117935180664,
2646
+ "sps": 472.58742855602395
2647
+ },
2648
+ {
2649
+ "update": 1475,
2650
+ "global_step": 6041600,
2651
+ "num_episodes": 2026,
2652
+ "mean_reward": 336.3226027822495,
2653
+ "mean_length": 1952.34,
2654
+ "loss": 203.31748962402344,
2655
+ "sps": 357.29518794795473
2656
+ },
2657
+ {
2658
+ "update": 1480,
2659
+ "global_step": 6062080,
2660
+ "num_episodes": 2033,
2661
+ "mean_reward": 313.32783755540845,
2662
+ "mean_length": 1958.32,
2663
+ "loss": 508.1851501464844,
2664
+ "sps": 510.89708007010705
2665
+ },
2666
+ {
2667
+ "update": 1485,
2668
+ "global_step": 6082560,
2669
+ "num_episodes": 2046,
2670
+ "mean_reward": 194.68075824975966,
2671
+ "mean_length": 1639.76,
2672
+ "loss": 7.842682361602783,
2673
+ "sps": 546.9033062498472
2674
+ },
2675
+ {
2676
+ "update": 1490,
2677
+ "global_step": 6103040,
2678
+ "num_episodes": 2050,
2679
+ "mean_reward": 143.90823694944382,
2680
+ "mean_length": 1738.66,
2681
+ "loss": 46.426143646240234,
2682
+ "sps": 306.85539615127885
2683
+ },
2684
+ {
2685
+ "update": 1495,
2686
+ "global_step": 6123520,
2687
+ "num_episodes": 2060,
2688
+ "mean_reward": 334.49683523893356,
2689
+ "mean_length": 1898.53,
2690
+ "loss": 468.38525390625,
2691
+ "sps": 245.82053873712613
2692
+ },
2693
+ {
2694
+ "update": 1500,
2695
+ "global_step": 6144000,
2696
+ "num_episodes": 2069,
2697
+ "mean_reward": 334.39896435022354,
2698
+ "mean_length": 2044.37,
2699
+ "loss": 85.75762939453125,
2700
+ "sps": 334.1080989119973
2701
+ },
2702
+ {
2703
+ "update": 1505,
2704
+ "global_step": 6164480,
2705
+ "num_episodes": 2074,
2706
+ "mean_reward": 333.67030994653703,
2707
+ "mean_length": 1990.3,
2708
+ "loss": 295.0758361816406,
2709
+ "sps": 345.61576592839117
2710
+ },
2711
+ {
2712
+ "update": 1510,
2713
+ "global_step": 6184960,
2714
+ "num_episodes": 2080,
2715
+ "mean_reward": 334.66018194913863,
2716
+ "mean_length": 2150.39,
2717
+ "loss": 10.077195167541504,
2718
+ "sps": 420.2391860174057
2719
+ },
2720
+ {
2721
+ "update": 1515,
2722
+ "global_step": 6205440,
2723
+ "num_episodes": 2097,
2724
+ "mean_reward": 258.6993135094643,
2725
+ "mean_length": 2109.74,
2726
+ "loss": 122.61629486083984,
2727
+ "sps": 229.56375224355847
2728
+ },
2729
+ {
2730
+ "update": 1520,
2731
+ "global_step": 6225920,
2732
+ "num_episodes": 2107,
2733
+ "mean_reward": 258.4274232983589,
2734
+ "mean_length": 2108.42,
2735
+ "loss": 361.1533203125,
2736
+ "sps": 361.8525022786034
2737
+ },
2738
+ {
2739
+ "update": 1525,
2740
+ "global_step": 6246400,
2741
+ "num_episodes": 2125,
2742
+ "mean_reward": 239.45979365587235,
2743
+ "mean_length": 1883.93,
2744
+ "loss": 25.989086151123047,
2745
+ "sps": 275.38397015212564
2746
+ },
2747
+ {
2748
+ "update": 1530,
2749
+ "global_step": 6266880,
2750
+ "num_episodes": 2131,
2751
+ "mean_reward": 239.8594556736946,
2752
+ "mean_length": 1981.69,
2753
+ "loss": 404.8509521484375,
2754
+ "sps": 405.6766540706019
2755
+ },
2756
+ {
2757
+ "update": 1535,
2758
+ "global_step": 6287360,
2759
+ "num_episodes": 2146,
2760
+ "mean_reward": 303.8745777082443,
2761
+ "mean_length": 2254.68,
2762
+ "loss": 1170.5218505859375,
2763
+ "sps": 251.0949486970236
2764
+ },
2765
+ {
2766
+ "update": 1540,
2767
+ "global_step": 6307840,
2768
+ "num_episodes": 2147,
2769
+ "mean_reward": 304.0846936607361,
2770
+ "mean_length": 2341.33,
2771
+ "loss": 8.5966157913208,
2772
+ "sps": 446.2390266117631
2773
+ },
2774
+ {
2775
+ "update": 1545,
2776
+ "global_step": 6328320,
2777
+ "num_episodes": 2154,
2778
+ "mean_reward": 437.3326745700836,
2779
+ "mean_length": 2280.45,
2780
+ "loss": 1299.16552734375,
2781
+ "sps": 259.202559576872
2782
+ },
2783
+ {
2784
+ "update": 1550,
2785
+ "global_step": 6348800,
2786
+ "num_episodes": 2158,
2787
+ "mean_reward": 303.4427977991104,
2788
+ "mean_length": 2211.31,
2789
+ "loss": 1.1450996398925781,
2790
+ "sps": 549.5109518078744
2791
+ },
2792
+ {
2793
+ "update": 1555,
2794
+ "global_step": 6369280,
2795
+ "num_episodes": 2172,
2796
+ "mean_reward": 304.4946458005905,
2797
+ "mean_length": 2210.89,
2798
+ "loss": 59.038814544677734,
2799
+ "sps": 341.25380949315337
2800
+ },
2801
+ {
2802
+ "update": 1560,
2803
+ "global_step": 6389760,
2804
+ "num_episodes": 2175,
2805
+ "mean_reward": 304.3582187414169,
2806
+ "mean_length": 2211.87,
2807
+ "loss": 13.61323070526123,
2808
+ "sps": 642.8393173099237
2809
+ },
2810
+ {
2811
+ "update": 1565,
2812
+ "global_step": 6410240,
2813
+ "num_episodes": 2180,
2814
+ "mean_reward": 305.09624064445495,
2815
+ "mean_length": 2307.71,
2816
+ "loss": 1913.05126953125,
2817
+ "sps": 391.8241680446758
2818
+ },
2819
+ {
2820
+ "update": 1570,
2821
+ "global_step": 6430720,
2822
+ "num_episodes": 2183,
2823
+ "mean_reward": 305.77074447154996,
2824
+ "mean_length": 2405.35,
2825
+ "loss": 136.88858032226562,
2826
+ "sps": 394.7672396028363
2827
+ },
2828
+ {
2829
+ "update": 1575,
2830
+ "global_step": 6451200,
2831
+ "num_episodes": 2189,
2832
+ "mean_reward": 405.41952825069427,
2833
+ "mean_length": 2593.85,
2834
+ "loss": 83.12802124023438,
2835
+ "sps": 299.9621165564932
2836
+ },
2837
+ {
2838
+ "update": 1580,
2839
+ "global_step": 6471680,
2840
+ "num_episodes": 2207,
2841
+ "mean_reward": 405.9634767389297,
2842
+ "mean_length": 2524.02,
2843
+ "loss": 13081.1357421875,
2844
+ "sps": 223.53810669328672
2845
+ },
2846
+ {
2847
+ "update": 1585,
2848
+ "global_step": 6492160,
2849
+ "num_episodes": 2211,
2850
+ "mean_reward": 499.87862377643586,
2851
+ "mean_length": 2495.58,
2852
+ "loss": 46.772918701171875,
2853
+ "sps": 453.36473969791376
2854
+ },
2855
+ {
2856
+ "update": 1590,
2857
+ "global_step": 6512640,
2858
+ "num_episodes": 2221,
2859
+ "mean_reward": 670.4232928800583,
2860
+ "mean_length": 2706.97,
2861
+ "loss": 368.63763427734375,
2862
+ "sps": 429.39001544641116
2863
+ },
2864
+ {
2865
+ "update": 1595,
2866
+ "global_step": 6533120,
2867
+ "num_episodes": 2227,
2868
+ "mean_reward": 803.7308926296234,
2869
+ "mean_length": 2725.72,
2870
+ "loss": 90.26509094238281,
2871
+ "sps": 413.30646156887036
2872
+ },
2873
+ {
2874
+ "update": 1600,
2875
+ "global_step": 6553600,
2876
+ "num_episodes": 2232,
2877
+ "mean_reward": 933.0243373584748,
2878
+ "mean_length": 2915.95,
2879
+ "loss": 68.28546142578125,
2880
+ "sps": 374.56623534825485
2881
+ },
2882
+ {
2883
+ "update": 1605,
2884
+ "global_step": 6574080,
2885
+ "num_episodes": 2235,
2886
+ "mean_reward": 933.269734544754,
2887
+ "mean_length": 2912.74,
2888
+ "loss": 865.8887939453125,
2889
+ "sps": 445.657946827829
2890
+ },
2891
+ {
2892
+ "update": 1610,
2893
+ "global_step": 6594560,
2894
+ "num_episodes": 2237,
2895
+ "mean_reward": 933.7870253133774,
2896
+ "mean_length": 3011.88,
2897
+ "loss": 871.9052734375,
2898
+ "sps": 387.41989720305287
2899
+ },
2900
+ {
2901
+ "update": 1615,
2902
+ "global_step": 6615040,
2903
+ "num_episodes": 2248,
2904
+ "mean_reward": 1077.8849126005173,
2905
+ "mean_length": 3056.53,
2906
+ "loss": 54.994510650634766,
2907
+ "sps": 423.33621401282124
2908
+ },
2909
+ {
2910
+ "update": 1620,
2911
+ "global_step": 6635520,
2912
+ "num_episodes": 2258,
2913
+ "mean_reward": 943.9673701143265,
2914
+ "mean_length": 3023.43,
2915
+ "loss": 137.88092041015625,
2916
+ "sps": 374.57440205325895
2917
+ },
2918
+ {
2919
+ "update": 1625,
2920
+ "global_step": 6656000,
2921
+ "num_episodes": 2263,
2922
+ "mean_reward": 856.1039279556275,
2923
+ "mean_length": 2924.57,
2924
+ "loss": 348.59234619140625,
2925
+ "sps": 377.6081162361363
2926
+ },
2927
+ {
2928
+ "update": 1630,
2929
+ "global_step": 6676480,
2930
+ "num_episodes": 2267,
2931
+ "mean_reward": 931.3061418437958,
2932
+ "mean_length": 2926.62,
2933
+ "loss": 9.557199478149414,
2934
+ "sps": 460.1107258733158
2935
+ },
2936
+ {
2937
+ "update": 1635,
2938
+ "global_step": 6696960,
2939
+ "num_episodes": 2273,
2940
+ "mean_reward": 931.8358981752395,
2941
+ "mean_length": 3120.06,
2942
+ "loss": 514.8411254882812,
2943
+ "sps": 523.5469529045816
2944
+ },
2945
+ {
2946
+ "update": 1640,
2947
+ "global_step": 6717440,
2948
+ "num_episodes": 2276,
2949
+ "mean_reward": 1109.3497059488298,
2950
+ "mean_length": 3219.09,
2951
+ "loss": 31.73206329345703,
2952
+ "sps": 412.7885853043598
2953
+ },
2954
+ {
2955
+ "update": 1645,
2956
+ "global_step": 6737920,
2957
+ "num_episodes": 2282,
2958
+ "mean_reward": 1197.9563966131211,
2959
+ "mean_length": 3220.5,
2960
+ "loss": 35.86986541748047,
2961
+ "sps": 466.28907053794126
2962
+ },
2963
+ {
2964
+ "update": 1650,
2965
+ "global_step": 6758400,
2966
+ "num_episodes": 2285,
2967
+ "mean_reward": 1119.968164858818,
2968
+ "mean_length": 3067.54,
2969
+ "loss": 279.3809509277344,
2970
+ "sps": 415.9298164658184
2971
+ },
2972
+ {
2973
+ "update": 1655,
2974
+ "global_step": 6778880,
2975
+ "num_episodes": 2299,
2976
+ "mean_reward": 1277.0260197210312,
2977
+ "mean_length": 3026.13,
2978
+ "loss": 44.909996032714844,
2979
+ "sps": 289.10855632711406
2980
+ },
2981
+ {
2982
+ "update": 1660,
2983
+ "global_step": 6799360,
2984
+ "num_episodes": 2310,
2985
+ "mean_reward": 1212.9116531801224,
2986
+ "mean_length": 3157.69,
2987
+ "loss": 19.89190673828125,
2988
+ "sps": 442.19954156582503
2989
+ },
2990
+ {
2991
+ "update": 1665,
2992
+ "global_step": 6819840,
2993
+ "num_episodes": 2313,
2994
+ "mean_reward": 1237.1113989305495,
2995
+ "mean_length": 3267.66,
2996
+ "loss": 49.466552734375,
2997
+ "sps": 941.6208281097553
2998
+ },
2999
+ {
3000
+ "update": 1670,
3001
+ "global_step": 6840320,
3002
+ "num_episodes": 2320,
3003
+ "mean_reward": 1185.366361656189,
3004
+ "mean_length": 3170.05,
3005
+ "loss": 226.9459686279297,
3006
+ "sps": 560.8755073464478
3007
+ },
3008
+ {
3009
+ "update": 1675,
3010
+ "global_step": 6860800,
3011
+ "num_episodes": 2327,
3012
+ "mean_reward": 1171.2066772079468,
3013
+ "mean_length": 3355.09,
3014
+ "loss": 10.15625,
3015
+ "sps": 335.8002299447698
3016
+ },
3017
+ {
3018
+ "update": 1680,
3019
+ "global_step": 6881280,
3020
+ "num_episodes": 2333,
3021
+ "mean_reward": 1042.0874049663544,
3022
+ "mean_length": 3211.85,
3023
+ "loss": 82.60301971435547,
3024
+ "sps": 302.2235800221304
3025
+ },
3026
+ {
3027
+ "update": 1685,
3028
+ "global_step": 6901760,
3029
+ "num_episodes": 2339,
3030
+ "mean_reward": 972.5269057083129,
3031
+ "mean_length": 3193.44,
3032
+ "loss": 120.32151794433594,
3033
+ "sps": 664.7036679347333
3034
+ },
3035
+ {
3036
+ "update": 1690,
3037
+ "global_step": 6922240,
3038
+ "num_episodes": 2339,
3039
+ "mean_reward": 972.5269057083129,
3040
+ "mean_length": 3193.44,
3041
+ "loss": 1.7924470901489258,
3042
+ "sps": 1063.7604928109242
3043
+ },
3044
+ {
3045
+ "update": 1695,
3046
+ "global_step": 6942720,
3047
+ "num_episodes": 2342,
3048
+ "mean_reward": 942.9001926231384,
3049
+ "mean_length": 3342.44,
3050
+ "loss": 6.127931594848633,
3051
+ "sps": 592.3562356846871
3052
+ },
3053
+ {
3054
+ "update": 1700,
3055
+ "global_step": 6963200,
3056
+ "num_episodes": 2345,
3057
+ "mean_reward": 943.6058731651306,
3058
+ "mean_length": 3474.03,
3059
+ "loss": 44.08333206176758,
3060
+ "sps": 297.44937442353597
3061
+ },
3062
+ {
3063
+ "update": 1705,
3064
+ "global_step": 6983680,
3065
+ "num_episodes": 2350,
3066
+ "mean_reward": 891.2378608131409,
3067
+ "mean_length": 3613.22,
3068
+ "loss": 139.50340270996094,
3069
+ "sps": 321.276001001162
3070
+ },
3071
+ {
3072
+ "update": 1710,
3073
+ "global_step": 7004160,
3074
+ "num_episodes": 2351,
3075
+ "mean_reward": 891.2994220161438,
3076
+ "mean_length": 3670.58,
3077
+ "loss": 80.7029800415039,
3078
+ "sps": 602.1302177845877
3079
+ },
3080
+ {
3081
+ "update": 1715,
3082
+ "global_step": 7024640,
3083
+ "num_episodes": 2354,
3084
+ "mean_reward": 891.3987969732284,
3085
+ "mean_length": 3707.87,
3086
+ "loss": 43.55419921875,
3087
+ "sps": 609.1286236648131
3088
+ },
3089
+ {
3090
+ "update": 1720,
3091
+ "global_step": 7045120,
3092
+ "num_episodes": 2356,
3093
+ "mean_reward": 908.0675931024551,
3094
+ "mean_length": 3894.63,
3095
+ "loss": 1924.8736572265625,
3096
+ "sps": 521.5708115336275
3097
+ },
3098
+ {
3099
+ "update": 1725,
3100
+ "global_step": 7065600,
3101
+ "num_episodes": 2368,
3102
+ "mean_reward": 877.4067792797089,
3103
+ "mean_length": 4024.38,
3104
+ "loss": 1546.024658203125,
3105
+ "sps": 144.1450251623922
3106
+ },
3107
+ {
3108
+ "update": 1730,
3109
+ "global_step": 7086080,
3110
+ "num_episodes": 2371,
3111
+ "mean_reward": 876.8285467529297,
3112
+ "mean_length": 4001.29,
3113
+ "loss": 140.04344177246094,
3114
+ "sps": 219.3887746950412
3115
+ },
3116
+ {
3117
+ "update": 1735,
3118
+ "global_step": 7106560,
3119
+ "num_episodes": 2379,
3120
+ "mean_reward": 712.0810299873352,
3121
+ "mean_length": 3765.17,
3122
+ "loss": 592.1142578125,
3123
+ "sps": 154.4538739502226
3124
+ },
3125
+ {
3126
+ "update": 1740,
3127
+ "global_step": 7127040,
3128
+ "num_episodes": 2380,
3129
+ "mean_reward": 711.973635725975,
3130
+ "mean_length": 3803.94,
3131
+ "loss": 94.82659149169922,
3132
+ "sps": 312.59798550353105
3133
+ },
3134
+ {
3135
+ "update": 1745,
3136
+ "global_step": 7147520,
3137
+ "num_episodes": 2386,
3138
+ "mean_reward": 721.5674746751786,
3139
+ "mean_length": 4056.06,
3140
+ "loss": 62.10371398925781,
3141
+ "sps": 320.26559324554347
3142
+ },
3143
+ {
3144
+ "update": 1750,
3145
+ "global_step": 7168000,
3146
+ "num_episodes": 2388,
3147
+ "mean_reward": 722.019714179039,
3148
+ "mean_length": 4119.0,
3149
+ "loss": 154.1345977783203,
3150
+ "sps": 490.86442866974096
3151
+ },
3152
+ {
3153
+ "update": 1755,
3154
+ "global_step": 7188480,
3155
+ "num_episodes": 2393,
3156
+ "mean_reward": 641.6925968694687,
3157
+ "mean_length": 4085.19,
3158
+ "loss": 3164.498779296875,
3159
+ "sps": 258.3838207181636
3160
+ },
3161
+ {
3162
+ "update": 1760,
3163
+ "global_step": 7208960,
3164
+ "num_episodes": 2395,
3165
+ "mean_reward": 686.4729778194428,
3166
+ "mean_length": 4253.36,
3167
+ "loss": 425.0414733886719,
3168
+ "sps": 362.7607269459368
3169
+ },
3170
+ {
3171
+ "update": 1765,
3172
+ "global_step": 7229440,
3173
+ "num_episodes": 2398,
3174
+ "mean_reward": 781.3392351341248,
3175
+ "mean_length": 4537.14,
3176
+ "loss": 31.12351417541504,
3177
+ "sps": 385.67594003613885
3178
+ },
3179
+ {
3180
+ "update": 1770,
3181
+ "global_step": 7249920,
3182
+ "num_episodes": 2401,
3183
+ "mean_reward": 780.9093813705445,
3184
+ "mean_length": 4438.34,
3185
+ "loss": 154.23956298828125,
3186
+ "sps": 350.9404403984712
3187
+ },
3188
+ {
3189
+ "update": 1775,
3190
+ "global_step": 7270400,
3191
+ "num_episodes": 2409,
3192
+ "mean_reward": 920.4024870014191,
3193
+ "mean_length": 4671.4,
3194
+ "loss": 68.82001495361328,
3195
+ "sps": 373.0457413366687
3196
+ },
3197
+ {
3198
+ "update": 1780,
3199
+ "global_step": 7290880,
3200
+ "num_episodes": 2431,
3201
+ "mean_reward": 661.6825140190125,
3202
+ "mean_length": 4178.67,
3203
+ "loss": 1634.844482421875,
3204
+ "sps": 334.82655519105265
3205
+ },
3206
+ {
3207
+ "update": 1785,
3208
+ "global_step": 7311360,
3209
+ "num_episodes": 2436,
3210
+ "mean_reward": 703.8663965082169,
3211
+ "mean_length": 4154.8,
3212
+ "loss": 36.222721099853516,
3213
+ "sps": 690.9562224821975
3214
+ },
3215
+ {
3216
+ "update": 1790,
3217
+ "global_step": 7331840,
3218
+ "num_episodes": 2442,
3219
+ "mean_reward": 682.8625202751159,
3220
+ "mean_length": 3792.16,
3221
+ "loss": 7.088906288146973,
3222
+ "sps": 546.4602476565249
3223
+ },
3224
+ {
3225
+ "update": 1795,
3226
+ "global_step": 7352320,
3227
+ "num_episodes": 2452,
3228
+ "mean_reward": 668.2820160484314,
3229
+ "mean_length": 3464.88,
3230
+ "loss": 15.93359375,
3231
+ "sps": 338.3906013362985
3232
+ },
3233
+ {
3234
+ "update": 1800,
3235
+ "global_step": 7372800,
3236
+ "num_episodes": 2479,
3237
+ "mean_reward": 503.18441142082213,
3238
+ "mean_length": 2745.39,
3239
+ "loss": 4.317780494689941,
3240
+ "sps": 247.07750153960802
3241
+ },
3242
+ {
3243
+ "update": 1805,
3244
+ "global_step": 7393280,
3245
+ "num_episodes": 2499,
3246
+ "mean_reward": 254.19922243118287,
3247
+ "mean_length": 1727.29,
3248
+ "loss": 2.4029455184936523,
3249
+ "sps": 343.1557628026188
3250
+ },
3251
+ {
3252
+ "update": 1810,
3253
+ "global_step": 7413760,
3254
+ "num_episodes": 2505,
3255
+ "mean_reward": 156.29735308647156,
3256
+ "mean_length": 1608.82,
3257
+ "loss": 50.26245880126953,
3258
+ "sps": 545.8711155437793
3259
+ },
3260
+ {
3261
+ "update": 1815,
3262
+ "global_step": 7434240,
3263
+ "num_episodes": 2511,
3264
+ "mean_reward": 86.22271653652192,
3265
+ "mean_length": 1490.02,
3266
+ "loss": 0.49680042266845703,
3267
+ "sps": 765.2176779813885
3268
+ },
3269
+ {
3270
+ "update": 1820,
3271
+ "global_step": 7454720,
3272
+ "num_episodes": 2517,
3273
+ "mean_reward": 87.76825302124024,
3274
+ "mean_length": 1667.16,
3275
+ "loss": 0.9337818622589111,
3276
+ "sps": 741.1448982958542
3277
+ },
3278
+ {
3279
+ "update": 1825,
3280
+ "global_step": 7475200,
3281
+ "num_episodes": 2525,
3282
+ "mean_reward": 83.8934311246872,
3283
+ "mean_length": 1771.63,
3284
+ "loss": 1.8736273050308228,
3285
+ "sps": 346.82608481484516
3286
+ },
3287
+ {
3288
+ "update": 1830,
3289
+ "global_step": 7495680,
3290
+ "num_episodes": 2536,
3291
+ "mean_reward": 42.52432149887085,
3292
+ "mean_length": 1779.55,
3293
+ "loss": 43.508872985839844,
3294
+ "sps": 538.257412633311
3295
+ },
3296
+ {
3297
+ "update": 1835,
3298
+ "global_step": 7516160,
3299
+ "num_episodes": 2559,
3300
+ "mean_reward": 22.3750723361969,
3301
+ "mean_length": 1585.34,
3302
+ "loss": 11.682701110839844,
3303
+ "sps": 283.17378891804617
3304
+ },
3305
+ {
3306
+ "update": 1840,
3307
+ "global_step": 7536640,
3308
+ "num_episodes": 2572,
3309
+ "mean_reward": 23.224842309951782,
3310
+ "mean_length": 1714.64,
3311
+ "loss": 0.64460289478302,
3312
+ "sps": 945.2402518055575
3313
+ },
3314
+ {
3315
+ "update": 1845,
3316
+ "global_step": 7557120,
3317
+ "num_episodes": 2575,
3318
+ "mean_reward": 23.735417137145998,
3319
+ "mean_length": 1716.6,
3320
+ "loss": 1.5831042528152466,
3321
+ "sps": 1337.9087240256438
3322
+ },
3323
+ {
3324
+ "update": 1850,
3325
+ "global_step": 7577600,
3326
+ "num_episodes": 2575,
3327
+ "mean_reward": 23.735417137145998,
3328
+ "mean_length": 1716.6,
3329
+ "loss": 0.5824832320213318,
3330
+ "sps": 1306.8006548657663
3331
+ },
3332
+ {
3333
+ "update": 1855,
3334
+ "global_step": 7598080,
3335
+ "num_episodes": 2579,
3336
+ "mean_reward": 25.101176109313965,
3337
+ "mean_length": 2001.32,
3338
+ "loss": 3.1498076915740967,
3339
+ "sps": 383.12937784399907
3340
+ },
3341
+ {
3342
+ "update": 1860,
3343
+ "global_step": 7618560,
3344
+ "num_episodes": 2607,
3345
+ "mean_reward": 27.537090787887575,
3346
+ "mean_length": 2252.34,
3347
+ "loss": 2.7028920650482178,
3348
+ "sps": 276.49082168209895
3349
+ },
3350
+ {
3351
+ "update": 1865,
3352
+ "global_step": 7639040,
3353
+ "num_episodes": 2609,
3354
+ "mean_reward": 27.678814001083374,
3355
+ "mean_length": 2266.75,
3356
+ "loss": 1.0194354057312012,
3357
+ "sps": 829.6019480631423
3358
+ },
3359
+ {
3360
+ "update": 1870,
3361
+ "global_step": 7659520,
3362
+ "num_episodes": 2609,
3363
+ "mean_reward": 27.678814001083374,
3364
+ "mean_length": 2266.75,
3365
+ "loss": 0.69590824842453,
3366
+ "sps": 1339.889941319705
3367
+ },
3368
+ {
3369
+ "update": 1875,
3370
+ "global_step": 7680000,
3371
+ "num_episodes": 2610,
3372
+ "mean_reward": 27.743932600021363,
3373
+ "mean_length": 2363.84,
3374
+ "loss": 4.22328519821167,
3375
+ "sps": 1047.687217978352
3376
+ },
3377
+ {
3378
+ "update": 1880,
3379
+ "global_step": 7700480,
3380
+ "num_episodes": 2623,
3381
+ "mean_reward": 28.715706405639647,
3382
+ "mean_length": 2373.15,
3383
+ "loss": 15.052788734436035,
3384
+ "sps": 180.92434386649956
3385
+ },
3386
+ {
3387
+ "update": 1885,
3388
+ "global_step": 7720960,
3389
+ "num_episodes": 2650,
3390
+ "mean_reward": 28.79868775844574,
3391
+ "mean_length": 2376.77,
3392
+ "loss": 141.97862243652344,
3393
+ "sps": 189.85363801957897
3394
+ },
3395
+ {
3396
+ "update": 1890,
3397
+ "global_step": 7741440,
3398
+ "num_episodes": 2663,
3399
+ "mean_reward": 29.01506271839142,
3400
+ "mean_length": 2187.12,
3401
+ "loss": 2.7120704650878906,
3402
+ "sps": 442.6728082704278
3403
+ },
3404
+ {
3405
+ "update": 1895,
3406
+ "global_step": 7761920,
3407
+ "num_episodes": 2664,
3408
+ "mean_reward": 28.84576090812683,
3409
+ "mean_length": 2212.69,
3410
+ "loss": 0.6469894647598267,
3411
+ "sps": 1603.7569257295258
3412
+ },
3413
+ {
3414
+ "update": 1900,
3415
+ "global_step": 7782400,
3416
+ "num_episodes": 2665,
3417
+ "mean_reward": 29.100953969955444,
3418
+ "mean_length": 2212.69,
3419
+ "loss": 5.927608489990234,
3420
+ "sps": 1560.5345956258918
3421
+ },
3422
+ {
3423
+ "update": 1905,
3424
+ "global_step": 7802880,
3425
+ "num_episodes": 2677,
3426
+ "mean_reward": 29.594479999542237,
3427
+ "mean_length": 2279.92,
3428
+ "loss": 3.0546085834503174,
3429
+ "sps": 239.632742367245
3430
+ },
3431
+ {
3432
+ "update": 1910,
3433
+ "global_step": 7823360,
3434
+ "num_episodes": 2696,
3435
+ "mean_reward": 29.67495318889618,
3436
+ "mean_length": 2013.09,
3437
+ "loss": 0.40485668182373047,
3438
+ "sps": 483.9873532409158
3439
+ },
3440
+ {
3441
+ "update": 1915,
3442
+ "global_step": 7843840,
3443
+ "num_episodes": 2706,
3444
+ "mean_reward": 30.284353771209716,
3445
+ "mean_length": 2114.96,
3446
+ "loss": 1.9687494039535522,
3447
+ "sps": 668.5066115534931
3448
+ },
3449
+ {
3450
+ "update": 1920,
3451
+ "global_step": 7864320,
3452
+ "num_episodes": 2713,
3453
+ "mean_reward": 28.00047863960266,
3454
+ "mean_length": 1718.87,
3455
+ "loss": 0.2566870450973511,
3456
+ "sps": 774.1152448288775
3457
+ },
3458
+ {
3459
+ "update": 1925,
3460
+ "global_step": 7884800,
3461
+ "num_episodes": 2728,
3462
+ "mean_reward": 28.49471785068512,
3463
+ "mean_length": 1709.19,
3464
+ "loss": 0.5462437272071838,
3465
+ "sps": 335.32752289720815
3466
+ },
3467
+ {
3468
+ "update": 1930,
3469
+ "global_step": 7905280,
3470
+ "num_episodes": 2749,
3471
+ "mean_reward": 29.774335384368896,
3472
+ "mean_length": 1793.46,
3473
+ "loss": 1.6195909976959229,
3474
+ "sps": 151.6658643145884
3475
+ },
3476
+ {
3477
+ "update": 1935,
3478
+ "global_step": 7925760,
3479
+ "num_episodes": 2763,
3480
+ "mean_reward": 29.902848501205444,
3481
+ "mean_length": 1903.12,
3482
+ "loss": 3.346653461456299,
3483
+ "sps": 299.9200663991073
3484
+ },
3485
+ {
3486
+ "update": 1940,
3487
+ "global_step": 7946240,
3488
+ "num_episodes": 2766,
3489
+ "mean_reward": 29.951358599662782,
3490
+ "mean_length": 1817.08,
3491
+ "loss": 2.196648120880127,
3492
+ "sps": 612.8349441003236
3493
+ },
3494
+ {
3495
+ "update": 1945,
3496
+ "global_step": 7966720,
3497
+ "num_episodes": 2775,
3498
+ "mean_reward": 27.90867582321167,
3499
+ "mean_length": 1530.25,
3500
+ "loss": 5.727337837219238,
3501
+ "sps": 686.4829120395315
3502
+ },
3503
+ {
3504
+ "update": 1950,
3505
+ "global_step": 7987200,
3506
+ "num_episodes": 2791,
3507
+ "mean_reward": 27.395657448768617,
3508
+ "mean_length": 1515.36,
3509
+ "loss": 0.2642287015914917,
3510
+ "sps": 427.4124315071339
3511
+ },
3512
+ {
3513
+ "update": 1955,
3514
+ "global_step": 8007680,
3515
+ "num_episodes": 2811,
3516
+ "mean_reward": 29.46188481807709,
3517
+ "mean_length": 1804.43,
3518
+ "loss": 1.1504522562026978,
3519
+ "sps": 440.35140268496957
3520
+ },
3521
+ {
3522
+ "update": 1960,
3523
+ "global_step": 8028160,
3524
+ "num_episodes": 2811,
3525
+ "mean_reward": 29.46188481807709,
3526
+ "mean_length": 1804.43,
3527
+ "loss": 0.6443239450454712,
3528
+ "sps": 897.6638295466576
3529
+ },
3530
+ {
3531
+ "update": 1965,
3532
+ "global_step": 8048640,
3533
+ "num_episodes": 2811,
3534
+ "mean_reward": 29.46188481807709,
3535
+ "mean_length": 1804.43,
3536
+ "loss": 0.9464260339736938,
3537
+ "sps": 1231.9540613883512
3538
+ },
3539
+ {
3540
+ "update": 1970,
3541
+ "global_step": 8069120,
3542
+ "num_episodes": 2816,
3543
+ "mean_reward": 30.10758180618286,
3544
+ "mean_length": 1900.22,
3545
+ "loss": 13.55959701538086,
3546
+ "sps": 762.2875965279533
3547
+ },
3548
+ {
3549
+ "update": 1975,
3550
+ "global_step": 8089600,
3551
+ "num_episodes": 2836,
3552
+ "mean_reward": 33.175699801445006,
3553
+ "mean_length": 2303.58,
3554
+ "loss": 7.275441646575928,
3555
+ "sps": 227.75311612847298
3556
+ },
3557
+ {
3558
+ "update": 1980,
3559
+ "global_step": 8110080,
3560
+ "num_episodes": 2856,
3561
+ "mean_reward": 29.203288111686707,
3562
+ "mean_length": 2029.45,
3563
+ "loss": 39.95777130126953,
3564
+ "sps": 325.31625618278383
3565
+ },
3566
+ {
3567
+ "update": 1985,
3568
+ "global_step": 8130560,
3569
+ "num_episodes": 2877,
3570
+ "mean_reward": 26.940120091438292,
3571
+ "mean_length": 1738.19,
3572
+ "loss": 2.9013671875,
3573
+ "sps": 477.60825044232615
3574
+ },
3575
+ {
3576
+ "update": 1990,
3577
+ "global_step": 8151040,
3578
+ "num_episodes": 2881,
3579
+ "mean_reward": 26.892020201683046,
3580
+ "mean_length": 1748.27,
3581
+ "loss": 1.5360685586929321,
3582
+ "sps": 779.3787507434323
3583
+ },
3584
+ {
3585
+ "update": 1995,
3586
+ "global_step": 8171520,
3587
+ "num_episodes": 2887,
3588
+ "mean_reward": 28.383681316375732,
3589
+ "mean_length": 2035.38,
3590
+ "loss": 0.28828829526901245,
3591
+ "sps": 528.7312594886902
3592
+ },
3593
+ {
3594
+ "update": 2000,
3595
+ "global_step": 8192000,
3596
+ "num_episodes": 2894,
3597
+ "mean_reward": 27.269041929244995,
3598
+ "mean_length": 2017.08,
3599
+ "loss": 92.5503158569336,
3600
+ "sps": 423.8490051832835
3601
+ },
3602
+ {
3603
+ "update": 2005,
3604
+ "global_step": 8212480,
3605
+ "num_episodes": 2910,
3606
+ "mean_reward": 29.26204339504242,
3607
+ "mean_length": 2043.4,
3608
+ "loss": 1.737417221069336,
3609
+ "sps": 319.480134990497
3610
+ },
3611
+ {
3612
+ "update": 2010,
3613
+ "global_step": 8232960,
3614
+ "num_episodes": 2914,
3615
+ "mean_reward": 27.918287878036498,
3616
+ "mean_length": 1848.0,
3617
+ "loss": 9.918323516845703,
3618
+ "sps": 520.2825326468419
3619
+ },
3620
+ {
3621
+ "update": 2015,
3622
+ "global_step": 8253440,
3623
+ "num_episodes": 2923,
3624
+ "mean_reward": 25.578041405677794,
3625
+ "mean_length": 1641.01,
3626
+ "loss": 101.00448608398438,
3627
+ "sps": 412.83971034964594
3628
+ },
3629
+ {
3630
+ "update": 2020,
3631
+ "global_step": 8273920,
3632
+ "num_episodes": 2929,
3633
+ "mean_reward": 24.985388655662536,
3634
+ "mean_length": 1728.38,
3635
+ "loss": 33.80594253540039,
3636
+ "sps": 407.32375547775194
3637
+ },
3638
+ {
3639
+ "update": 2025,
3640
+ "global_step": 8294400,
3641
+ "num_episodes": 2941,
3642
+ "mean_reward": 54.48088625907898,
3643
+ "mean_length": 1727.06,
3644
+ "loss": 57.52255630493164,
3645
+ "sps": 251.07848653412563
3646
+ },
3647
+ {
3648
+ "update": 2030,
3649
+ "global_step": 8314880,
3650
+ "num_episodes": 2952,
3651
+ "mean_reward": 62.7720309972763,
3652
+ "mean_length": 1877.51,
3653
+ "loss": 18.3802547454834,
3654
+ "sps": 355.5720033718748
3655
+ },
3656
+ {
3657
+ "update": 2035,
3658
+ "global_step": 8335360,
3659
+ "num_episodes": 2957,
3660
+ "mean_reward": 62.98542010307312,
3661
+ "mean_length": 1979.5,
3662
+ "loss": 1.3044229745864868,
3663
+ "sps": 646.2262220511143
3664
+ },
3665
+ {
3666
+ "update": 2040,
3667
+ "global_step": 8355840,
3668
+ "num_episodes": 2961,
3669
+ "mean_reward": 63.64336392879486,
3670
+ "mean_length": 2180.16,
3671
+ "loss": 92.43401336669922,
3672
+ "sps": 659.5681486510539
3673
+ },
3674
+ {
3675
+ "update": 2045,
3676
+ "global_step": 8376320,
3677
+ "num_episodes": 2962,
3678
+ "mean_reward": 63.40275900363922,
3679
+ "mean_length": 2203.23,
3680
+ "loss": 1.014630675315857,
3681
+ "sps": 1446.1879050125103
3682
+ },
3683
+ {
3684
+ "update": 2050,
3685
+ "global_step": 8396800,
3686
+ "num_episodes": 2986,
3687
+ "mean_reward": 79.26061116695404,
3688
+ "mean_length": 2214.81,
3689
+ "loss": 29.98980712890625,
3690
+ "sps": 205.60997792847448
3691
+ },
3692
+ {
3693
+ "update": 2055,
3694
+ "global_step": 8417280,
3695
+ "num_episodes": 3001,
3696
+ "mean_reward": 82.85091641426087,
3697
+ "mean_length": 2227.86,
3698
+ "loss": 32.844966888427734,
3699
+ "sps": 159.2487235394598
3700
+ },
3701
+ {
3702
+ "update": 2060,
3703
+ "global_step": 8437760,
3704
+ "num_episodes": 3027,
3705
+ "mean_reward": 78.89424407482147,
3706
+ "mean_length": 1849.23,
3707
+ "loss": 103.22246551513672,
3708
+ "sps": 213.8993421623802
3709
+ },
3710
+ {
3711
+ "update": 2065,
3712
+ "global_step": 8458240,
3713
+ "num_episodes": 3033,
3714
+ "mean_reward": 78.97739854335785,
3715
+ "mean_length": 1848.21,
3716
+ "loss": 29.219009399414062,
3717
+ "sps": 303.98893287836097
3718
+ },
3719
+ {
3720
+ "update": 2070,
3721
+ "global_step": 8478720,
3722
+ "num_episodes": 3038,
3723
+ "mean_reward": 78.37423317909241,
3724
+ "mean_length": 1807.15,
3725
+ "loss": 34.15611267089844,
3726
+ "sps": 219.78900023670354
3727
+ },
3728
+ {
3729
+ "update": 2075,
3730
+ "global_step": 8499200,
3731
+ "num_episodes": 3052,
3732
+ "mean_reward": 75.33004099369049,
3733
+ "mean_length": 1865.63,
3734
+ "loss": 26.766494750976562,
3735
+ "sps": 339.7451627672537
3736
+ },
3737
+ {
3738
+ "update": 2080,
3739
+ "global_step": 8519680,
3740
+ "num_episodes": 3056,
3741
+ "mean_reward": 75.64053434848785,
3742
+ "mean_length": 1968.04,
3743
+ "loss": 176.32748413085938,
3744
+ "sps": 334.0194819221175
3745
+ },
3746
+ {
3747
+ "update": 2085,
3748
+ "global_step": 8540160,
3749
+ "num_episodes": 3066,
3750
+ "mean_reward": 74.47905586719513,
3751
+ "mean_length": 1544.14,
3752
+ "loss": 167.32608032226562,
3753
+ "sps": 393.50550599249544
3754
+ },
3755
+ {
3756
+ "update": 2090,
3757
+ "global_step": 8560640,
3758
+ "num_episodes": 3072,
3759
+ "mean_reward": 75.01324489593506,
3760
+ "mean_length": 1620.95,
3761
+ "loss": 40.1760139465332,
3762
+ "sps": 272.5369799883836
3763
+ },
3764
+ {
3765
+ "update": 2095,
3766
+ "global_step": 8581120,
3767
+ "num_episodes": 3084,
3768
+ "mean_reward": 80.59875289440156,
3769
+ "mean_length": 1815.35,
3770
+ "loss": 23.691143035888672,
3771
+ "sps": 460.4555876617225
3772
+ },
3773
+ {
3774
+ "update": 2100,
3775
+ "global_step": 8601600,
3776
+ "num_episodes": 3092,
3777
+ "mean_reward": 151.1377474308014,
3778
+ "mean_length": 2023.52,
3779
+ "loss": 119.31157684326172,
3780
+ "sps": 230.98807371797147
3781
+ },
3782
+ {
3783
+ "update": 2105,
3784
+ "global_step": 8622080,
3785
+ "num_episodes": 3104,
3786
+ "mean_reward": 145.0986036682129,
3787
+ "mean_length": 1964.44,
3788
+ "loss": 17.907983779907227,
3789
+ "sps": 136.7176138415176
3790
+ },
3791
+ {
3792
+ "update": 2110,
3793
+ "global_step": 8642560,
3794
+ "num_episodes": 3113,
3795
+ "mean_reward": 145.05979912757874,
3796
+ "mean_length": 2015.23,
3797
+ "loss": 238.5266876220703,
3798
+ "sps": 191.61578614248398
3799
+ },
3800
+ {
3801
+ "update": 2115,
3802
+ "global_step": 8663040,
3803
+ "num_episodes": 3127,
3804
+ "mean_reward": 145.9841158914566,
3805
+ "mean_length": 2223.16,
3806
+ "loss": 279.5526428222656,
3807
+ "sps": 178.22145258830818
3808
+ },
3809
+ {
3810
+ "update": 2120,
3811
+ "global_step": 8683520,
3812
+ "num_episodes": 3129,
3813
+ "mean_reward": 161.8294042444229,
3814
+ "mean_length": 2317.37,
3815
+ "loss": 4402.13232421875,
3816
+ "sps": 187.96868067815723
3817
+ },
3818
+ {
3819
+ "update": 2125,
3820
+ "global_step": 8704000,
3821
+ "num_episodes": 3131,
3822
+ "mean_reward": 254.4432011270523,
3823
+ "mean_length": 2415.71,
3824
+ "loss": 253.15243530273438,
3825
+ "sps": 389.3006404112279
3826
+ },
3827
+ {
3828
+ "update": 2130,
3829
+ "global_step": 8724480,
3830
+ "num_episodes": 3136,
3831
+ "mean_reward": 308.86108244895934,
3832
+ "mean_length": 2455.38,
3833
+ "loss": 42.87656784057617,
3834
+ "sps": 289.06916804822475
3835
+ },
3836
+ {
3837
+ "update": 2135,
3838
+ "global_step": 8744960,
3839
+ "num_episodes": 3151,
3840
+ "mean_reward": 442.90291794776914,
3841
+ "mean_length": 2544.55,
3842
+ "loss": 52.280975341796875,
3843
+ "sps": 168.18531939110147
3844
+ },
3845
+ {
3846
+ "update": 2140,
3847
+ "global_step": 8765440,
3848
+ "num_episodes": 3153,
3849
+ "mean_reward": 443.3635806703567,
3850
+ "mean_length": 2596.27,
3851
+ "loss": 123.76752471923828,
3852
+ "sps": 250.64368624913917
3853
+ },
3854
+ {
3855
+ "update": 2145,
3856
+ "global_step": 8785920,
3857
+ "num_episodes": 3154,
3858
+ "mean_reward": 443.81920017719267,
3859
+ "mean_length": 2693.35,
3860
+ "loss": 8.35338020324707,
3861
+ "sps": 403.9011407049708
3862
+ },
3863
+ {
3864
+ "update": 2150,
3865
+ "global_step": 8806400,
3866
+ "num_episodes": 3158,
3867
+ "mean_reward": 578.3950145959855,
3868
+ "mean_length": 2690.97,
3869
+ "loss": 117.6234359741211,
3870
+ "sps": 427.3976727795614
3871
+ },
3872
+ {
3873
+ "update": 2155,
3874
+ "global_step": 8826880,
3875
+ "num_episodes": 3168,
3876
+ "mean_reward": 593.7985137224198,
3877
+ "mean_length": 2964.72,
3878
+ "loss": 27.907575607299805,
3879
+ "sps": 408.92434877647025
3880
+ },
3881
+ {
3882
+ "update": 2160,
3883
+ "global_step": 8847360,
3884
+ "num_episodes": 3176,
3885
+ "mean_reward": 696.11108481884,
3886
+ "mean_length": 2789.14,
3887
+ "loss": 37.44743347167969,
3888
+ "sps": 405.43106143344477
3889
+ },
3890
+ {
+ "update": 2165,
+ "global_step": 8867840,
+ "num_episodes": 3181,
+ "mean_reward": 691.3086440181733,
+ "mean_length": 2841.13,
+ "loss": 444.12481689453125,
+ "sps": 410.99575565900966
+ },
+ {
+ "update": 2170,
+ "global_step": 8888320,
+ "num_episodes": 3186,
+ "mean_reward": 696.4550316858291,
+ "mean_length": 2910.25,
+ "loss": 221.63902282714844,
+ "sps": 355.0140974319845
+ },
+ {
+ "update": 2175,
+ "global_step": 8908800,
+ "num_episodes": 3191,
+ "mean_reward": 696.0580731773376,
+ "mean_length": 3004.52,
+ "loss": 762.4512939453125,
+ "sps": 303.24516246589195
+ },
+ {
+ "update": 2180,
+ "global_step": 8929280,
+ "num_episodes": 3197,
+ "mean_reward": 814.6755250024795,
+ "mean_length": 3002.62,
+ "loss": 409.33624267578125,
+ "sps": 427.0124459168339
+ },
+ {
+ "update": 2185,
+ "global_step": 8949760,
+ "num_episodes": 3210,
+ "mean_reward": 934.5317647647857,
+ "mean_length": 3211.49,
+ "loss": 95.55060577392578,
+ "sps": 125.75423605418091
+ },
+ {
+ "update": 2190,
+ "global_step": 8970240,
+ "num_episodes": 3214,
+ "mean_reward": 1002.574015007019,
+ "mean_length": 3132.07,
+ "loss": 68.63960266113281,
+ "sps": 206.72189870239387
+ },
+ {
+ "update": 2195,
+ "global_step": 8990720,
+ "num_episodes": 3225,
+ "mean_reward": 1036.229081878662,
+ "mean_length": 3290.05,
+ "loss": 450.1454772949219,
+ "sps": 308.8153724070321
+ },
+ {
+ "update": 2200,
+ "global_step": 9011200,
+ "num_episodes": 3226,
+ "mean_reward": 1175.8488982009887,
+ "mean_length": 3386.93,
+ "loss": 11.516603469848633,
+ "sps": 505.34211227316496
+ },
+ {
+ "update": 2205,
+ "global_step": 9031680,
+ "num_episodes": 3228,
+ "mean_reward": 1241.2947177028657,
+ "mean_length": 3452.12,
+ "loss": 998.0031127929688,
+ "sps": 380.7435127282772
+ },
+ {
+ "update": 2210,
+ "global_step": 9052160,
+ "num_episodes": 3234,
+ "mean_reward": 1148.464052476883,
+ "mean_length": 3444.2,
+ "loss": 0.2201317399740219,
+ "sps": 447.01108197826755
+ },
+ {
+ "update": 2215,
+ "global_step": 9072640,
+ "num_episodes": 3251,
+ "mean_reward": 983.5756113386154,
+ "mean_length": 3185.62,
+ "loss": 10.140728950500488,
+ "sps": 285.88922550188585
+ },
+ {
+ "update": 2220,
+ "global_step": 9093120,
+ "num_episodes": 3267,
+ "mean_reward": 830.9440940666199,
+ "mean_length": 2711.54,
+ "loss": 23.857481002807617,
+ "sps": 225.0139483384211
+ },
+ {
+ "update": 2225,
+ "global_step": 9113600,
+ "num_episodes": 3285,
+ "mean_reward": 755.7127734375,
+ "mean_length": 2369.97,
+ "loss": 263.3361511230469,
+ "sps": 274.6876121563017
+ },
+ {
+ "update": 2230,
+ "global_step": 9134080,
+ "num_episodes": 3291,
+ "mean_reward": 754.6721585416794,
+ "mean_length": 2306.49,
+ "loss": 4289.046875,
+ "sps": 174.4840112420559
+ },
+ {
+ "update": 2235,
+ "global_step": 9154560,
+ "num_episodes": 3298,
+ "mean_reward": 730.0353905773163,
+ "mean_length": 2354.08,
+ "loss": 84.65528869628906,
+ "sps": 265.5093767251286
+ },
+ {
+ "update": 2240,
+ "global_step": 9175040,
+ "num_episodes": 3300,
+ "mean_reward": 660.2916961956024,
+ "mean_length": 2256.21,
+ "loss": 388.5572814941406,
+ "sps": 407.65937488684284
+ },
+ {
+ "update": 2245,
+ "global_step": 9195520,
+ "num_episodes": 3309,
+ "mean_reward": 687.2124669837951,
+ "mean_length": 2477.43,
+ "loss": 172.86532592773438,
+ "sps": 298.38812190862865
+ },
+ {
+ "update": 2250,
+ "global_step": 9216000,
+ "num_episodes": 3310,
+ "mean_reward": 687.4369230651855,
+ "mean_length": 2481.91,
+ "loss": 14.442253112792969,
+ "sps": 667.1891280431901
+ },
+ {
+ "update": 2255,
+ "global_step": 9236480,
+ "num_episodes": 3320,
+ "mean_reward": 686.6187604236603,
+ "mean_length": 2631.57,
+ "loss": 8.769046783447266,
+ "sps": 248.0906360841076
+ },
+ {
+ "update": 2260,
+ "global_step": 9256960,
+ "num_episodes": 3324,
+ "mean_reward": 714.0761754512787,
+ "mean_length": 2596.79,
+ "loss": 2.7038824558258057,
+ "sps": 611.7833643654915
+ },
+ {
+ "update": 2265,
+ "global_step": 9277440,
+ "num_episodes": 3333,
+ "mean_reward": 514.9936645078659,
+ "mean_length": 2480.39,
+ "loss": 107.10433959960938,
+ "sps": 352.06042337330143
+ },
+ {
+ "update": 2270,
+ "global_step": 9297920,
+ "num_episodes": 3333,
+ "mean_reward": 514.9936645078659,
+ "mean_length": 2480.39,
+ "loss": 205.3878936767578,
+ "sps": 557.4872251219026
+ },
+ {
+ "update": 2275,
+ "global_step": 9318400,
+ "num_episodes": 3341,
+ "mean_reward": 516.5512644720078,
+ "mean_length": 2668.78,
+ "loss": 10.443692207336426,
+ "sps": 381.5541543957095
+ },
+ {
+ "update": 2280,
+ "global_step": 9338880,
+ "num_episodes": 3349,
+ "mean_reward": 462.85379876613615,
+ "mean_length": 2668.78,
+ "loss": 442.84942626953125,
+ "sps": 279.7896158892528
+ },
+ {
+ "update": 2285,
+ "global_step": 9359360,
+ "num_episodes": 3360,
+ "mean_reward": 507.16173345565795,
+ "mean_length": 2757.46,
+ "loss": 65.15730285644531,
+ "sps": 265.8172803900086
+ },
+ {
+ "update": 2290,
+ "global_step": 9379840,
+ "num_episodes": 3363,
+ "mean_reward": 507.4162758731842,
+ "mean_length": 2818.63,
+ "loss": 70.9657211303711,
+ "sps": 353.8588588797036
+ },
+ {
+ "update": 2295,
+ "global_step": 9400320,
+ "num_episodes": 3368,
+ "mean_reward": 551.9985440444947,
+ "mean_length": 2959.32,
+ "loss": 5.74236536026001,
+ "sps": 340.8187159543217
+ },
+ {
+ "update": 2300,
+ "global_step": 9420800,
+ "num_episodes": 3377,
+ "mean_reward": 564.7782349967956,
+ "mean_length": 3130.89,
+ "loss": 5.68098783493042,
+ "sps": 383.2875448626093
+ },
+ {
+ "update": 2305,
+ "global_step": 9441280,
+ "num_episodes": 3390,
+ "mean_reward": 594.4548657274246,
+ "mean_length": 3136.03,
+ "loss": 178.03961181640625,
+ "sps": 278.82986749605175
+ },
+ {
+ "update": 2310,
+ "global_step": 9461760,
+ "num_episodes": 3391,
+ "mean_reward": 595.1103666639328,
+ "mean_length": 3234.65,
+ "loss": 73.16796112060547,
+ "sps": 386.1064879014476
+ },
+ {
+ "update": 2315,
+ "global_step": 9482240,
+ "num_episodes": 3397,
+ "mean_reward": 560.5055163288116,
+ "mean_length": 3213.23,
+ "loss": 1437.6278076171875,
+ "sps": 198.45749967938127
+ },
+ {
+ "update": 2320,
+ "global_step": 9502720,
+ "num_episodes": 3411,
+ "mean_reward": 401.03073542118074,
+ "mean_length": 2986.15,
+ "loss": 3.3738672733306885,
+ "sps": 1083.0313220969133
+ },
+ {
+ "update": 2325,
+ "global_step": 9523200,
+ "num_episodes": 3412,
+ "mean_reward": 413.53732123851773,
+ "mean_length": 3083.93,
+ "loss": 21.815982818603516,
+ "sps": 659.8576311368523
+ },
+ {
+ "update": 2330,
+ "global_step": 9543680,
+ "num_episodes": 3413,
+ "mean_reward": 407.9750119304657,
+ "mean_length": 3002.15,
+ "loss": 0.47333452105522156,
+ "sps": 726.1803705441753
+ },
+ {
+ "update": 2335,
+ "global_step": 9564160,
+ "num_episodes": 3420,
+ "mean_reward": 454.97729597568514,
+ "mean_length": 3001.83,
+ "loss": 0.5463234186172485,
+ "sps": 582.7651222934119
+ },
+ {
+ "update": 2340,
+ "global_step": 9584640,
+ "num_episodes": 3454,
+ "mean_reward": 401.5923653173447,
+ "mean_length": 2474.53,
+ "loss": 0.9732112884521484,
+ "sps": 190.63133488811127
+ },
+ {
+ "update": 2345,
+ "global_step": 9605120,
+ "num_episodes": 3475,
+ "mean_reward": 180.83635659217833,
+ "mean_length": 1855.64,
+ "loss": 0.047050729393959045,
+ "sps": 497.0424644143239
+ },
+ {
+ "update": 2350,
+ "global_step": 9625600,
+ "num_episodes": 3482,
+ "mean_reward": 181.7909957742691,
+ "mean_length": 1925.71,
+ "loss": 1.1374897956848145,
+ "sps": 853.7958489004412
+ },
+ {
+ "update": 2355,
+ "global_step": 9646080,
+ "num_episodes": 3489,
+ "mean_reward": 158.73119886875153,
+ "mean_length": 1914.03,
+ "loss": 1.258785605430603,
+ "sps": 551.9084865795908
+ },
+ {
+ "update": 2360,
+ "global_step": 9666560,
+ "num_episodes": 3502,
+ "mean_reward": 98.6479220199585,
+ "mean_length": 1635.16,
+ "loss": 0.43499165773391724,
+ "sps": 419.3776501549414
+ },
+ {
+ "update": 2365,
+ "global_step": 9687040,
+ "num_episodes": 3515,
+ "mean_reward": 66.50108590602875,
+ "mean_length": 1518.13,
+ "loss": 166.75491333007812,
+ "sps": 281.0402931910567
+ },
+ {
+ "update": 2370,
+ "global_step": 9707520,
+ "num_episodes": 3528,
+ "mean_reward": 33.59190697193146,
+ "mean_length": 1331.87,
+ "loss": 274.23516845703125,
+ "sps": 287.97774743130026
+ },
+ {
+ "update": 2375,
+ "global_step": 9728000,
+ "num_episodes": 3538,
+ "mean_reward": 23.649227323532106,
+ "mean_length": 1328.19,
+ "loss": 24.9585018157959,
+ "sps": 656.234816203577
+ },
+ {
+ "update": 2380,
+ "global_step": 9748480,
+ "num_episodes": 3539,
+ "mean_reward": 24.15320231437683,
+ "mean_length": 1427.05,
+ "loss": 13.851099967956543,
+ "sps": 840.3197888208443
+ },
+ {
+ "update": 2385,
+ "global_step": 9768960,
+ "num_episodes": 3553,
+ "mean_reward": 25.30362526893616,
+ "mean_length": 1552.4,
+ "loss": -0.07959967106580734,
+ "sps": 348.5790453337399
+ },
+ {
+ "update": 2390,
+ "global_step": 9789440,
+ "num_episodes": 3575,
+ "mean_reward": 43.762970843315124,
+ "mean_length": 1834.94,
+ "loss": 256.5601806640625,
+ "sps": 314.0190386184578
+ },
+ {
+ "update": 2395,
+ "global_step": 9809920,
+ "num_episodes": 3584,
+ "mean_reward": 41.22667598724365,
+ "mean_length": 1832.99,
+ "loss": 37.159915924072266,
+ "sps": 261.015177859497
+ },
+ {
+ "update": 2400,
+ "global_step": 9830400,
+ "num_episodes": 3593,
+ "mean_reward": 56.500927686691284,
+ "mean_length": 1809.72,
+ "loss": 949.2464599609375,
+ "sps": 236.54852291533763
+ },
+ {
+ "update": 2405,
+ "global_step": 9850880,
+ "num_episodes": 3599,
+ "mean_reward": 105.25954935073852,
+ "mean_length": 1813.21,
+ "loss": 1035.45751953125,
+ "sps": 654.1456293848414
+ },
+ {
+ "update": 2410,
+ "global_step": 9871360,
+ "num_episodes": 3608,
+ "mean_reward": 174.8251446390152,
+ "mean_length": 1969.27,
+ "loss": 12.137419700622559,
+ "sps": 374.5700328217855
+ },
+ {
+ "update": 2415,
+ "global_step": 9891840,
+ "num_episodes": 3613,
+ "mean_reward": 174.6033950281143,
+ "mean_length": 2093.28,
+ "loss": 470.9022216796875,
+ "sps": 829.3315456709553
+ },
+ {
+ "update": 2420,
+ "global_step": 9912320,
+ "num_episodes": 3614,
+ "mean_reward": 175.19530655384062,
+ "mean_length": 2188.0,
+ "loss": 7.125182628631592,
+ "sps": 1392.3792980312514
+ },
+ {
+ "update": 2425,
+ "global_step": 9932800,
+ "num_episodes": 3616,
+ "mean_reward": 175.76038108348845,
+ "mean_length": 2285.88,
+ "loss": 11.183570861816406,
+ "sps": 303.2125684013846
+ },
+ {
+ "update": 2430,
+ "global_step": 9953280,
+ "num_episodes": 3625,
+ "mean_reward": 245.62015884399415,
+ "mean_length": 2481.25,
+ "loss": 72.24958038330078,
+ "sps": 223.82439359999108
+ },
+ {
+ "update": 2435,
+ "global_step": 9973760,
+ "num_episodes": 3640,
+ "mean_reward": 12.841221110026042,
+ "mean_length": 242.8,
+ "loss": 39.742530822753906,
+ "sps": 444.8952550886448
+ },
+ {
+ "update": 2440,
+ "global_step": 9994240,
+ "num_episodes": 3640,
+ "mean_reward": 12.841221110026042,
+ "mean_length": 242.8,
+ "loss": 0.29037195444107056,
+ "sps": 1069.6427524147307
+ },
+ {
+ "update": 2445,
+ "global_step": 10014720,
+ "num_episodes": 3640,
+ "mean_reward": 12.841221110026042,
+ "mean_length": 242.8,
+ "loss": 3.3830156326293945,
+ "sps": 1318.7931203919134
+ },
+ {
+ "update": 2450,
+ "global_step": 10035200,
+ "num_episodes": 3644,
+ "mean_reward": 91.08290822882401,
+ "mean_length": 2296.9473684210525,
+ "loss": 20.516225814819336,
+ "sps": 481.54488129169596
+ },
+ {
+ "update": 2455,
+ "global_step": 10055680,
+ "num_episodes": 3658,
+ "mean_reward": 124.92099246111783,
+ "mean_length": 2621.4848484848485,
+ "loss": 150.3678436279297,
+ "sps": 253.02333051932328
+ },
+ {
+ "update": 2460,
+ "global_step": 10076160,
+ "num_episodes": 3665,
+ "mean_reward": 104.92156012058258,
+ "mean_length": 2402.575,
+ "loss": 43.09859085083008,
+ "sps": 444.9419436729771
+ },
+ {
+ "update": 2465,
+ "global_step": 10096640,
+ "num_episodes": 3668,
+ "mean_reward": 99.1124037143796,
+ "mean_length": 2358.6744186046512,
+ "loss": 30.914440155029297,
+ "sps": 564.239378309882
+ },
+ {
+ "update": 2470,
+ "global_step": 10117120,
+ "num_episodes": 3674,
+ "mean_reward": 212.01101287530392,
+ "mean_length": 2689.183673469388,
+ "loss": 519.401611328125,
+ "sps": 400.7113960553138
+ },
+ {
+ "update": 2475,
+ "global_step": 10137600,
+ "num_episodes": 3685,
+ "mean_reward": 246.73736356099445,
+ "mean_length": 2406.016666666667,
+ "loss": 18.674598693847656,
+ "sps": 645.3387661404473
+ },
+ {
+ "update": 2480,
+ "global_step": 10158080,
+ "num_episodes": 3690,
+ "mean_reward": 254.607934988462,
+ "mean_length": 2687.8153846153846,
+ "loss": 227.44029235839844,
+ "sps": 409.4558222523011
  }
  ]