[2023-09-12 13:21:22,562][09743] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-12 13:21:22,562][09743] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-12 13:21:22,598][09743] Num visible devices: 1
[2023-09-12 13:21:22,637][09743] Starting seed is not provided
[2023-09-12 13:21:22,638][09743] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-12 13:21:22,638][09743] Initializing actor-critic model on device cuda:0
[2023-09-12 13:21:22,638][09743] RunningMeanStd input shape: (3, 72, 128)
[2023-09-12 13:21:22,639][09743] RunningMeanStd input shape: (1,)
[2023-09-12 13:21:22,659][09743] ConvEncoder: input_channels=3
[2023-09-12 13:21:22,911][09743] Conv encoder output size: 512
[2023-09-12 13:21:22,911][09743] Policy head output size: 512
[2023-09-12 13:21:22,935][09743] Created Actor Critic model with architecture:
[2023-09-12 13:21:22,935][09743] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=4, bias=True)
  )
)
[2023-09-12 13:21:24,096][09743] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-09-12 13:21:24,096][09743] No checkpoints found
[2023-09-12 13:21:24,097][09743] Did not load from checkpoint, starting from scratch!
[2023-09-12 13:21:24,097][09743] Initialized policy 0 weights for model version 0
[2023-09-12 13:21:24,098][09743] LearnerWorker_p0 finished initialization!
[2023-09-12 13:21:24,098][09743] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-12 13:21:24,463][09929] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-12 13:21:24,475][09931] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-12 13:21:24,499][09964] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-12 13:21:24,535][09967] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-12 13:21:24,545][09928] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-12 13:21:24,545][09928] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-12 13:21:24,566][09928] Num visible devices: 1
[2023-09-12 13:21:24,567][09932] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-12 13:21:24,645][09965] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-12 13:21:24,665][09968] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-12 13:21:24,689][09930] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-12 13:21:25,314][09928] RunningMeanStd input shape: (3, 72, 128)
[2023-09-12 13:21:25,315][09928] RunningMeanStd input shape: (1,)
[2023-09-12 13:21:25,326][09928] ConvEncoder: input_channels=3
[2023-09-12 13:21:25,447][09928] Conv encoder output size: 512
[2023-09-12 13:21:25,448][09928] Policy head output size: 512
[2023-09-12 13:21:25,839][09964] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-12 13:21:25,839][09968] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-12 13:21:25,839][09967] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-12 13:21:25,840][09965] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-12 13:21:25,840][09931] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-12 13:21:25,848][09932] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-12 13:21:25,851][09930] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-12 13:21:25,852][09929] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-12 13:21:26,147][09967] Decorrelating experience for 0 frames...
[2023-09-12 13:21:26,147][09965] Decorrelating experience for 0 frames...
[2023-09-12 13:21:26,215][09964] Decorrelating experience for 0 frames...
[2023-09-12 13:21:26,239][09929] Decorrelating experience for 0 frames...
[2023-09-12 13:21:26,258][09968] Decorrelating experience for 0 frames...
[2023-09-12 13:21:26,273][09931] Decorrelating experience for 0 frames...
[2023-09-12 13:21:26,286][09930] Decorrelating experience for 0 frames...
[2023-09-12 13:21:26,418][09967] Decorrelating experience for 32 frames...
[2023-09-12 13:21:26,493][09964] Decorrelating experience for 32 frames...
[2023-09-12 13:21:26,522][09965] Decorrelating experience for 32 frames...
[2023-09-12 13:21:26,525][09929] Decorrelating experience for 32 frames...
[2023-09-12 13:21:26,551][09932] Decorrelating experience for 0 frames...
[2023-09-12 13:21:26,556][09931] Decorrelating experience for 32 frames...
[2023-09-12 13:21:26,568][09930] Decorrelating experience for 32 frames...
[2023-09-12 13:21:26,775][09967] Decorrelating experience for 64 frames...
[2023-09-12 13:21:26,821][09932] Decorrelating experience for 32 frames...
[2023-09-12 13:21:26,852][09964] Decorrelating experience for 64 frames...
[2023-09-12 13:21:26,919][09931] Decorrelating experience for 64 frames...
[2023-09-12 13:21:26,929][09930] Decorrelating experience for 64 frames...
[2023-09-12 13:21:27,103][09929] Decorrelating experience for 64 frames...
[2023-09-12 13:21:27,160][09968] Decorrelating experience for 32 frames...
[2023-09-12 13:21:27,164][09965] Decorrelating experience for 64 frames...
[2023-09-12 13:21:27,195][09932] Decorrelating experience for 64 frames...
[2023-09-12 13:21:27,201][09964] Decorrelating experience for 96 frames...
[2023-09-12 13:21:27,330][09931] Decorrelating experience for 96 frames...
[2023-09-12 13:21:27,451][09929] Decorrelating experience for 96 frames...
[2023-09-12 13:21:27,465][09967] Decorrelating experience for 96 frames...
[2023-09-12 13:21:27,498][09965] Decorrelating experience for 96 frames...
[2023-09-12 13:21:27,507][09930] Decorrelating experience for 96 frames...
[2023-09-12 13:21:27,588][09968] Decorrelating experience for 64 frames...
[2023-09-12 13:21:27,634][09932] Decorrelating experience for 96 frames...
[2023-09-12 13:21:27,903][09968] Decorrelating experience for 96 frames...
[2023-09-12 13:21:28,649][09743] Signal inference workers to stop experience collection...
[2023-09-12 13:21:28,653][09928] InferenceWorker_p0-w0: stopping experience collection
[2023-09-12 13:21:32,650][09743] Signal inference workers to resume experience collection...
[2023-09-12 13:21:32,651][09928] InferenceWorker_p0-w0: resuming experience collection
[2023-09-12 13:21:35,991][09928] Updated weights for policy 0, policy_version 10 (0.0392)
[2023-09-12 13:21:39,213][09928] Updated weights for policy 0, policy_version 20 (0.0009)
[2023-09-12 13:21:42,363][09928] Updated weights for policy 0, policy_version 30 (0.0009)
[2023-09-12 13:21:42,527][09743] Saving new best policy, reward=-1.655!
[2023-09-12 13:21:45,525][09928] Updated weights for policy 0, policy_version 40 (0.0009)
[2023-09-12 13:21:47,532][09743] Saving new best policy, reward=-0.936!
[2023-09-12 13:21:48,749][09928] Updated weights for policy 0, policy_version 50 (0.0009)
[2023-09-12 13:21:51,927][09928] Updated weights for policy 0, policy_version 60 (0.0008)
[2023-09-12 13:21:52,564][09743] Saving new best policy, reward=0.078!
[2023-09-12 13:21:55,197][09928] Updated weights for policy 0, policy_version 70 (0.0009)
[2023-09-12 13:21:57,529][09743] Saving new best policy, reward=0.521!
[2023-09-12 13:21:58,483][09928] Updated weights for policy 0, policy_version 80 (0.0010)
[2023-09-12 13:22:01,740][09928] Updated weights for policy 0, policy_version 90 (0.0008)
[2023-09-12 13:22:02,527][09743] Saving new best policy, reward=0.599!
[2023-09-12 13:22:05,213][09928] Updated weights for policy 0, policy_version 100 (0.0012)
[2023-09-12 13:22:07,589][09743] Saving new best policy, reward=0.680!
[2023-09-12 13:22:08,636][09928] Updated weights for policy 0, policy_version 110 (0.0017)
[2023-09-12 13:22:11,996][09928] Updated weights for policy 0, policy_version 120 (0.0009)
[2023-09-12 13:22:12,526][09743] Saving new best policy, reward=0.735!
[2023-09-12 13:22:15,370][09928] Updated weights for policy 0, policy_version 130 (0.0008)
[2023-09-12 13:22:17,527][09743] Saving new best policy, reward=0.755!
[2023-09-12 13:22:18,771][09928] Updated weights for policy 0, policy_version 140 (0.0009)
[2023-09-12 13:22:22,142][09928] Updated weights for policy 0, policy_version 150 (0.0009)
[2023-09-12 13:22:22,526][09743] Saving new best policy, reward=0.781!
[2023-09-12 13:22:25,580][09928] Updated weights for policy 0, policy_version 160 (0.0008)
[2023-09-12 13:22:27,573][09743] Saving new best policy, reward=0.791!
[2023-09-12 13:22:28,937][09928] Updated weights for policy 0, policy_version 170 (0.0009)
[2023-09-12 13:22:32,402][09928] Updated weights for policy 0, policy_version 180 (0.0008)
[2023-09-12 13:22:32,526][09743] Saving new best policy, reward=0.794!
[2023-09-12 13:22:35,725][09928] Updated weights for policy 0, policy_version 190 (0.0009)
[2023-09-12 13:22:37,529][09743] Saving new best policy, reward=0.806!
[2023-09-12 13:22:39,128][09928] Updated weights for policy 0, policy_version 200 (0.0010)
[2023-09-12 13:22:42,471][09928] Updated weights for policy 0, policy_version 210 (0.0009)
[2023-09-12 13:22:45,922][09928] Updated weights for policy 0, policy_version 220 (0.0009)
[2023-09-12 13:22:47,533][09743] Saving new best policy, reward=0.815!
[2023-09-12 13:22:49,374][09928] Updated weights for policy 0, policy_version 230 (0.0009)
[2023-09-12 13:22:52,742][09928] Updated weights for policy 0, policy_version 240 (0.0009)
[2023-09-12 13:22:54,862][09743] Stopping Batcher_0...
[2023-09-12 13:22:54,862][09743] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000246_1007616.pth...
[2023-09-12 13:22:54,863][09743] Loop batcher_evt_loop terminating...
[2023-09-12 13:22:54,876][09965] Stopping RolloutWorker_w7...
[2023-09-12 13:22:54,877][09965] Loop rollout_proc7_evt_loop terminating...
[2023-09-12 13:22:54,877][09932] Stopping RolloutWorker_w3...
[2023-09-12 13:22:54,877][09930] Stopping RolloutWorker_w0...
[2023-09-12 13:22:54,877][09968] Stopping RolloutWorker_w6...
[2023-09-12 13:22:54,877][09932] Loop rollout_proc3_evt_loop terminating...
[2023-09-12 13:22:54,877][09930] Loop rollout_proc0_evt_loop terminating...
[2023-09-12 13:22:54,878][09968] Loop rollout_proc6_evt_loop terminating...
[2023-09-12 13:22:54,880][09964] Stopping RolloutWorker_w5...
[2023-09-12 13:22:54,880][09931] Stopping RolloutWorker_w2...
[2023-09-12 13:22:54,880][09964] Loop rollout_proc5_evt_loop terminating...
[2023-09-12 13:22:54,880][09931] Loop rollout_proc2_evt_loop terminating...
[2023-09-12 13:22:54,881][09929] Stopping RolloutWorker_w1...
[2023-09-12 13:22:54,881][09929] Loop rollout_proc1_evt_loop terminating...
[2023-09-12 13:22:54,882][09967] Stopping RolloutWorker_w4...
[2023-09-12 13:22:54,882][09967] Loop rollout_proc4_evt_loop terminating...
[2023-09-12 13:22:54,885][09928] Weights refcount: 2 0
[2023-09-12 13:22:54,887][09928] Stopping InferenceWorker_p0-w0...
[2023-09-12 13:22:54,887][09928] Loop inference_proc0-0_evt_loop terminating...
[2023-09-12 13:22:54,931][09743] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000246_1007616.pth...
[2023-09-12 13:22:55,021][09743] Stopping LearnerWorker_p0...
[2023-09-12 13:22:55,022][09743] Loop learner_proc0_evt_loop terminating...