nohup: ignoring input
/data2/edwardsun/flow_home/amp_flow_training_single_gpu_full_data.py:70: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  self.scaler = GradScaler()
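The two lines above flag the deprecated `torch.cuda.amp.GradScaler` constructor. A minimal sketch of the replacement the warning itself names; note that gradient scaling exists mainly for FP16's narrow exponent range, so with the BF16 autocast this run uses the scaler is largely redundant:

```python
import torch

# Deprecated form used by the script:
#   self.scaler = GradScaler()            # torch.cuda.amp.GradScaler
# Replacement named in the FutureWarning:
scaler = torch.amp.GradScaler('cuda')

# With BF16 autocast (as in this run) scaling is generally unnecessary,
# so it can also be disabled explicitly:
scaler_bf16 = torch.amp.GradScaler('cuda', enabled=False)
```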
/data2/edwardsun/flow_home/amp_flow_training_single_gpu_full_data.py:116: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  self.embeddings = torch.load(combined_path, map_location=self.device)
/data2/edwardsun/flow_home/amp_flow_training_single_gpu_full_data.py:180: FutureWarning: `torch.load` with `weights_only=False` (same warning as above)
  self.compressor.load_state_dict(torch.load('final_compressor_model.pth', map_location=self.device))
/data2/edwardsun/flow_home/amp_flow_training_single_gpu_full_data.py:181: FutureWarning: `torch.load` with `weights_only=False` (same warning as above)
  self.decompressor.load_state_dict(torch.load('final_decompressor_model.pth', map_location=self.device))
/data2/edwardsun/flow_home/cfg_dataset.py:253: FutureWarning: `torch.load` with `weights_only=False` (same warning as above)
  self.embeddings = torch.load(combined_path, map_location='cpu')
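All four call sites can adopt the fix the warning recommends. A minimal sketch, assuming the `.pt` files hold plain tensors or state_dicts (arbitrary pickled objects would still need `torch.serialization.add_safe_globals`); filenames are taken from the log:

```python
import torch

# Embedding tensor: weights_only=True refuses arbitrary pickled code.
embeddings = torch.load('all_peptide_embeddings.pt',
                        map_location='cpu', weights_only=True)

# Model weights: a state_dict of tensors also loads fine this way.
state_dict = torch.load('final_compressor_model.pth',
                        map_location='cpu', weights_only=True)
```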
Starting optimized training with batch_size=96, epochs=6000
Using GPU 0 for optimized H100 training
Mixed precision: True
Batch size: 96
Target epochs: 6000
Learning rate: 0.0004 -> 0.0002
βœ“ Mixed precision training enabled (BF16)
Loading ALL AMP embeddings from /data2/edwardsun/flow_project/peptide_embeddings/...
Loading combined embeddings from /data2/edwardsun/flow_project/peptide_embeddings/all_peptide_embeddings.pt...
βœ“ Loaded ALL embeddings: torch.Size([17968, 50, 1280])
Computing preprocessing statistics...
βœ“ Statistics computed and saved:
  Total embeddings: 17,968
  Mean: -0.0005 Β± 0.0897
  Std: 0.0869 Β± 0.1168
  Range: [-9.1738, 3.2894]
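One way the "statistics computed" lines above could be produced; the exact reduction axes are an assumption inferred from the `mean ± spread` form of the report, and the stand-in tensor is smaller than the real [17968, 50, 1280] one:

```python
import torch

emb = torch.randn(1000, 50, 1280)   # stand-in; real shape is [17968, 50, 1280]

# Per-dimension statistics over samples and sequence positions,
# then their mean and spread across the 1280 dimensions.
per_dim_mean = emb.mean(dim=(0, 1))
per_dim_std = emb.std(dim=(0, 1))
print(f"Mean: {per_dim_mean.mean():.4f} ± {per_dim_mean.std():.4f}")
print(f"Std: {per_dim_std.mean():.4f} ± {per_dim_std.std():.4f}")
print(f"Range: [{emb.min():.4f}, {emb.max():.4f}]")
```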
Initializing models...
βœ“ Model compiled with torch.compile for speedup
βœ“ Models initialized:
  Compressor parameters: 78,817,360
  Decompressor parameters: 39,458,720
  Flow model parameters: 50,779,584
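The parameter counts above follow from the standard reduction over `model.parameters()`; the module here is a stand-in, not the actual flow model:

```python
import torch

def count_params(m: torch.nn.Module) -> int:
    # Total number of elements across all parameter tensors.
    return sum(p.numel() for p in m.parameters())

model = torch.nn.Linear(1280, 1280)   # stand-in module
model = torch.compile(model)          # the "Model compiled" step above
print(f"Parameters: {count_params(model):,}")
```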
Initializing datasets with FULL data...
Loading AMP embeddings from /data2/edwardsun/flow_project/peptide_embeddings/...
Loading combined embeddings from /data2/edwardsun/flow_project/peptide_embeddings/all_peptide_embeddings.pt (FULL DATA)...
βœ“ Loaded ALL embeddings: torch.Size([17968, 50, 1280])
Loading CFG data from FASTA: /home/edwardsun/flow/combined_final.fasta...
Parsing FASTA file: /home/edwardsun/flow/combined_final.fasta
Label assignment: >AP = AMP (0), >sp = Non-AMP (1)
βœ“ Parsed 6983 valid sequences from FASTA
  AMP sequences: 3306
  Non-AMP sequences: 3677
  Masked for CFG: 698
Loaded 6983 CFG sequences
Label distribution: [3306 3677]
Masked 698 labels for CFG training
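A sketch of the FASTA labeling and CFG masking reported above, assuming the header convention from the log and a ~10% mask rate inferred from 698/6983; the `-1` mask token and the helper name are hypothetical:

```python
import random

def parse_fasta_labels(path: str, mask_prob: float = 0.1, seed: int = 0):
    # Header convention from the log: >AP = AMP (0), >sp = Non-AMP (1).
    labels = []
    with open(path) as f:
        for line in f:
            if line.startswith('>AP'):
                labels.append(0)
            elif line.startswith('>sp'):
                labels.append(1)
    # Replace ~10% of labels with an unconditional token so the model
    # also learns the label-free distribution needed for CFG sampling.
    rng = random.Random(seed)
    masked = [-1 if rng.random() < mask_prob else y for y in labels]
    return labels, masked
```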
Aligning AMP embeddings with CFG data...
Aligned 6983 samples
CFG Flow Dataset initialized:
  AMP embeddings: torch.Size([17968, 50, 1280])
  CFG labels: 6983
  Aligned samples: 6983
βœ“ Dataset initialized with FULL data:
  Total samples: 6,983
  Batch size: 96
  Batches per epoch: 73
  Total training steps: 438,000
  Validation every: 10,000 steps
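The batch and step counts above are consistent with each other:

```python
import math

total_samples = 6_983
batch_size = 96
epochs = 6_000

batches_per_epoch = math.ceil(total_samples / batch_size)   # 73
total_steps = batches_per_epoch * epochs                    # 438,000
```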
Initializing optimizer and scheduler...
βœ“ Optimizer initialized:
  Base LR: 0.0004
  Min LR: 0.0002
  Warmup steps: 5000
  Weight decay: 0.01
  Gradient clip norm: 1.0
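A schedule consistent with the settings above and with the LR values logged below (4.01e-05 at step 1, rising about 7e-8 per step) is a linear warmup from 10% of the base LR; the post-warmup decay shape is an assumption:

```python
import math

def lr_at(step: int, base_lr: float = 4e-4, min_lr: float = 2e-4,
          warmup: int = 5_000, total: int = 438_000) -> float:
    if step < warmup:
        # Linear warmup from 0.1 * base_lr to base_lr over 5,000 steps,
        # inferred from the logged LR values (assumption).
        return base_lr * (0.1 + 0.9 * step / warmup)
    # Cosine decay from base_lr down to min_lr (assumed shape).
    progress = (step - warmup) / max(1, total - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```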
βœ“ Optimized Single GPU training setup complete with FULL DATA!
πŸš€ Starting Optimized Single GPU Flow Matching Training with FULL DATA
GPU: 0
Total iterations: 6000
Batch size: 96
Total samples: 6,983
Mixed precision: True
Estimated time: ~8-10 hours (overnight training with ALL data)
============================================================
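For context, a generic flow-matching training step of the kind this run performs; the interpolation path, velocity target, and model signature are standard rectified-flow choices, not necessarily the script's exact objective:

```python
import torch

def flow_matching_step(model: torch.nn.Module, x1: torch.Tensor) -> torch.Tensor:
    # x1: a batch of data samples (here, peptide embeddings).
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device).view(-1, 1, 1)
    xt = (1 - t) * x0 + t * x1                     # linear interpolation path
    target = x1 - x0                               # constant velocity along it
    pred = model(xt, t.view(-1))                   # hypothetical signature
    return torch.nn.functional.mse_loss(pred, target)
```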

Training Flow Model:   0%|          | 0/6000 [00:00<?, ?it/s]/data2/edwardsun/flow_home/amp_flow_training_single_gpu_full_data.py:392: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with autocast(dtype=torch.bfloat16):
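The fix the warning names, with stand-in tensors for whatever line 392 of the script actually wraps:

```python
import torch

flow_model = torch.nn.Linear(1280, 1280).cuda()    # stand-in model
x_t = torch.randn(96, 50, 1280, device='cuda')     # stand-in batch
target = torch.randn_like(x_t)

# was: with autocast(dtype=torch.bfloat16):
with torch.amp.autocast('cuda', dtype=torch.bfloat16):
    pred = flow_model(x_t)
    loss = torch.nn.functional.mse_loss(pred, target)
```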

Training Flow Model:   0%|          | 1/6000 [01:09<116:23:06, 69.84s/it]Epoch    0 | Step      1/438000 | Loss: 2.328033 | LR: 4.01e-05 | Speed: 0.0 steps/s | ETA: 3889.5h
Epoch    0 | Avg Loss: 0.950054 | LR: 4.53e-05 | Time: 69.8s | Samples: 6,983

Training Flow Model:   0%|          | 2/6000 [01:15<53:24:52, 32.06s/it] Epoch    1 | Step     74/438000 | Loss: 0.629602 | LR: 4.53e-05 | Speed: 1.0 steps/s | ETA: 116.2h
Epoch    1 | Avg Loss: 0.415130 | LR: 5.05e-05 | Time: 5.6s | Samples: 6,983

Training Flow Model:   0%|          | 3/6000 [01:18<31:05:28, 18.66s/it]Epoch    2 | Step    147/438000 | Loss: 0.304313 | LR: 5.06e-05 | Speed: 1.9 steps/s | ETA: 63.2h
Epoch    2 | Avg Loss: 0.227218 | LR: 5.58e-05 | Time: 2.7s | Samples: 6,983

Training Flow Model:   0%|          | 4/6000 [01:20<20:29:56, 12.31s/it]Epoch    3 | Step    220/438000 | Loss: 0.210514 | LR: 5.58e-05 | Speed: 2.8 steps/s | ETA: 43.7h
Epoch    3 | Avg Loss: 0.178846 | LR: 6.10e-05 | Time: 2.6s | Samples: 6,983

Training Flow Model:   0%|          | 5/6000 [01:23<14:48:54,  8.90s/it]Epoch    4 | Step    293/438000 | Loss: 0.182317 | LR: 6.11e-05 | Speed: 3.6 steps/s | ETA: 33.9h
Epoch    4 | Avg Loss: 0.148526 | LR: 6.63e-05 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 6/6000 [01:26<11:23:00,  6.84s/it]Epoch    5 | Step    366/438000 | Loss: 0.128248 | LR: 6.64e-05 | Speed: 4.3 steps/s | ETA: 28.1h
Epoch    5 | Avg Loss: 0.127575 | LR: 7.15e-05 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 7/6000 [01:29<9:07:19,  5.48s/it] Epoch    6 | Step    439/438000 | Loss: 0.105957 | LR: 7.16e-05 | Speed: 5.0 steps/s | ETA: 24.2h
Epoch    6 | Avg Loss: 0.109353 | LR: 7.68e-05 | Time: 2.7s | Samples: 6,983

Training Flow Model:   0%|          | 8/6000 [01:31<7:38:51,  4.59s/it]Epoch    7 | Step    512/438000 | Loss: 0.087330 | LR: 7.69e-05 | Speed: 5.7 steps/s | ETA: 21.3h
Epoch    7 | Avg Loss: 0.101109 | LR: 8.20e-05 | Time: 2.7s | Samples: 6,983

Training Flow Model:   0%|          | 9/6000 [01:34<6:44:23,  4.05s/it]Epoch    8 | Step    585/438000 | Loss: 0.081881 | LR: 8.21e-05 | Speed: 6.3 steps/s | ETA: 19.3h
Epoch    8 | Avg Loss: 0.089056 | LR: 8.73e-05 | Time: 2.9s | Samples: 6,983

Training Flow Model:   0%|          | 10/6000 [01:37<6:07:40,  3.68s/it]Epoch    9 | Step    658/438000 | Loss: 0.085630 | LR: 8.74e-05 | Speed: 6.9 steps/s | ETA: 17.6h
Epoch    9 | Avg Loss: 0.083894 | LR: 9.26e-05 | Time: 2.9s | Samples: 6,983

Training Flow Model:   0%|          | 11/6000 [01:40<5:42:16,  3.43s/it]Epoch   10 | Step    731/438000 | Loss: 0.081927 | LR: 9.26e-05 | Speed: 7.4 steps/s | ETA: 16.4h
Epoch   10 | Avg Loss: 0.077295 | LR: 9.78e-05 | Time: 2.9s | Samples: 6,983

Training Flow Model:   0%|          | 12/6000 [01:43<5:23:05,  3.24s/it]Epoch   11 | Step    804/438000 | Loss: 0.068221 | LR: 9.79e-05 | Speed: 7.9 steps/s | ETA: 15.3h
Epoch   11 | Avg Loss: 0.072662 | LR: 1.03e-04 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 13/6000 [01:46<5:12:15,  3.13s/it]Epoch   12 | Step    877/438000 | Loss: 0.079151 | LR: 1.03e-04 | Speed: 8.4 steps/s | ETA: 14.4h
Epoch   12 | Avg Loss: 0.069846 | LR: 1.08e-04 | Time: 2.9s | Samples: 6,983

Training Flow Model:   0%|          | 14/6000 [01:48<4:58:17,  2.99s/it]Epoch   13 | Step    950/438000 | Loss: 0.074991 | LR: 1.08e-04 | Speed: 8.9 steps/s | ETA: 13.7h
Epoch   13 | Avg Loss: 0.064569 | LR: 1.14e-04 | Time: 2.7s | Samples: 6,983

Training Flow Model:   0%|          | 15/6000 [01:51<4:51:40,  2.92s/it]Epoch   14 | Step   1023/438000 | Loss: 0.043908 | LR: 1.14e-04 | Speed: 9.3 steps/s | ETA: 13.0h
Epoch   14 | Avg Loss: 0.057743 | LR: 1.19e-04 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 16/6000 [01:54<4:48:57,  2.90s/it]Epoch   15 | Step   1096/438000 | Loss: 0.048052 | LR: 1.19e-04 | Speed: 9.7 steps/s | ETA: 12.4h
Epoch   15 | Avg Loss: 0.058437 | LR: 1.24e-04 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 17/6000 [01:57<4:44:23,  2.85s/it]Epoch   16 | Step   1169/438000 | Loss: 0.045587 | LR: 1.24e-04 | Speed: 10.1 steps/s | ETA: 12.0h
Epoch   16 | Avg Loss: 0.055771 | LR: 1.29e-04 | Time: 2.7s | Samples: 6,983

Training Flow Model:   0%|          | 18/6000 [01:59<4:41:37,  2.82s/it]Epoch   17 | Step   1242/438000 | Loss: 0.053337 | LR: 1.29e-04 | Speed: 10.5 steps/s | ETA: 11.5h
Epoch   17 | Avg Loss: 0.053140 | LR: 1.35e-04 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 19/6000 [02:02<4:42:38,  2.84s/it]Epoch   18 | Step   1315/438000 | Loss: 0.075343 | LR: 1.35e-04 | Speed: 10.9 steps/s | ETA: 11.1h
Epoch   18 | Avg Loss: 0.049295 | LR: 1.40e-04 | Time: 2.9s | Samples: 6,983

Training Flow Model:   0%|          | 20/6000 [02:05<4:42:48,  2.84s/it]Epoch   19 | Step   1388/438000 | Loss: 0.043840 | LR: 1.40e-04 | Speed: 11.2 steps/s | ETA: 10.8h
Epoch   19 | Avg Loss: 0.049483 | LR: 1.45e-04 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 21/6000 [02:08<4:41:33,  2.83s/it]Epoch   20 | Step   1461/438000 | Loss: 0.076462 | LR: 1.45e-04 | Speed: 11.6 steps/s | ETA: 10.5h
Epoch   20 | Avg Loss: 0.048242 | LR: 1.50e-04 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 22/6000 [02:11<4:40:07,  2.81s/it]Epoch   21 | Step   1534/438000 | Loss: 0.039453 | LR: 1.50e-04 | Speed: 11.9 steps/s | ETA: 10.2h
Epoch   21 | Avg Loss: 0.047419 | LR: 1.56e-04 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 23/6000 [02:13<4:40:29,  2.82s/it]Epoch   22 | Step   1607/438000 | Loss: 0.058766 | LR: 1.56e-04 | Speed: 12.2 steps/s | ETA: 10.0h
Epoch   22 | Avg Loss: 0.047794 | LR: 1.61e-04 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 24/6000 [02:16<4:45:06,  2.86s/it]Epoch   23 | Step   1680/438000 | Loss: 0.038332 | LR: 1.61e-04 | Speed: 12.4 steps/s | ETA: 9.7h
Epoch   23 | Avg Loss: 0.047601 | LR: 1.66e-04 | Time: 3.0s | Samples: 6,983

Training Flow Model:   0%|          | 25/6000 [02:19<4:45:46,  2.87s/it]Epoch   24 | Step   1753/438000 | Loss: 0.053138 | LR: 1.66e-04 | Speed: 12.7 steps/s | ETA: 9.5h
Epoch   24 | Avg Loss: 0.045266 | LR: 1.71e-04 | Time: 2.9s | Samples: 6,983

Training Flow Model:   0%|          | 26/6000 [02:22<4:41:52,  2.83s/it]Epoch   25 | Step   1826/438000 | Loss: 0.045704 | LR: 1.71e-04 | Speed: 13.0 steps/s | ETA: 9.3h
Epoch   25 | Avg Loss: 0.044707 | LR: 1.77e-04 | Time: 2.7s | Samples: 6,983

Training Flow Model:   0%|          | 27/6000 [02:25<4:38:38,  2.80s/it]Epoch   26 | Step   1899/438000 | Loss: 0.052826 | LR: 1.77e-04 | Speed: 13.2 steps/s | ETA: 9.1h
Epoch   26 | Avg Loss: 0.041951 | LR: 1.82e-04 | Time: 2.7s | Samples: 6,983

Training Flow Model:   0%|          | 28/6000 [02:28<4:41:26,  2.83s/it]Epoch   27 | Step   1972/438000 | Loss: 0.030554 | LR: 1.82e-04 | Speed: 13.5 steps/s | ETA: 9.0h
Epoch   27 | Avg Loss: 0.044097 | LR: 1.87e-04 | Time: 2.9s | Samples: 6,983

Training Flow Model:   0%|          | 29/6000 [02:31<4:41:46,  2.83s/it]Epoch   28 | Step   2045/438000 | Loss: 0.036556 | LR: 1.87e-04 | Speed: 13.7 steps/s | ETA: 8.8h
Epoch   28 | Avg Loss: 0.043588 | LR: 1.92e-04 | Time: 2.8s | Samples: 6,983

Training Flow Model:   0%|          | 30/6000 [02:34<5:10:48,  3.12s/it]Epoch   29 | Step   2118/438000 | Loss: 0.036764 | LR: 1.92e-04 | Speed: 13.9 steps/s | ETA: 8.7h
Epoch   29 | Avg Loss: 0.042376 | LR: 1.98e-04 | Time: 3.8s | Samples: 6,983

Training Flow Model:   1%|          | 31/6000 [02:38<5:33:49,  3.36s/it]Epoch   30 | Step   2191/438000 | Loss: 0.034607 | LR: 1.98e-04 | Speed: 14.1 steps/s | ETA: 8.6h
Epoch   30 | Avg Loss: 0.039175 | LR: 2.03e-04 | Time: 3.9s | Samples: 6,983

Training Flow Model:   1%|          | 32/6000 [02:42<5:52:54,  3.55s/it]Epoch   31 | Step   2264/438000 | Loss: 0.026377 | LR: 2.03e-04 | Speed: 14.2 steps/s | ETA: 8.5h
Epoch   31 | Avg Loss: 0.041455 | LR: 2.08e-04 | Time: 4.0s | Samples: 6,983

Training Flow Model:   1%|          | 33/6000 [02:46<6:02:28,  3.64s/it]Epoch   32 | Step   2337/438000 | Loss: 0.043802 | LR: 2.08e-04 | Speed: 14.3 steps/s | ETA: 8.5h
Epoch   32 | Avg Loss: 0.040566 | LR: 2.13e-04 | Time: 3.9s | Samples: 6,983

Training Flow Model:   1%|          | 34/6000 [02:50<6:11:21,  3.73s/it]Epoch   33 | Step   2410/438000 | Loss: 0.041541 | LR: 2.14e-04 | Speed: 14.4 steps/s | ETA: 8.4h
Epoch   33 | Avg Loss: 0.038954 | LR: 2.19e-04 | Time: 3.9s | Samples: 6,983

Training Flow Model:   1%|          | 35/6000 [02:54<6:14:18,  3.77s/it]Epoch   34 | Step   2483/438000 | Loss: 0.040879 | LR: 2.19e-04 | Speed: 14.5 steps/s | ETA: 8.4h
Epoch   34 | Avg Loss: 0.041221 | LR: 2.24e-04 | Time: 3.8s | Samples: 6,983

Training Flow Model:   1%|          | 36/6000 [02:58<6:20:05,  3.82s/it]Epoch   35 | Step   2556/438000 | Loss: 0.043876 | LR: 2.24e-04 | Speed: 14.6 steps/s | ETA: 8.3h
Epoch   35 | Avg Loss: 0.039926 | LR: 2.29e-04 | Time: 4.0s | Samples: 6,983

Training Flow Model:   1%|          | 37/6000 [03:02<6:24:48,  3.87s/it]Epoch   36 | Step   2629/438000 | Loss: 0.047236 | LR: 2.29e-04 | Speed: 14.7 steps/s | ETA: 8.2h
Epoch   36 | Avg Loss: 0.043514 | LR: 2.34e-04 | Time: 4.0s | Samples: 6,983

Training Flow Model:   1%|          | 38/6000 [03:06<6:26:14,  3.89s/it]Epoch   37 | Step   2702/438000 | Loss: 0.030528 | LR: 2.35e-04 | Speed: 14.7 steps/s | ETA: 8.2h
Epoch   37 | Avg Loss: 0.037676 | LR: 2.40e-04 | Time: 3.9s | Samples: 6,983

Training Flow Model:   1%|          | 39/6000 [03:10<6:23:45,  3.86s/it]Epoch   38 | Step   2775/438000 | Loss: 0.045154 | LR: 2.40e-04 | Speed: 14.8 steps/s | ETA: 8.1h
Epoch   38 | Avg Loss: 0.039012 | LR: 2.45e-04 | Time: 3.8s | Samples: 6,983

Training Flow Model:   1%|          | 40/6000 [03:13<6:24:55,  3.88s/it]Epoch   39 | Step   2848/438000 | Loss: 0.041152 | LR: 2.45e-04 | Speed: 14.9 steps/s | ETA: 8.1h
Epoch   39 | Avg Loss: 0.037944 | LR: 2.50e-04 | Time: 3.9s | Samples: 6,983

Training Flow Model:   1%|          | 41/6000 [03:17<6:23:10,  3.86s/it]Epoch   40 | Step   2921/438000 | Loss: 0.031573 | LR: 2.50e-04 | Speed: 15.0 steps/s | ETA: 8.1h
Epoch   40 | Avg Loss: 0.037019 | LR: 2.55e-04 | Time: 3.8s | Samples: 6,983

Training Flow Model:   1%|          | 42/6000 [03:21<6:24:07,  3.87s/it]Epoch   41 | Step   2994/438000 | Loss: 0.031375 | LR: 2.56e-04 | Speed: 15.1 steps/s | ETA: 8.0h
Epoch   41 | Avg Loss: 0.036788 | LR: 2.61e-04 | Time: 3.9s | Samples: 6,983

Training Flow Model:   1%|          | 43/6000 [03:25<6:23:46,  3.87s/it]Epoch   42 | Step   3067/438000 | Loss: 0.025271 | LR: 2.61e-04 | Speed: 15.1 steps/s | ETA: 8.0h
Epoch   42 | Avg Loss: 0.038254 | LR: 2.66e-04 | Time: 3.9s | Samples: 6,983

Training Flow Model:   1%|          | 44/6000 [03:29<6:27:10,  3.90s/it]Epoch   43 | Step   3140/438000 | Loss: 0.059067 | LR: 2.66e-04 | Speed: 15.2 steps/s | ETA: 7.9h
Epoch   43 | Avg Loss: 0.037138 | LR: 2.71e-04 | Time: 4.0s | Samples: 6,983

Training Flow Model:   1%|          | 45/6000 [03:33<6:24:59,  3.88s/it]Epoch   44 | Step   3213/438000 | Loss: 0.042951 | LR: 2.71e-04 | Speed: 15.3 steps/s | ETA: 7.9h
Epoch   44 | Avg Loss: 0.039265 | LR: 2.77e-04 | Time: 3.8s | Samples: 6,983

Training Flow Model:   1%|          | 46/6000 [03:37<6:24:33,  3.88s/it]Epoch   45 | Step   3286/438000 | Loss: 0.058999 | LR: 2.77e-04 | Speed: 15.3 steps/s | ETA: 7.9h
Epoch   45 | Avg Loss: 0.036169 | LR: 2.82e-04 | Time: 3.9s | Samples: 6,983

Training Flow Model:   1%|          | 47/6000 [03:41<6:24:28,  3.88s/it]Epoch   46 | Step   3359/438000 | Loss: 0.029517 | LR: 2.82e-04 | Speed: 15.4 steps/s | ETA: 7.8h
Epoch   46 | Avg Loss: 0.037829 | LR: 2.87e-04 | Time: 3.9s | Samples: 6,983

Training Flow Model:   1%|          | 48/6000 [03:45<6:28:07,  3.91s/it]Epoch   47 | Step   3432/438000 | Loss: 0.037272 | LR: 2.87e-04 | Speed: 15.5 steps/s | ETA: 7.8h
Epoch   47 | Avg Loss: 0.038144 | LR: 2.92e-04 | Time: 4.0s | Samples: 6,983

Training Flow Model:   1%|          | 49/6000 [03:48<6:27:19,  3.91s/it]Epoch   48 | Step   3505/438000 | Loss: 0.036242 | LR: 2.92e-04 | Speed: 15.5 steps/s | ETA: 7.8h
Epoch   48 | Avg Loss: 0.034156 | LR: 2.98e-04 | Time: 3.9s | Samples: 6,983