JeffrinSam committed on
Commit c06af61 · verified · 1 Parent(s): 0c25d41

Upload config.yaml with huggingface_hub

Files changed (1)
  1. config.yaml +793 -0
config.yaml ADDED
@@ -0,0 +1,793 @@
+ checkpoint:
+ broadcast_via_filesystem: 'False'
+ dcp_allow_mismatched_size: 'False'
+ dcp_async_mode_enabled: 'False'
+ jit:
+ device: cuda
+ dtype: bfloat16
+ enabled: 'False'
+ input_shape: null
+ strict: 'True'
+ keys_not_to_resume: []
+ load_ema_to_reg: 'False'
+ load_from_object_store:
+ bucket: bucket
+ credentials: credentials/s3_checkpoint.secret
+ enabled: 'False'
+ load_path: /home/jupyter/project/jefpredict/hf_cache/hub/models--nvidia--Cosmos-Predict2.5-2B/snapshots/15a82a2ec231bc318692aa0456a36537c806e7d4/base/pre-trained/d20b7120-df3e-4911-919d-db6e08bad31c_ema_bf16.pt
+ load_training_state: 'False'
+ only_load_scheduler_state: 'False'
+ save_iter: '200'
+ save_to_object_store:
+ bucket: bucket
+ credentials: credentials/s3_checkpoint.secret
+ enabled: 'False'
+ strict_resume: 'True'
+ type:
+ _target_: <class 'cosmos_predict2._src.predict2.checkpointer.dcp.DistributedCheckpointer'>
+ callbacks: null
+ disable_async: 'False'
+ verbose: 'True'
+ dataloader_train:
+ _target_: <function get_generic_dataloader at 0x7f33ca061360>
+ batch_size: '1'
+ dataloaders:
+ image_data:
+ dataloader:
+ _target_: <function get_cached_replay_dataloader at 0x7f34be2496c0>
+ batch_size: '12'
+ cache_augment_fn: functools.partial(<function duplicate_batches at 0x7f34be2489d0>,
+ n=1)
+ cache_replay_name: image_dataloader
+ cache_size: '8'
+ concat_size: '1'
+ dataset:
+ _target_: <function get_image_dataset at 0x7f34be249870>
+ augmentor_name: image_basic_augmentor_without_embeddings
+ caption_type: qwen2p5_7b_v4
+ dataset_resolution_type: gt720p
+ embedding_type: null
+ len_t5: '512'
+ resolution: '720'
+ t5_dim: '1024'
+ num_workers: '6'
+ pin_memory: 'True'
+ shuffle: 'False'
+ use_cache: 'False'
+ webdataset: 'False'
+ ratio: '1'
+ video_data:
+ dataloader:
+ _target_: <function get_cached_replay_dataloader at 0x7f34be2496c0>
+ batch_size: '1'
+ cache_augment_fn: functools.partial(<function duplicate_batches_random at
+ 0x7f34be249090>, n=1.8)
+ cache_replay_name: video_dataloader
+ cache_size: '16'
+ concat_size: '1'
+ dataset:
+ _target_: <function get_video_dataset at 0x7f34be24a200>
+ augmentor_name: video_basic_augmentor_v2
+ caption_type: t2w_qwen2p5_7b
+ dataset_resolution_type: all
+ embedding_type: null
+ len_t5: '512'
+ max_fps_thres: '60'
+ min_fps_thres: '10'
+ num_video_frames: '93'
+ resolution: '720'
+ t5_dim: '1024'
+ use_native_fps: 'True'
+ video_decoder_name: video_naive_bytes
+ num_workers: '8'
+ pin_memory: 'True'
+ shuffle: 'False'
+ use_cache: 'False'
+ webdataset: 'False'
+ ratio: '3'
+ dataset:
+ _target_: <class 'cosmos_predict2._src.predict2.datasets.local_datasets.dataset_video.VideoDataset'>
+ dataset_dir: datasets/cosmos_nemo_assets
+ num_frames: '93'
+ video_size:
+ - '704'
+ - '1280'
+ drop_last: 'True'
+ num_workers: '4'
+ persistent_workers: 'False'
+ pin_memory: 'True'
+ prefetch_factor: null
+ sampler:
+ _target_: <function get_sampler at 0x7f33c9c9a5f0>
+ dataset:
+ _target_: <class 'cosmos_predict2._src.predict2.datasets.local_datasets.dataset_video.VideoDataset'>
+ dataset_dir: datasets/cosmos_nemo_assets
+ num_frames: '93'
+ video_size:
+ - '704'
+ - '1280'
+ dataloader_val:
+ _target_: <class 'cosmos_predict2._src.predict2.datasets.joint_dataloader.IterativeJointDataLoader'>
+ dataloaders:
+ image_data:
+ dataloader:
+ _target_: <function get_cached_replay_dataloader at 0x7f34be2496c0>
+ batch_size: '2'
+ cache_augment_fn: null
+ cache_replay_name: image_dataloader
+ cache_size: '32'
+ concat_size: '1'
+ dataset:
+ _target_: <function get_image_dataset at 0x7f34be249870>
+ len_t5: '512'
+ resolution: '512'
+ t5_dim: '1024'
+ num_workers: '8'
+ pin_memory: 'True'
+ shuffle: 'False'
+ use_cache: 'False'
+ webdataset: 'False'
+ ratio: '1'
+ video_data:
+ dataloader:
+ _target_: <function get_cached_replay_dataloader at 0x7f34be2496c0>
+ batch_size: '1'
+ cache_augment_fn: null
+ cache_replay_name: video_dataloader
+ cache_size: '32'
+ concat_size: '1'
+ dataset:
+ _target_: <function get_video_dataset at 0x7f34be24a200>
+ len_t5: '512'
+ num_video_frames: '136'
+ resolution: '512'
+ t5_dim: '1024'
+ num_workers: '8'
+ pin_memory: 'True'
+ shuffle: 'False'
+ use_cache: 'False'
+ webdataset: 'False'
+ ratio: '1'
+ defaults:
+ - _self_
+ - data_train: mock
+ - data_val: mock
+ - optimizer: fusedadamw
+ - scheduler: lambdalinear
+ - model: ddp
+ - callbacks: basic
+ - net: null
+ - conditioner: video_prediction_conditioner
+ - ema: power
+ - tokenizer: cosmos_tokenizer_causal_cv8x8x8_c16_res720_t121_it121_v1_0
+ - checkpoint: s3
+ - ckpt_type: dummy
+ - experiment: null
+ job:
+ cluster: null
+ group: lora
+ name: 2b_cosmos_nemo_assets_lora
+ project: cosmos_predict_v2p5
+ wandb_mode: disabled
+ model:
+ _recursive_: 'False'
+ _target_: <class 'cosmos_predict2._src.predict2.models.video2world_model_rectified_flow.Video2WorldModelRectifiedFlow'>
+ config:
+ conditional_frame_timestep: -1.0
+ conditional_frames_probs:
+ 0: 0.333
+ 1: 0.333
+ 2: 0.334
+ conditioner:
+ _target_: <class 'cosmos_predict2._src.predict2.configs.video2world.defaults.conditioner.Video2WorldConditioner'>
+ fps:
+ _target_: <class 'cosmos_predict2._src.predict2.conditioner.ReMapkey'>
+ dropout_rate: '0.0'
+ dtype: null
+ input_key: fps
+ output_key: fps
+ padding_mask:
+ _target_: <class 'cosmos_predict2._src.predict2.conditioner.ReMapkey'>
+ dropout_rate: '0.0'
+ dtype: null
+ input_key: padding_mask
+ output_key: padding_mask
+ text:
+ _target_: <class 'cosmos_predict2._src.predict2.conditioner.TextAttr'>
+ credential_path: credentials/s3_training.secret
+ dropout_rate: '0.2'
+ empty_string_embeddings_path: s3://bucket/predict2_assets/reason1_empty_string_embeddings.pt
+ input_key:
+ - t5_text_embeddings
+ use_empty_string: 'False'
+ use_video_condition:
+ _target_: <class 'cosmos_predict2._src.predict2.conditioner.BooleanFlag'>
+ dropout_rate: '0.0'
+ input_key: fps
+ output_key: use_video_condition
+ conditioning_strategy: frame_replace
+ denoise_replace_gt_frames: true
+ ema:
+ enabled: true
+ iteration_shift: 0
+ rate: 0.1
+ fsdp_shard_size: 8
+ high_sigma_ratio: 0.05
+ high_sigma_timesteps_max: 1000
+ high_sigma_timesteps_min: 980
+ init_lora_weights: true
+ input_caption_key: ai_caption
+ input_data_key: video
+ input_image_key: images
+ lora_alpha: 32
+ lora_rank: 32
+ lora_target_modules: q_proj,k_proj,v_proj,output_proj,mlp.layer1,mlp.layer2
+ max_num_conditional_frames: 2
+ min_num_conditional_frames: 0
+ net:
+ _target_: <class 'cosmos_predict2._src.predict2.networks.minimal_v1_lvg_dit.MinimalV1LVGDiT'>
+ adaln_lora_dim: '256'
+ atten_backend: minimal_a2a
+ concat_padding_mask: 'True'
+ crossattn_emb_channels: '1024'
+ crossattn_proj_in_channels: '100352'
+ extra_per_block_abs_pos_emb: 'False'
+ in_channels: '16'
+ max_frames: '128'
+ max_img_h: '240'
+ max_img_w: '240'
+ model_channels: '2048'
+ num_blocks: '28'
+ num_heads: '16'
+ out_channels: '16'
+ patch_spatial: '2'
+ patch_temporal: '1'
+ pos_emb_cls: rope3d
+ pos_emb_interpolation: crop
+ pos_emb_learnable: 'True'
+ rope_enable_fps_modulation: 'False'
+ rope_h_extrapolation_ratio: '3.0'
+ rope_t_extrapolation_ratio: '1.0'
+ rope_w_extrapolation_ratio: '3.0'
+ sac_config:
+ every_n_blocks: 1
+ mode: predict2_2b_720_aggressive
+ timestep_scale: '0.001'
+ use_adaln_lora: 'True'
+ use_crossattn_projection: 'True'
+ use_wan_fp32_strategy: 'True'
+ precision: bfloat16
+ resolution: '720'
+ shift: 5
+ state_ch: 16
+ state_t: 24
+ text_encoder_class: reason1p1_7B
+ text_encoder_config:
+ ckpt_path: s3://bucket/cosmos_reasoning1/sft_exp700/sft_exp721-1_qwen7b_tl_721_5vs5_s3_balanced_n32_resume_16k/checkpoints/iter_000016000/model/
+ compute_online: true
+ embedding_concat_strategy: full_concat
+ model_config:
+ _target_: <class 'cosmos_predict2._src.predict2.text_encoders.reason1.QwenVLBaseModel'>
+ model_config:
+ _target_: cosmos_predict2._src.reason1.configs.default.model_config_qwen.QwenModelConfig
+ activation_checkpoint:
+ mode: selective
+ models: vlm
+ selective_ac_option: op
+ add_answer_tag: 'True'
+ add_cross_attention: 'False'
+ add_image_start_end_tag: 'False'
+ add_tile_tag: 'False'
+ architectures:
+ - Qwen2_5_VLForConditionalGeneration
+ attention_dropout: '0.0'
+ attn_implementation: flash_attention_2
+ attn_implementation_autoset: 'True'
+ aux_loss_coeff: '0.0'
+ bad_words_ids: null
+ begin_suppress_tokens: null
+ bos_token_id: '151643'
+ cache_dir: null
+ checkpoint:
+ async_mode: disabled
+ create_seed_checkpoint: false
+ enable_checkpoint: false
+ export_dtype: float32
+ folder: checkpoint
+ interval: 500
+ interval_type: steps
+ model_weights_only: false
+ chunk_size_feed_forward: '0'
+ ckpt_dir: null
+ ckpt_path: null
+ comm:
+ init_timeout_seconds: 300
+ trace_buf_size: 20000
+ train_timeout_seconds: 100
+ cp_size: null
+ cross_attention_hidden_size: null
+ decoder_start_token_id: null
+ deterministic: 'False'
+ diversity_penalty: '0.0'
+ do_sample: 'False'
+ early_stopping: 'False'
+ encoder_no_repeat_ngram_size: '0'
+ eos_token_id: '151645'
+ ep_size: null
+ experimental:
+ enable_async_tensor_parallel: false
+ enable_compiled_autograd: false
+ pipeline_parallel_degree: 1
+ exponential_decay_length_penalty: null
+ finetuning_task: null
+ float8:
+ enable_float8_linear: false
+ forced_bos_token_id: null
+ forced_eos_token_id: null
+ freeze_llm: 'False'
+ freeze_mm_projector: 'False'
+ freeze_vision_encoder: 'False'
+ fsdp_enabled: 'False'
+ hidden_act: silu
+ hidden_size: '3584'
+ id2label:
+ 0: LABEL_0
+ 1: LABEL_1
+ image_token_id: '151655'
+ initializer_range: '0.02'
+ intermediate_size: '18944'
+ is_decoder: 'False'
+ is_encoder_decoder: 'False'
+ label2id:
+ LABEL_0: '0'
+ LABEL_1: '1'
+ length_penalty: '1.0'
+ loss_per_token: 'True'
+ max_batch_size: '1'
+ max_length: '20'
+ max_position_embeddings: '128000'
+ max_seq_len: '128000'
+ max_window_layers: '28'
+ min_length: '0'
+ mm_projector: null
+ model_type: qwen2_5_vl
+ name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
+ no_repeat_ngram_size: '0'
+ num_attention_heads: '28'
+ num_beam_groups: '1'
+ num_beams: '1'
+ num_hidden_layers: '28'
+ num_key_value_heads: '4'
+ num_return_sequences: '1'
+ num_tiles: '1'
+ optimizer:
+ early_step_in_backward: false
+ end_lr: 2.5e-05
+ fused: false
+ init_lr: 1.0e-05
+ lr: 0.0003
+ lr_multiplier_llm: 1.0
+ lr_multiplier_mm_projector: 1.0
+ lr_multiplier_vision_encoder: 0.1
+ name: AdamW
+ output_attentions: 'False'
+ output_hidden_states: 'True'
+ output_scores: 'False'
+ pad_token_id: null
+ precision: bfloat16
+ prefix: null
+ prepend_padding: 'False'
+ problem_type: null
+ pruned_heads: _Nothing.NOTHING
+ remove_invalid_values: 'False'
+ repetition_penalty: '1.0'
+ return_dict: 'True'
+ return_dict_in_generate: 'False'
+ rms_norm_eps: 1e-06
+ rope_scaling:
+ mrope_section:
+ - '16'
+ - '24'
+ - '24'
+ rope_type: default
+ type: default
+ rope_theta: '1000000.0'
+ s3_credential_path: credentials/pbss_dir.secret
+ seed: '0'
+ sep_token_id: null
+ sliding_window: '32768'
+ suppress_tokens: null
+ task_specific_params: null
+ temperature: '1.0'
+ tf_legacy_loss: 'False'
+ tie_encoder_decoder: 'False'
+ tie_word_embeddings: 'False'
+ tile_tag_type: space_separated
+ tokenizer_class: null
+ tokenizer_type: Qwen/Qwen2.5-VL-7B-Instruct
+ top_k: '50'
+ top_p: '1.0'
+ torch_dtype: bfloat16
+ torchscript: 'False'
+ training:
+ compile: false
+ context_parallel_degree: 1
+ data_parallel_replicate_degree: 1
+ data_parallel_shard_degree: -1
+ disable_loss_parallel: false
+ enable_cpu_offload: false
+ fsdp_reshard_after_forward: default
+ mixed_precision_param: bfloat16
+ mixed_precision_reduce: float32
+ steps: 400000
+ tensor_parallel_degree: 1
+ use_cosine_decay: false
+ use_linear_decay: true
+ warmup_steps: 1000
+ training_seq_len: '4096'
+ transformers_version: 4.51.0.dev0
+ typical_p: '1.0'
+ use_bfloat16: 'False'
+ use_cache: 'False'
+ use_fsdp2: 'True'
+ use_return_dict: 'True'
+ use_rope_from_torchtitan: 'False'
+ use_sliding_window: 'False'
+ video_token_id: '151656'
+ vision_config:
+ _target_: cosmos_predict2._src.reason1.configs.default.model_config_qwen.QwenVisionConfig
+ add_cross_attention: 'False'
+ architectures: null
+ attn_implementation: flash_attention_2
+ attn_implementation_autoset: 'True'
+ bad_words_ids: null
+ begin_suppress_tokens: null
+ bos_token_id: null
+ chunk_size_feed_forward: '0'
+ cross_attention_hidden_size: null
+ decoder_start_token_id: null
+ depth: '32'
+ diversity_penalty: '0.0'
+ do_sample: 'False'
+ early_stopping: 'False'
+ embed_dim: null
+ encoder_no_repeat_ngram_size: '0'
+ eos_token_id: null
+ exponential_decay_length_penalty: null
+ finetuning_task: null
+ forced_bos_token_id: null
+ forced_eos_token_id: null
+ fullatt_block_indexes:
+ - '7'
+ - '15'
+ - '23'
+ - '31'
+ hidden_act: silu
+ hidden_size: '1280'
+ id2label:
+ 0: LABEL_0
+ 1: LABEL_1
+ in_channels: '3'
+ in_chans: '3'
+ intermediate_size: '3420'
+ is_decoder: 'False'
+ is_encoder_decoder: 'False'
+ label2id:
+ LABEL_0: '0'
+ LABEL_1: '1'
+ length_penalty: '1.0'
+ max_length: '20'
+ min_length: '0'
+ mlp_ratio: null
+ model_type: qwen2_5_vl
+ name_or_path: ''
+ no_repeat_ngram_size: '0'
+ num_beam_groups: '1'
+ num_beams: '1'
+ num_heads: '16'
+ num_return_sequences: '1'
+ out_hidden_size: '3584'
+ output_attentions: 'False'
+ output_hidden_states: 'False'
+ output_scores: 'False'
+ pad_token_id: null
+ patch_size: '14'
+ prefix: null
+ problem_type: null
+ pruned_heads: _Nothing.NOTHING
+ remove_invalid_values: 'False'
+ repetition_penalty: '1.0'
+ return_dict: 'True'
+ return_dict_in_generate: 'False'
+ sep_token_id: null
+ spatial_merge_size: '2'
+ spatial_patch_size: '14'
+ suppress_tokens: null
+ task_specific_params: null
+ temperature: '1.0'
+ temporal_patch_size: '2'
+ tf_legacy_loss: 'False'
+ tie_encoder_decoder: 'False'
+ tie_word_embeddings: 'True'
+ tokenizer_class: null
+ tokens_per_second: '2'
+ top_k: '50'
+ top_p: '1.0'
+ torch_dtype: bfloat16
+ torchscript: 'False'
+ typical_p: '1.0'
+ use_bfloat16: 'False'
+ window_size: '112'
+ vision_encoder: openai/clip-vit-base-patch32
+ vision_encoder_config:
+ depth_init: true
+ dim: 1024
+ ffn_dim_multiplier: null
+ head_dim: null
+ hidden_act: null
+ hidden_dim: 4096
+ image_size: 1024
+ image_token_id: null
+ multiple_of: null
+ n_heads: 16
+ n_kv_heads: null
+ n_layers: 24
+ norm_eps: 1.0e-05
+ norm_type: rmsnorm
+ num_channels: 3
+ patch_size: 16
+ proj_bias: null
+ qkv_bias: null
+ rope_theta: 10000.0
+ use_cache: false
+ use_rope_from_torchtitan: false
+ vision_encoder_in_channels: '3'
+ vision_end_token_id: '151653'
+ vision_start_token_id: '151652'
+ vision_token_id: '151654'
+ vocab_size: '152064'
+ z_loss_coeff: '0.0'
+ tokenizer:
+ _target_: <function build_tokenizer at 0x7f348bbeff40>
+ cache_dir: null
+ tokenizer_type: Qwen/Qwen2.5-VL-7B-Instruct
+ n_layers_per_group: 5
+ s3_credential_path: credentials/s3_checkpoint.secret
+ tokenizer:
+ _target_: <class 'cosmos_predict2._src.predict2.tokenizers.wan2pt1.Wan2pt1VAEInterface'>
+ chunk_duration: '81'
+ load_mean_std: 'False'
+ name: wan2pt1_tokenizer
+ temporal_window: '16'
+ train_time_distribution: logitnormal
+ train_time_weight: reweighting
+ use_dynamic_shift: false
+ use_high_sigma_strategy: false
+ use_kerras_sigma_at_inference: false
+ use_lora: true
+ use_torch_compile: false
+ model_parallel:
+ _cpu_offloading_context: null
+ async_tensor_model_parallel_allreduce: false
+ autocast_dtype: torch.float32
+ barrier_with_L1_time: true
+ batch_p2p_comm: true
+ batch_p2p_sync: true
+ bf16: false
+ context_parallel_size: 1
+ cpu_offloading: false
+ cpu_offloading_activations: true
+ cpu_offloading_num_layers: 0
+ cpu_offloading_weights: true
+ cross_entropy_fusion_impl: native
+ cross_entropy_loss_fusion: false
+ deallocate_pipeline_outputs: false
+ defer_embedding_wgrad_compute: false
+ deterministic_mode: false
+ enable_autocast: false
+ expert_model_parallel_size: 1
+ expert_tensor_parallel_size: 1
+ finalize_model_grads_func: null
+ fp16: false
+ grad_scale_func: null
+ grad_sync_func: null
+ gradient_accumulation_fusion: false
+ hierarchical_context_parallel_sizes: null
+ microbatch_group_size_per_vp_stage: 1
+ moe_extended_tp: false
+ no_sync_func: null
+ num_microbatches_with_partial_activation_checkpoints: null
+ overlap_p2p_comm: false
+ overlap_p2p_comm_warmup_flush: false
+ param_sync_func: null
+ params_dtype: torch.float32
+ perform_initialization: true
+ pipeline_dtype: null
+ pipeline_model_parallel_comm_backend: null
+ pipeline_model_parallel_size: 1
+ pipeline_model_parallel_split_rank: null
+ sequence_parallel: false
+ tensor_model_parallel_size: 1
+ timers: null
+ tp_comm_atomic_ag: false
+ tp_comm_atomic_rs: false
+ tp_comm_bootstrap_backend: nccl
+ tp_comm_bulk_dgrad: true
+ tp_comm_bulk_wgrad: true
+ tp_comm_overlap: false
+ tp_comm_overlap_ag: true
+ tp_comm_overlap_disable_fc1: false
+ tp_comm_overlap_disable_qkv: false
+ tp_comm_overlap_rs: true
+ tp_comm_overlap_rs_dgrad: false
+ tp_comm_split_ag: true
+ tp_comm_split_rs: true
+ use_cpu_initialization: false
+ use_ring_exchange_p2p: false
+ use_te_rng_tracker: false
+ variable_seq_lengths: false
+ virtual_pipeline_model_parallel_size: null
+ wgrad_deferral_limit: 0
+ optimizer:
+ _target_: <function get_base_optimizer at 0x7f343a025f30>
+ betas:
+ - '0.9'
+ - '0.999'
+ eps: 1e-08
+ fused: 'True'
+ lr: '4.315837287515549e-05'
+ model: null
+ optim_type: adamw
+ weight_decay: '0.001'
+ scheduler:
+ _target_: <class 'cosmos_predict2._src.common.functional.lr_scheduler.LambdaLinearScheduler'>
+ cycle_lengths:
+ - '100000'
+ f_max:
+ - '0.5'
+ f_min:
+ - '0.2'
+ f_start:
+ - 1e-06
+ verbosity_interval: '0'
+ warm_up_steps:
+ - '2000'
+ trainer:
+ callbacks:
+ compile_tokenizer:
+ _target_: <class 'cosmos_predict2._src.predict2.callbacks.compile_tokenizer.CompileTokenizer'>
+ compile_after_iterations: '4'
+ dynamic: 'False'
+ enabled: 'True'
+ dataloader_speed:
+ _target_: <class 'cosmos_predict2._src.predict2.callbacks.dataloading_monitor.DetailedDataLoadingSpeedMonitor'>
+ every_n: '100'
+ save_s3: 'False'
+ step_size: '1'
+ device_monitor:
+ _target_: <class 'cosmos_predict2._src.predict2.callbacks.device_monitor.DeviceMonitor'>
+ every_n: '100'
+ log_memory_detail: 'True'
+ save_s3: 'False'
+ step_size: '1'
+ upload_every_n_mul: '10'
+ every_n_sample_ema:
+ _target_: <class 'cosmos_predict2._src.predict2.callbacks.every_n_draw_sample.EveryNDrawSample'>
+ do_x0_prediction: 'False'
+ every_n: '200'
+ fps: '16'
+ guidance:
+ - '0.0'
+ - '3.0'
+ - '7.0'
+ is_ema: 'True'
+ n_sample_to_save: '128'
+ n_sigmas_for_x0_prediction: '4'
+ n_viz_sample: '3'
+ num_sampling_step: '35'
+ prompt_type: t5_xxl
+ save_s3: 'False'
+ step_size: '1'
+ use_negative_prompt: 'False'
+ every_n_sample_reg:
+ _target_: <class 'cosmos_predict2._src.predict2.callbacks.every_n_draw_sample.EveryNDrawSample'>
+ do_x0_prediction: 'False'
+ every_n: '200'
+ fps: '16'
+ guidance:
+ - '0.0'
+ - '3.0'
+ - '7.0'
+ is_ema: 'False'
+ n_sample_to_save: '128'
+ n_sigmas_for_x0_prediction: '4'
+ n_viz_sample: '3'
+ num_sampling_step: '35'
+ prompt_type: t5_xxl
+ save_s3: 'False'
+ step_size: '1'
+ use_negative_prompt: 'False'
+ grad_clip:
+ _target_: <class 'cosmos_predict2._src.predict2.callbacks.grad_clip.GradClip'>
+ clip_norm: '0.1'
+ force_finite: 'True'
+ heart_beat:
+ _target_: <class 'cosmos_predict2._src.predict2.callbacks.heart_beat.HeartBeat'>
+ every_n: '10'
+ save_s3: 'False'
+ step_size: '1'
+ update_interval_in_minute: '20'
+ iter_speed:
+ _target_: <class 'cosmos_predict2._src.predict2.callbacks.iter_speed.IterSpeed'>
+ every_n: '100'
+ hit_thres: '200'
+ save_s3: 'False'
+ save_s3_every_log_n: '10'
+ low_prec:
+ _target_: <class 'cosmos_predict2._src.imaginaire.utils.callback.LowPrecisionCallback'>
+ config: null
+ trainer: null
+ update_iter: '1'
+ manual_gc:
+ _target_: <class 'cosmos_predict2._src.imaginaire.callbacks.manual_gc.ManualGarbageCollection'>
+ every_n: '200'
+ warm_up: '5'
+ wandb:
+ _target_: <class 'cosmos_predict2._src.predict2.callbacks.wandb_log.WandbCallback'>
+ logging_iter_multipler: '1'
+ save_logging_iter_multipler: '10'
+ save_s3: 'False'
+ wandb_10x:
+ _target_: <class 'cosmos_predict2._src.predict2.callbacks.wandb_log.WandbCallback'>
+ logging_iter_multipler: '10'
+ save_logging_iter_multipler: '1'
+ save_s3: 'False'
+ cudnn:
+ benchmark: 'True'
+ deterministic: 'False'
+ ddp:
+ broadcast_buffers: 'True'
+ find_unused_parameters: 'False'
+ static_graph: 'True'
+ distributed_parallelism: fsdp
+ grad_accum_iter: '1'
+ grad_scaler_args:
+ enabled: 'False'
+ logging_iter: '100'
+ max_iter: '1000'
+ max_val_iter: null
+ memory_format: torch.preserve_format
+ profiling:
+ enable_memory_snapshot: 'False'
+ enable_profiling: 'False'
+ profile_freq: '1'
+ profile_memory: 'False'
+ record_shape: 'False'
+ save_s3: 'False'
+ target_ranks:
+ - '0'
+ - '1'
+ - '2'
+ - '3'
+ - '4'
+ - '5'
+ - '6'
+ - '7'
+ with_modules: 'True'
+ with_stack: 'True'
+ run_validation: 'False'
+ seed: '0'
+ straggler_detection:
+ analyze_backward: 'True'
+ analyze_dataloading: 'True'
+ analyze_forward: 'True'
+ analyze_optimizer: 'True'
+ enabled: 'False'
+ max_diff: '1.5'
+ profile_freq: '1'
+ raise_error: 'True'
+ report_freq: '100'
+ timeout_period: '999999999'
+ type: <class 'cosmos_predict2._src.imaginaire.trainer.ImaginaireTrainer'>
+ validation_iter: '100'
+ upload_reproducible_setup: 'True'
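
Note on reading this dump: many booleans and numbers are stored as quoted strings (for example `'False'`, `'200'`, `'4.315837287515549e-05'`), so a plain YAML load returns them as `str`. The sketch below is one possible way to coerce such values back after loading. The helper names `coerce_scalar` and `coerce_tree` are hypothetical and not part of `cosmos_predict2`. The `sample` dictionary is a hand-copied fragment, and its nesting is an assumption because the diff view above has lost its indentation.

```python
from typing import Any


def coerce_scalar(value: Any) -> Any:
    """Turn a stringified scalar such as 'False' or '200' back into a Python value."""
    if not isinstance(value, str):
        return value  # already typed (e.g. None, bool, int, float)
    if value in ("True", "False"):
        return value == "True"
    # Try int first so '200' stays an integer, then fall back to float.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value  # ordinary string such as 'adamw' or a file path


def coerce_tree(node: Any) -> Any:
    """Recursively apply coerce_scalar to every leaf of a nested dict/list."""
    if isinstance(node, dict):
        return {key: coerce_tree(child) for key, child in node.items()}
    if isinstance(node, list):
        return [coerce_tree(child) for child in node]
    return coerce_scalar(node)


# Hand-copied fragment of the dump; nesting is assumed, values are verbatim.
sample = {
    "checkpoint": {"save_iter": "200", "strict_resume": "True"},
    "optimizer": {
        "lr": "4.315837287515549e-05",
        "betas": ["0.9", "0.999"],
        "optim_type": "adamw",
    },
    "trainer": {"max_iter": "1000", "run_validation": "False"},
}

restored = coerce_tree(sample)
print(restored["checkpoint"])
print(restored["optimizer"]["lr"])
```

This coercion is deliberately simple: any string that Python's `float()` accepts (including `'nan'` or `'inf'`) becomes a float, so it should only be applied to fields that are known to hold scalars. Entries such as `<function ... at 0x...>` are object reprs and cannot be restored this way.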