KonstantinGarbers committed on
Commit a282d10 · verified · 1 Parent(s): e03f77a

Upload vit-wee__k4-4_a0.1-0.5_soft-ce_as-cosine_scheduling to labelmix_labelmix_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed0_a0.1-0.5_cosine_k4_fixed_asp20.0_px0

labelmix_labelmix_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed0_a0.1-0.5_cosine_k4_fixed_asp20.0_px0/README.md ADDED
@@ -0,0 +1,49 @@
+ ---
+ library_name: timm
+ tags:
+ - image-classification
+ - labelmix
+ ---
+
+ # labelmix/labelmix_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed0_a0.1-0.5_cosine_k4_fixed_asp20.0_px0
+
+ This model is part of the [KonstantinGarbers/labelmix](https://huggingface.co/KonstantinGarbers/labelmix) repository.
+
+ ## Hyperparameters
+
+ | Key | Value |
+ |-----|-------|
+ | `model` | `vit_wee_patch16_reg1_gap_256` |
+ | `dataset` | `hfds/ILSVRC/imagenet-1k` |
+ | `seed` | `0` |
+ | `labelmix_alpha_min` | `0.1` |
+ | `labelmix_alpha_max` | `0.5` |
+ | `labelmix_reverse` | `False` |
+ | `labelmix_schedule` | `cosine` |
+ | `labelmix_k_min` | `4` |
+ | `labelmix_k_max` | `4` |
+ | `labelmix_k_reverse` | `False` |
+ | `labelmix_k_schedule` | `fixed` |
+ | `labelmix_sampling_max_aspect` | `20.0` |
+ | `labelmix_sampling_min_side_px` | `0` |
+ | `experiment` | `vit-wee__k4-4_a0.1-0.5_soft-ce_as-cosine_scheduling` |
+ | `labelmix` | `True` |
+ | `labelmix_loss` | `soft_ce` |
+ | `labelmix_mix_k` | `4` |
+
+ ## Run Status
+
+ - **status**: finished
+ - **finished**: True
+ - **start_step**: 0
+ - **wandb_id**: 7vx2d970
+
+ ## Note on Epoch Arguments
+
+ > **Disclaimer:** The epoch-related arguments inside `args.yaml` (e.g. `epochs`, `warmup_epochs`) do **not** correspond to the actual number of epochs trained. Training duration is controlled via optimizer steps — see `num_steps` and `warmup_steps`. To convert these steps to epochs use:
+ > ```
+ > epochs = (num_steps + warmup_steps) / (int(balanced_mode) * num_classes / batch_size)
+ > ```
+ > where `batch_size` is the global batch size (default 1024).
+
+ *Run name: `vit-wee__k4-4_a0.1-0.5_soft-ce_as-cosine_scheduling`*
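
As a worked example of the README's conversion formula, the sketch below plugs in the values recorded for this run in `args.yaml` (`num_steps: 125000`, `warmup_steps: 12500`, `balanced_mode: '1280'`, `num_classes: 1000`). It assumes the global batch size is the README's stated default of 1024 rather than the `batch_size: 512` stored in `args.yaml`, which is presumably a per-process value. The function name `steps_to_epochs` is hypothetical and not part of the repository.

```python
def steps_to_epochs(num_steps, warmup_steps, balanced_mode, num_classes, global_batch_size):
    """Convert optimizer steps to epochs using the formula from the README note."""
    # One "epoch" covers int(balanced_mode) * num_classes samples
    # (reading balanced_mode as samples per class is an assumption).
    samples_per_epoch = int(balanced_mode) * num_classes
    steps_per_epoch = samples_per_epoch / global_batch_size
    return (num_steps + warmup_steps) / steps_per_epoch


epochs = steps_to_epochs(
    num_steps=125000,
    warmup_steps=12500,
    balanced_mode="1280",    # stored as a string in args.yaml
    num_classes=1000,
    global_batch_size=1024,  # assumed: README default, not args.yaml's 512
)
print(epochs)  # 110.0
```

Under these assumptions the run corresponds to about 110 epochs, not the `epochs: 450` listed in `args.yaml`.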
labelmix_labelmix_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed0_a0.1-0.5_cosine_k4_fixed_asp20.0_px0/args.yaml ADDED
@@ -0,0 +1,224 @@
+ aa: rand-m6-inc1-mstd1.0-n3
+ amp: true
+ amp_dtype: bfloat16
+ amp_impl: native
+ aug_repeats: 0.0
+ aug_splits: 0
+ balanced_buffer: 4096
+ balanced_buffer_steps: '4'
+ balanced_cache_path: ''
+ balanced_cache_threshold: 256
+ balanced_cache_threshold_steps: '3'
+ balanced_input_key: null
+ balanced_mode: '1280'
+ balanced_target_key: null
+ batch_size: 512
+ bce_loss: false
+ bce_pos_weight: null
+ bce_sum: false
+ bce_target_thresh: null
+ bn_eps: null
+ bn_momentum: null
+ channels_last: false
+ check_resume: true
+ check_resume_search_limit: 30
+ checkpoint_hist: 10
+ class_map: ''
+ clip_grad: null
+ clip_mode: norm
+ color_jitter: null
+ color_jitter_prob: null
+ cooldown_epochs: 10
+ cooldown_steps: 0
+ crop_pct: 0.95
+ cutmix: 0.0
+ cutmix_minmax: null
+ data: null
+ data_dir: /dev/shm/imagenet-1k
+ dataset: hfds/ILSVRC/imagenet-1k
+ dataset_download: false
+ dataset_trust_remote_code: false
+ decay_epochs: 100
+ decay_milestones:
+ - 90
+ - 180
+ - 270
+ decay_rate: 0.1
+ decay_steps: 90
+ device: cuda
+ device_modules: null
+ dist_bn: reduce
+ distill_loss_weight: null
+ drop: 0.0
+ drop_block: null
+ drop_connect: null
+ drop_path: 0.2
+ epoch_repeats: 0.0
+ epochs: 450
+ eval_metric: top1
+ fast_norm: false
+ force_cpu: false
+ fuser: ''
+ gaussian_blur_prob: null
+ gp: null
+ grad_accum_steps: 1
+ grad_checkpointing: false
+ grayscale_prob: null
+ head_init_bias: null
+ head_init_scale: 0.0
+ hflip: 0.5
+ img_size: 256
+ in_chans: null
+ initial_checkpoint: ''
+ input_img_mode: null
+ input_key: image
+ input_size: null
+ interpolation: ''
+ jsd_loss: false
+ kd_distill_type: logit
+ kd_loss_type: kl
+ kd_model_name: null
+ kd_student_feature_dim: null
+ kd_teacher_feature_dim: null
+ kd_temperature: 4.0
+ kd_token_distill_type: soft
+ labelmix: true
+ labelmix_alpha_max: 0.5
+ labelmix_alpha_min: 0.1
+ labelmix_k_cooldown_epochs: null
+ labelmix_k_max: 4
+ labelmix_k_min: 4
+ labelmix_k_reverse: false
+ labelmix_k_schedule: fixed
+ labelmix_k_total_epochs: null
+ labelmix_k_warmup_epochs: 0
+ labelmix_loss: soft_ce
+ labelmix_mix_k: 4
+ labelmix_producer_rank: -1
+ labelmix_producer_workers: 0
+ labelmix_reverse: false
+ labelmix_sampling: true
+ labelmix_sampling_bins: 16
+ labelmix_sampling_low_watermark: 32
+ labelmix_sampling_max_aspect: 20.0
+ labelmix_sampling_max_attempts: 200
+ labelmix_sampling_min_side_px: 0
+ labelmix_sampling_pool_size: 128
+ labelmix_schedule: cosine
+ labelmix_step_mode: total
+ labelmix_total_epochs: null
+ labelmix_total_steps: null
+ labelmix_warmup_steps: 0
+ layer_decay: null
+ layer_decay_min_scale: 0
+ layer_decay_no_opt_scale: null
+ loader_prefetch_factor: 2
+ local_rank: 0
+ log_dir: ''
+ log_interval: 50
+ log_wandb: true
+ lr: null
+ lr_base: 0.0015
+ lr_base_scale: ''
+ lr_base_size: 1024
+ lr_cycle_decay: 0.5
+ lr_cycle_limit: 1
+ lr_cycle_mul: 1.0
+ lr_k_decay: 1.0
+ lr_noise: null
+ lr_noise_pct: 0.67
+ lr_noise_std: 1.0
+ mean: null
+ min_lr: 5.0e-07
+ mixup: 0.0
+ mixup_mode: batch
+ mixup_off_epoch: 0
+ mixup_off_step: 0
+ mixup_prob: 0.0
+ mixup_switch_prob: 0.5
+ model: vit_wee_patch16_reg1_gap_256
+ model_dtype: null
+ model_ema: true
+ model_ema_decay: 0.9998
+ model_ema_force_cpu: false
+ model_ema_warmup: true
+ model_kwargs:
+   fix_init: true
+ momentum: 0.9
+ naflex_loader: false
+ naflex_loss_scale: linear
+ naflex_max_seq_len: 576
+ naflex_patch_size_probs: null
+ naflex_patch_sizes: null
+ naflex_train_seq_lens:
+ - 128
+ - 256
+ - 576
+ - 784
+ - 1024
+ no_aug: false
+ no_ddp_bb: false
+ no_prefetcher: false
+ no_resume_opt: false
+ num_classes: 1000
+ num_evals: 100
+ num_logs: 1000
+ num_saves: 20
+ num_steps: 125000
+ opt: nadamw
+ opt_betas: null
+ opt_eps: 1.0e-08
+ opt_kwargs: {}
+ patience_epochs: 10
+ patience_steps: 10
+ pin_mem: true
+ pretrained: false
+ pretrained_path: null
+ ratio:
+ - 0.75
+ - 1.3333333333333333
+ recount: 1
+ recovery_interval: 5000
+ remode: pixel
+ reprob: 0.2
+ resplit: false
+ save_images: false
+ scale:
+ - 0.08
+ - 1.0
+ sched: cosine
+ sched_on_updates: true
+ seed: 0
+ smoothing: 0.1
+ split_bn: false
+ start_epoch: null
+ start_step: null
+ std: null
+ sync_bn: false
+ synchronize_step: false
+ target_key: label
+ task_loss_weight: null
+ torchcompile: inductor
+ torchcompile_mode: null
+ torchscript: false
+ train_crop_mode: null
+ train_interpolation: random
+ train_num_samples: null
+ train_split: train
+ training_started_step: 2
+ tta: 0
+ use_multi_epochs_loader: false
+ val_interval: 1
+ val_num_samples: null
+ val_split: validation
+ validation_batch_size: null
+ vflip: 0.0
+ wandb_project: labelmix
+ wandb_tags: []
+ warmup_epochs: 20
+ warmup_lr: 5.0e-07
+ warmup_prefix: true
+ warmup_steps: 12500
+ weight_decay: 0.06
+ worker_seeding: all
+ workers: 4
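
The config stores `lr: null` together with `lr_base`, `lr_base_size`, and an empty `lr_base_scale`, so the effective learning rate is derived rather than recorded. The sketch below reproduces the derivation rule used by upstream timm's `train.py`. Whether this labelmix fork keeps that rule unchanged is an assumption, as is the world size of 2 (chosen so that 512 × 2 matches the README's global batch size of 1024). `resolve_lr` is a hypothetical standalone helper, not a timm function.

```python
def resolve_lr(lr, lr_base, lr_base_size, lr_base_scale, opt,
               batch_size, world_size, grad_accum_steps=1):
    """Derive the learning rate from lr_base when lr is unset (upstream timm rule)."""
    if lr is not None:
        return lr  # an explicit lr always wins
    global_batch_size = batch_size * world_size * grad_accum_steps
    batch_ratio = global_batch_size / lr_base_size
    # An empty lr_base_scale picks 'sqrt' for adaptive optimizers, 'linear' otherwise.
    scale = lr_base_scale or (
        "sqrt" if any(k in opt.lower() for k in ("ada", "lamb")) else "linear"
    )
    if scale == "sqrt":
        batch_ratio = batch_ratio ** 0.5
    return lr_base * batch_ratio


lr = resolve_lr(
    lr=None, lr_base=0.0015, lr_base_size=1024, lr_base_scale="",
    opt="nadamw", batch_size=512,
    world_size=2,  # assumed; not recorded in args.yaml
    grad_accum_steps=1,
)
print(lr)  # 0.0015
```

Because the assumed global batch size equals `lr_base_size`, the batch ratio is 1 and the result is simply `lr_base`. With a different world size, `nadamw` would select square-root scaling, since its name contains `ada`.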
labelmix_labelmix_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed0_a0.1-0.5_cosine_k4_fixed_asp20.0_px0/model_best.pth.tar ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e313c37d292f3f31d9814c176b0d113d99cea907a4595993b4f2e6c510afeaa4
+ size 215091611
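
The three lines above are a Git LFS pointer, not the checkpoint itself: they record the SHA-256 digest and byte size of the real file. A minimal sketch for checking a downloaded `model_best.pth.tar` against this pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and only the Python standard library is used.

```python
import hashlib
import os

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e313c37d292f3f31d9814c176b0d113d99cea907a4595993b4f2e6c510afeaa4
size 215091611
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the size and digest named by the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap size check before hashing
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so a ~215 MB checkpoint is never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # sha256 215091611
```

After downloading the checkpoint, `matches_pointer("model_best.pth.tar", pointer)` should return `True` for an intact file.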