KonstantinGarbers committed on
Commit 10b4635 · verified · 1 Parent(s): 9349115

Upload vit-wee__baseline_seed=42 to labelmix_baseline_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed42

labelmix_baseline_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed42/README.md ADDED
@@ -0,0 +1,64 @@
+ ---
+ library_name: timm
+ tags:
+ - image-classification
+ - labelmix
+ ---
+
+ # labelmix/baseline_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed42
+
+ This model is part of the [KonstantinGarbers/labelmix](https://huggingface.co/KonstantinGarbers/labelmix) repository.
+
+ ## Hyperparameters
+
+ | Key | Value |
+ |-----|-------|
+ | `model` | `vit_wee_patch16_reg1_gap_256` |
+ | `dataset` | `hfds/ILSVRC/imagenet-1k` |
+ | `seed` | `42` |
+ | `labelmix_alpha_min` | `0.1` |
+ | `labelmix_alpha_max` | `1.0` |
+ | `labelmix_reverse` | `False` |
+ | `labelmix_schedule` | `fixed` |
+ | `labelmix_k_min` | `None` |
+ | `labelmix_k_max` | `None` |
+ | `labelmix_k_reverse` | `False` |
+ | `labelmix_k_schedule` | `fixed` |
+ | `labelmix_sampling_max_aspect` | `10.0` |
+ | `labelmix_sampling_min_side_px` | `6` |
+ | `experiment` | `vit-wee__baseline_seed=42` |
+ | `labelmix` | `False` |
+ | `labelmix_loss` | `soft_ce` |
+ | `labelmix_mix_k` | `5` |
+
+ ## Best Validation Metrics
+
+ | Metric | Value |
+ |--------|-------|
+ | `epoch` | `108` |
+ | `step` | `136125` |
+ | `train_loss` | `3.382702350616455` |
+ | `train_cross_entropy` | `2.2727866172790527` |
+ | `train_grad_norm` | `1.6432865858078003` |
+ | `eval_loss` | `1.017465326423645` |
+ | `eval_top1` | `76.31400005859375` |
+ | `eval_top5` | `93.33000002929687` |
+ | `eval_ece` | `9.331677436828613` |
+ | `lr` | `9.4763947085907e-07` |
+
+ ## Run Status
+
+ - **status**: finished
+ - **finished**: True
+ - **start_step**: 0
+ - **wandb_id**: gw4vrr3w
+
+ ## Note on Epoch Arguments
+
+ > **Disclaimer:** The epoch-related arguments in `args.yaml` (e.g. `epochs`, `warmup_epochs`) do **not** correspond to the number of epochs actually trained. Training duration is controlled by optimizer steps; see `num_steps` and `warmup_steps`. To convert steps to epochs, use:
+ > ```
+ > epochs = (num_steps + warmup_steps) / (int(balanced_mode) * num_classes / batch_size)
+ > ```
+ > where `batch_size` is the global batch size (default 1024).
+
+ *Run name: `vit-wee__baseline_seed=42`*
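The conversion in the README's note on epoch arguments can be checked against this run's own values. The sketch below is a minimal illustration, not part of the commit: the function name `steps_to_epochs` is hypothetical, the inputs are copied from `args.yaml`, and it assumes the global batch size is the README's stated default of 1024 rather than the `batch_size: 512` entry in `args.yaml`.

```python
def steps_to_epochs(num_steps, warmup_steps, balanced_mode, num_classes, batch_size=1024):
    """Apply the README formula: total steps divided by steps per epoch.

    Hypothetical helper; `batch_size` is the *global* batch size (assumed 1024).
    """
    steps_per_epoch = int(balanced_mode) * num_classes / batch_size
    return (num_steps + warmup_steps) / steps_per_epoch


# Values taken from args.yaml in this commit.
total_epochs = steps_to_epochs(
    num_steps=125000, warmup_steps=12500, balanced_mode="1280", num_classes=1000
)

# Same steps-per-epoch figure applied to the best-checkpoint step from the README.
best_epoch = 136125 / (int("1280") * 1000 / 1024)

print(total_epochs)  # 110.0
print(best_epoch)    # 108.9
```

Under these assumptions the run spans 110 epochs, and step 136125 falls inside epoch 108, which agrees with the `epoch` value reported under Best Validation Metrics.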
labelmix_baseline_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed42/args.yaml ADDED
@@ -0,0 +1,223 @@
+ aa: rand-m6-inc1-mstd1.0-n3
+ amp: true
+ amp_dtype: bfloat16
+ amp_impl: native
+ aug_repeats: 0.0
+ aug_splits: 0
+ balanced_buffer: 4096
+ balanced_buffer_steps: '4'
+ balanced_cache_path: ''
+ balanced_cache_threshold: 256
+ balanced_cache_threshold_steps: '3'
+ balanced_input_key: null
+ balanced_mode: '1280'
+ balanced_target_key: null
+ batch_size: 512
+ bce_loss: false
+ bce_pos_weight: null
+ bce_sum: false
+ bce_target_thresh: null
+ bn_eps: null
+ bn_momentum: null
+ channels_last: false
+ check_resume: true
+ check_resume_search_limit: 30
+ checkpoint_hist: 10
+ class_map: ''
+ clip_grad: null
+ clip_mode: norm
+ color_jitter: null
+ color_jitter_prob: null
+ cooldown_epochs: 10
+ cooldown_steps: 0
+ crop_pct: 0.95
+ cutmix: 1.0
+ cutmix_minmax: null
+ data: null
+ data_dir: /dev/shm/imagenet-1k
+ dataset: hfds/ILSVRC/imagenet-1k
+ dataset_download: false
+ dataset_trust_remote_code: false
+ decay_epochs: 100
+ decay_milestones:
+ - 90
+ - 180
+ - 270
+ decay_rate: 0.1
+ decay_steps: 90
+ device: cuda
+ device_modules: null
+ dist_bn: reduce
+ distill_loss_weight: null
+ drop: 0.0
+ drop_block: null
+ drop_connect: null
+ drop_path: 0.2
+ epoch_repeats: 0.0
+ epochs: 450
+ eval_metric: top1
+ fast_norm: false
+ force_cpu: false
+ fuser: ''
+ gaussian_blur_prob: null
+ gp: null
+ grad_accum_steps: 1
+ grad_checkpointing: false
+ grayscale_prob: null
+ head_init_bias: null
+ head_init_scale: 0.0
+ hflip: 0.5
+ img_size: 256
+ in_chans: null
+ initial_checkpoint: ''
+ input_img_mode: null
+ input_key: image
+ input_size: null
+ interpolation: ''
+ jsd_loss: false
+ kd_distill_type: logit
+ kd_loss_type: kl
+ kd_model_name: null
+ kd_student_feature_dim: null
+ kd_teacher_feature_dim: null
+ kd_temperature: 4.0
+ kd_token_distill_type: soft
+ labelmix: false
+ labelmix_alpha_max: 1.0
+ labelmix_alpha_min: 0.1
+ labelmix_k_max: null
+ labelmix_k_min: null
+ labelmix_k_reverse: false
+ labelmix_k_schedule: fixed
+ labelmix_k_total_epochs: null
+ labelmix_k_warmup_epochs: 0
+ labelmix_loss: soft_ce
+ labelmix_mix_k: 5
+ labelmix_producer_rank: -1
+ labelmix_producer_workers: 0
+ labelmix_reverse: false
+ labelmix_sampling: false
+ labelmix_sampling_bins: 16
+ labelmix_sampling_low_watermark: 32
+ labelmix_sampling_max_aspect: 10.0
+ labelmix_sampling_max_attempts: 200
+ labelmix_sampling_min_side_px: 6
+ labelmix_sampling_pool_size: 128
+ labelmix_schedule: fixed
+ labelmix_step_mode: total
+ labelmix_total_epochs: null
+ labelmix_total_steps: null
+ labelmix_warmup_steps: 0
+ layer_decay: null
+ layer_decay_min_scale: 0
+ layer_decay_no_opt_scale: null
+ loader_prefetch_factor: 2
+ local_rank: 0
+ log_dir: ''
+ log_interval: 50
+ log_wandb: true
+ lr: null
+ lr_base: 0.0015
+ lr_base_scale: ''
+ lr_base_size: 1024
+ lr_cycle_decay: 0.5
+ lr_cycle_limit: 1
+ lr_cycle_mul: 1.0
+ lr_k_decay: 1.0
+ lr_noise: null
+ lr_noise_pct: 0.67
+ lr_noise_std: 1.0
+ mean: null
+ min_lr: 5.0e-07
+ mixup: 0.8
+ mixup_mode: batch
+ mixup_off_epoch: 0
+ mixup_off_step: 0
+ mixup_prob: 1.0
+ mixup_switch_prob: 0.5
+ model: vit_wee_patch16_reg1_gap_256
+ model_dtype: null
+ model_ema: true
+ model_ema_decay: 0.9998
+ model_ema_force_cpu: false
+ model_ema_warmup: true
+ model_kwargs:
+   fix_init: true
+ momentum: 0.9
+ naflex_loader: false
+ naflex_loss_scale: linear
+ naflex_max_seq_len: 576
+ naflex_patch_size_probs: null
+ naflex_patch_sizes: null
+ naflex_train_seq_lens:
+ - 128
+ - 256
+ - 576
+ - 784
+ - 1024
+ no_aug: false
+ no_ddp_bb: false
+ no_prefetcher: false
+ no_resume_opt: false
+ num_classes: 1000
+ num_evals: 100
+ num_logs: 1000
+ num_saves: 20
+ num_steps: 125000
+ opt: nadamw
+ opt_betas: null
+ opt_eps: 1.0e-08
+ opt_kwargs: {}
+ patience_epochs: 10
+ patience_steps: 10
+ pin_mem: true
+ pretrained: false
+ pretrained_path: null
+ ratio:
+ - 0.75
+ - 1.3333333333333333
+ recount: 1
+ recovery_interval: 5000
+ remode: pixel
+ reprob: 0.2
+ resplit: false
+ save_images: false
+ scale:
+ - 0.08
+ - 1.0
+ sched: cosine
+ sched_on_updates: true
+ seed: 42
+ smoothing: 0.1
+ split_bn: false
+ start_epoch: null
+ start_step: null
+ std: null
+ sync_bn: false
+ synchronize_step: false
+ target_key: label
+ task_loss_weight: null
+ torchcompile: inductor
+ torchcompile_mode: null
+ torchscript: false
+ train_crop_mode: null
+ train_interpolation: random
+ train_num_samples: null
+ train_split: train
+ training_started_step: 2
+ tta: 0
+ use_multi_epochs_loader: false
+ val_interval: 1
+ val_num_samples: null
+ val_split: validation
+ validation_batch_size: null
+ vflip: 0.0
+ wandb_project: labelmix
+ wandb_tags: []
+ warmup_epochs: 20
+ warmup_lr: 5.0e-07
+ warmup_prefix: true
+ warmup_steps: 12500
+ weight_decay: 0.06
+ worker_seeding: all
+ workers: 8
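The schedule settings in `args.yaml` (`sched: cosine`, `warmup_prefix: true`, `warmup_steps: 12500`, `num_steps: 125000`, `lr_base: 0.0015`, `min_lr: 5.0e-07`) can be sanity-checked against the `lr` value the README reports at the best step (136125). The sketch below is not the training code: it assumes a plain linear warmup followed by a single half-cosine decay, and it assumes the peak learning rate equals `lr_base` with no rescaling (global batch size 1024 matching `lr_base_size: 1024`). The function name `lr_at` is hypothetical.

```python
import math

# Values copied from args.yaml; the schedule shape itself is an assumption.
LR_PEAK = 0.0015       # lr_base, assumed unscaled (global batch == lr_base_size)
MIN_LR = 5.0e-07
WARMUP_LR = 5.0e-07
NUM_STEPS = 125000     # length of the cosine phase
WARMUP_STEPS = 12500   # warmup runs before the cosine phase (warmup_prefix)


def lr_at(step):
    """Learning rate at an optimizer step under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear ramp from warmup_lr to the peak learning rate.
        return WARMUP_LR + (LR_PEAK - WARMUP_LR) * step / WARMUP_STEPS
    # Half-cosine decay from the peak down to min_lr.
    t = min(step - WARMUP_STEPS, NUM_STEPS)
    return MIN_LR + 0.5 * (LR_PEAK - MIN_LR) * (1.0 + math.cos(math.pi * t / NUM_STEPS))


lr_at_best = lr_at(136125)
reported = 9.4763947085907e-07  # `lr` from the README metrics table
print(f"computed={lr_at_best:.6e} reported={reported:.6e}")
```

If the two values agree closely, the logged `lr` is consistent with a cosine decay over `num_steps` that begins after the warmup prefix, which is also what the README's step-to-epoch formula implies by adding `warmup_steps` to `num_steps`.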
labelmix_baseline_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed42/model_best.pth.tar ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c290fb3ddcc3a0ec7068d7b5d2448e632f5c0507b1401d72ead6a7c7fce66e95
+ size 215091547
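The three lines added for `model_best.pth.tar` are a Git LFS pointer, not the checkpoint itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of how a downloaded checkpoint could be checked against this pointer follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and only the Python standard library is used.

```python
import hashlib

# Pointer text exactly as it appears in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c290fb3ddcc3a0ec7068d7b5d2448e632f5c0507b1401d72ead6a7c7fce66e95
size 215091547
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest the pointer records."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so a ~215 MB checkpoint is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

After downloading the checkpoint, `matches_pointer("model_best.pth.tar", pointer)` would report whether the local file is the one this commit refers to; the local path here is only an example.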
labelmix_baseline_vit_wee_patch16_reg1_gap_256_imagenet-1k_seed42/summary.csv ADDED
The diff for this file is too large to render. See raw diff