# SPG: Sequential Policy Gradient for Adaptive Hyperparameter Optimization

## Model Zoo: Adaptive Hyperparameter Optimization (HPO) via SPG Algorithm

`Table 1: Performance of pre-trained vs. SPG-retrained models on ImageNet-1K`

| Model | SPG | # Params | Acc@1 (%) | Acc@5 (%) | Weights | Command to reproduce |
|-------|-----|----------|-----------|-----------|---------|----------------------|
| MobileNet-V2 | ❌ | 3.5 M | 71.878 | 90.286 | <a href='https://download.pytorch.org/models/mobilenet_v2-b0353104.pth'><img src='https://img.shields.io/badge/PyTorch-IMAGENET1K_V1-FFA500?style=flat&logo=pytorch&logoColor=orange&labelColor=00000000'></a> | <a href='https://github.com/pytorch/vision/tree/main/references/classification#mobilenetv2'>Recipe</a> |
| MobileNet-V2 | ✅ | 3.5 M | 72.104 | 90.316 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/resolve/main/mobilenet_v2/model_32.pth'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/mobilenet_v2-yellow'></a> | [examples/image-classification/run.sh](#-Retrain-model-on-ImageNet-1K) |
| ResNet-50 | ❌ | 25.6 M | 76.130 | 92.862 | <a href='https://download.pytorch.org/models/resnet50-0676ba61.pth'><img src='https://img.shields.io/badge/PyTorch-IMAGENET1K_V1-FFA500?style=flat&logo=pytorch&logoColor=orange&labelColor=00000000'></a> | <a href='https://github.com/pytorch/vision/tree/main/references/classification#resnet'>Recipe</a> |
| ResNet-50 | ✅ | 25.6 M | 77.234 | 93.322 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/resolve/main/resnet50/model_35.pth'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/resnet50-yellow'></a> | [examples/image-classification/run.sh](#-Retrain-model-on-ImageNet-1K) |
| EfficientNet-V2-M | ❌ | 54.1 M | 85.112 | 97.156 | <a href='https://download.pytorch.org/models/efficientnet_v2_m-dc08266a.pth'><img src='https://img.shields.io/badge/PyTorch-IMAGENET1K_V1-FFA500?style=flat&logo=pytorch&logoColor=orange&labelColor=00000000'></a> | <a href='https://github.com/pytorch/vision/tree/main/references/classification#efficientnet-v2'>Recipe</a> |
| EfficientNet-V2-M | ✅ | 54.1 M | 85.218 | 97.208 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/resolve/main/efficientnet_v2_m/model_7.pth'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/efficientnet_v2_m-yellow'></a> | [examples/image-classification/run.sh](#-Retrain-model-on-ImageNet-1K) |
| ViT-B16 | ❌ | 86.6 M | 81.072 | 95.318 | <a href='https://download.pytorch.org/models/vit_b_16-c867db91.pth'><img src='https://img.shields.io/badge/PyTorch-IMAGENET1K_V1-FFA500?style=flat&logo=pytorch&logoColor=orange&labelColor=00000000'></a> | <a href='https://github.com/pytorch/vision/tree/main/references/classification#vit_b_16'>Recipe</a> |
| ViT-B16 | ✅ | 86.6 M | 81.092 | 95.304 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/resolve/main/vit_b_16/model_4.pth'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/vit_b_16-yellow'></a> | [examples/image-classification/run.sh](#-Retrain-model-on-ImageNet-1K) |
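
To try one of the SPG checkpoints above without cloning this repo, the `.pth` file can be fetched straight from the Hub. Below is a minimal loading sketch; the `"model"` checkpoint key assumes the torchvision reference-script layout, so fall back to the raw dict if your copy differs:

```python
import torch
import torchvision
from huggingface_hub import hf_hub_download

# Fetch the SPG-retrained MobileNet-V2 checkpoint from Table 1.
ckpt_path = hf_hub_download(
    repo_id="UniversalAlgorithmic/SPG",
    filename="mobilenet_v2/model_32.pth",
)
# weights_only=False because reference-script checkpoints may also pickle
# the argparse Namespace; adjust if your checkpoint is a bare state dict.
checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=False)
state_dict = checkpoint.get("model", checkpoint)  # assumed layout; see note above

model = torchvision.models.mobilenet_v2()  # architecture only, no weights
model.load_state_dict(state_dict)
model.eval()
```
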
`Table 2: Performance of pre-trained vs. SPG-retrained models. All models are evaluated on a subset of COCO val2017, covering the 21 categories (including "background") that are present in the Pascal VOC dataset.`

⚠️ `The models reported by TorchVision (with the COCO_WITH_VOC_LABELS_V1 weights) were benchmarked using only 20 categories. Researchers should first download the pre-trained models from TorchVision and re-evaluate them under the 21-category setting.`

| Model | SPG | # Params | mIoU (%) | Pixelwise Acc (%) | Weights | Command to reproduce |
|---------------------|-----|----------|----------|-------------------|---------|----------------------|
| FCN-ResNet50 | ❌ | 35.3 M | 58.9 | 90.9 | <a href='https://download.pytorch.org/models/fcn_resnet50_coco-1167a1af.pth'><img src='https://img.shields.io/badge/PyTorch-COCO_WITH_VOC_LABELS_V1-FFA500?style=flat&logo=pytorch&logoColor=orange&labelColor=00000000'></a> | <a href='https://github.com/pytorch/vision/tree/main/references/segmentation#fcn_resnet50'>Recipe</a> |
| FCN-ResNet50 | ✅ | 35.3 M | 59.4 | 90.9 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/resolve/main/fcn_resnet50/model_4.pth'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/fcn_resnet50-yellow'></a> | [examples/semantic-segmentation](#retrain-model-on-ms-coco-2017) |
| FCN-ResNet101 | ❌ | 54.3 M | 62.2 | 91.1 | <a href='https://download.pytorch.org/models/fcn_resnet101_coco-7ecb50ca.pth'><img src='https://img.shields.io/badge/PyTorch-COCO_WITH_VOC_LABELS_V1-FFA500?style=flat&logo=pytorch&logoColor=orange&labelColor=00000000'></a> | <a href='https://github.com/pytorch/vision/tree/main/references/segmentation#fcn_resnet101'>Recipe</a> |
| FCN-ResNet101 | ✅ | 54.3 M | 62.4 | 91.1 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/resolve/main/fcn_resnet101/model_4.pth'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/fcn_resnet101-yellow'></a> | [examples/semantic-segmentation](#retrain-model-on-ms-coco-2017) |
| DeepLabV3-ResNet50 | ❌ | 42.0 M | 63.8 | 91.5 | <a href='https://download.pytorch.org/models/deeplabv3_resnet50_coco-cd0a2569.pth'><img src='https://img.shields.io/badge/PyTorch-COCO_WITH_VOC_LABELS_V1-FFA500?style=flat&logo=pytorch&logoColor=orange&labelColor=00000000'></a> | <a href='https://github.com/pytorch/vision/tree/main/references/segmentation#deeplabv3_resnet50'>Recipe</a> |
| DeepLabV3-ResNet50 | ✅ | 42.0 M | 64.2 | 91.6 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/resolve/main/deeplabv3_resnet50/model_4.pth'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/deeplabv3_resnet50-yellow'></a> | [examples/semantic-segmentation](#retrain-model-on-ms-coco-2017) |
| DeepLabV3-ResNet101 | ❌ | 61.0 M | 65.3 | 91.7 | <a href='https://download.pytorch.org/models/deeplabv3_resnet101_coco-586e9e4e.pth'><img src='https://img.shields.io/badge/PyTorch-COCO_WITH_VOC_LABELS_V1-FFA500?style=flat&logo=pytorch&logoColor=orange&labelColor=00000000'></a> | <a href='https://github.com/pytorch/vision/tree/main/references/segmentation#deeplabv3_resnet101'>Recipe</a> |
| DeepLabV3-ResNet101 | ✅ | 61.0 M | 65.7 | 91.8 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/resolve/main/deeplabv3_resnet101/model_4.pth'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/deeplabv3_resnet101-yellow'></a> | [examples/semantic-segmentation](#retrain-model-on-ms-coco-2017) |

`Table 3: Performance comparison of fine-tuned vs. SPG-retrained models across NLP and speech benchmarks.`

- GLUE (text classification: BERT on the CoLA, SST-2, MRPC, QQP, QNLI, and RTE tasks)
- SQuAD (question answering: BERT)
- SUPERB (speech classification: Wav2Vec2 for Audio Classification (AC))

| Task | SPG | Metric Type | Performance (%) | Weights | Command to reproduce |
|-------|-----|----------------|-----------------|---------|----------------------|
| CoLA | ❌ | Matthews corr. | 56.53 | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-text_classification-yellow'></a> | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification#glue-tasks'>Recipe</a> |
| CoLA | ✅ | Matthews corr. | 62.13 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/tree/main/cola'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/CoLA-yellow'></a> | [examples/text-classification/run.sh](#transfer-learning-on-glue) |
| SST-2 | ❌ | Accuracy | 92.32 | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-text_classification-yellow'></a> | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification#glue-tasks'>Recipe</a> |
| SST-2 | ✅ | Accuracy | 92.54 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/tree/main/sst2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/SST2-yellow'></a> | [examples/text-classification/run.sh](#transfer-learning-on-glue) |
| MRPC | ❌ | F1/Accuracy | 88.85/84.09 | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-text_classification-yellow'></a> | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification#glue-tasks'>Recipe</a> |
| MRPC | ✅ | F1/Accuracy | 91.10/87.25 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/tree/main/mrpc'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/MRPC-yellow'></a> | [examples/text-classification/run.sh](#transfer-learning-on-glue) |
| QQP | ❌ | F1/Accuracy | 87.49/90.71 | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-text_classification-yellow'></a> | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification#glue-tasks'>Recipe</a> |
| QQP | ✅ | F1/Accuracy | 89.72/90.88 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/tree/main/qqp'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/QQP-yellow'></a> | [examples/text-classification/run.sh](#transfer-learning-on-glue) |
| QNLI | ❌ | Accuracy | 90.66 | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-text_classification-yellow'></a> | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification#glue-tasks'>Recipe</a> |
| QNLI | ✅ | Accuracy | 91.10 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/tree/main/qnli'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/QNLI-yellow'></a> | [examples/text-classification/run.sh](#transfer-learning-on-glue) |
| RTE | ❌ | Accuracy | 65.70 | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-text_classification-yellow'></a> | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification#glue-tasks'>Recipe</a> |
| RTE | ✅ | Accuracy | 72.56 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/tree/main/rte'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/RTE-yellow'></a> | [examples/text-classification/run.sh](#transfer-learning-on-glue) |
| Q/A* | ❌ | F1/Exact match | 88.52/81.22 | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-question_answering-yellow'></a> | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering#fine-tuning-bert-on-squad10'>Recipe</a> |
| Q/A* | ✅ | F1/Exact match | 88.67/81.51 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/tree/main/qa'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/QA-yellow'></a> | [examples/question-answering](#transfer-learning-on-squad) |
| AC† | ❌ | Accuracy | 98.26 | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/audio-classification'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-audio_classification-yellow'></a> | <a href='https://github.com/huggingface/transformers/tree/main/examples/pytorch/audio-classification#single-gpu'>Recipe</a> |
| AC† | ✅ | Accuracy | 98.31 | <a href='https://huggingface.co/UniversalAlgorithmic/SPG/tree/main/ac'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Huggingface-SPG/AC-yellow'></a> | [examples/audio-classification](#transfer-learning-on-superb) |

*Q/A: BERT on SQuAD v1.1. †AC: Wav2Vec2 on the SUPERB Keyword Spotting subset.
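
The NLP and speech checkpoints live in per-task subfolders of the same Hub repo. A minimal loading sketch, assuming each folder holds a standard Transformers checkpoint (config, weights, and tokenizer files); `subfolder` is a regular `from_pretrained` argument:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

REPO = "UniversalAlgorithmic/SPG"

# e.g. the SPG CoLA model from Table 3; other tasks use subfolder="sst2", "mrpc", ...
tokenizer = AutoTokenizer.from_pretrained(REPO, subfolder="cola")
model = AutoModelForSequenceClassification.from_pretrained(REPO, subfolder="cola")

inputs = tokenizer("The book was written by John.", return_tensors="pt")
print(model(**inputs).logits)  # acceptability logits for this CoLA example
```
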
## Model Zoo: Neural Architecture Search (NAS) via SPG Algorithm

Training and evaluation commands for the searched ResNet variants are given in [Neural Architecture Search for ResNet on ImageNet-1K](#neural-architecture-search-for-resnet-on-imagenet-1k) and [Evaluation](#evaluation) below.

## Requirements

1. Install `torch>=2.0.0+cu118`.
2. Install the remaining pip packages:
   ```setup
   pip install -r requirements.txt
   ```
3. Prepare the [ImageNet](http://image-net.org/) dataset manually and place it in `/path/to/imagenet`. For the image classification examples, pass the argument `--data-path=/path/to/imagenet` to the training script. The extracted dataset directory should follow this structure:
   ```setup
   /path/to/imagenet/:
       train/:
           n01440764:
               n01440764_18.JPEG ...
           n01443537:
               n01443537_2.JPEG ...
       val/:
           n01440764:
               ILSVRC2012_val_00000293.JPEG ...
           n01443537:
               ILSVRC2012_val_00000236.JPEG ...
   ```
4. Prepare the [MS-COCO 2017](https://cocodataset.org/#home) dataset manually and place it in `/path/to/coco`. For the semantic segmentation examples, pass the argument `--data-path=/path/to/coco` to the training script. The extracted dataset directory should follow this structure:
   ```setup
   /path/to/coco/:
       annotations:
           many_json_files.json ...
       train2017:
           000000000009.jpg ...
       val2017:
           000000000139.jpg ...
   ```
5. For the [🗣️ Keyword Spotting subset](https://huggingface.co/datasets/s3prl/superb#ks), [Common Language](https://huggingface.co/datasets/speechbrain/common_language), [SQuAD](https://huggingface.co/datasets/rajpurkar/squad), [Common Voice](https://huggingface.co/datasets/legacy-datasets/common_voice), [GLUE](https://gluebenchmark.com/) and [WMT](https://huggingface.co/datasets/wmt/wmt17) datasets, manual downloading is not required; they are loaded automatically via the Hugging Face Datasets library when running our `audio-classification`, `question-answering`, `speech-recognition`, `text-classification`, or `translation` examples. An optional sanity check for these steps follows this list.
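
The sketch below verifies the local ImageNet layout from step 3 with `torchvision.datasets.ImageFolder` and triggers the automatic downloads from step 5; the dataset names mirror our examples, and everything else is illustrative:

```python
from datasets import load_dataset
from torchvision.datasets import ImageFolder

# Step 3: ImageFolder should report 1000 classes for each ImageNet split.
for split in ("train", "val"):
    ds = ImageFolder(f"/path/to/imagenet/{split}")
    print(split, "->", len(ds.classes), "classes,", len(ds), "images")

# Step 5: Hub datasets are downloaded and cached on first use.
superb_ks = load_dataset("superb", "ks", trust_remote_code=True)  # Keyword Spotting
cola = load_dataset("glue", "cola")
squad = load_dataset("squad")
print(superb_ks)
```
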
## Training

<a id="-Retrain-model-on-ImageNet-1K"></a>
### Retrain model on ImageNet-1K

We use training recipes similar to those in [PyTorch Vision's classification reference](https://github.com/pytorch/vision/blob/main/references/classification/README.md) to retrain MobileNet-V2, ResNet, EfficientNet-V2, and ViT with our SPG on ImageNet-1K. The following commands can be used:
```bash
cd ./examples/image-classification
# MobileNet-V2
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model mobilenet_v2 --output-dir mobilenet_v2 --weights MobileNet_V2_Weights.IMAGENET1K_V1 \
    --batch-size 192 --epochs 40 --lr 0.0004 --lr-step-size 10 --lr-gamma 0.5 --wd 0.00004 \
    --apply-trp --trp-depths 1 --trp-p 0.15 --trp-lambdas 0.4 0.2 0.1
# ResNet-50
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model resnet50 --output-dir resnet50 --weights ResNet50_Weights.IMAGENET1K_V1 \
    --batch-size 64 --epochs 40 --lr 0.0004 --lr-step-size 10 --lr-gamma 0.5 --print-freq 100 \
    --apply-trp --trp-depths 1 --trp-p 0.2 --trp-lambdas 0.4 0.2 0.1
# EfficientNet-V2 M
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model efficientnet_v2_m --output-dir efficientnet_v2_m --weights EfficientNet_V2_M_Weights.IMAGENET1K_V1 \
    --epochs 10 --batch-size 64 --lr 5e-9 --lr-scheduler cosineannealinglr --weight-decay 0.00002 \
    --lr-warmup-method constant --lr-warmup-epochs 8 --lr-warmup-decay 0. \
    --auto-augment ta_wide --random-erase 0.1 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --norm-weight-decay 0.0 \
    --train-crop-size 384 --val-crop-size 480 --val-resize-size 480 --ra-sampler --ra-reps 4 --print-freq 100 \
    --apply-trp --trp-depths 1 --trp-p 0.2 --trp-lambdas 0.4 0.2 0.1
# ViT-B-16
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model vit_b_16 --output-dir vit_b_16 --weights ViT_B_16_Weights.IMAGENET1K_V1 \
    --epochs 5 --batch-size 196 --opt adamw --lr 5e-9 --lr-scheduler cosineannealinglr --wd 0.3 \
    --lr-warmup-method constant --lr-warmup-epochs 3 --lr-warmup-decay 0. \
    --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra --clip-grad-norm 1 --cutmix-alpha 1.0 \
    --apply-trp --trp-depths 1 --trp-p 0.1 --trp-lambdas 0.4 0.2 0.1 --print-freq 100
```
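
Each run above writes per-epoch checkpoints (e.g. `mobilenet_v2/model_32.pth`) into its `--output-dir`. A small inspection sketch, assuming the torchvision reference-script checkpoint format (weights under `"model"`, alongside `"epoch"` and `"args"`):

```python
import torch

# weights_only=False: reference checkpoints also pickle the argparse Namespace.
ckpt = torch.load("mobilenet_v2/model_32.pth", map_location="cpu", weights_only=False)
print(sorted(ckpt.keys()))                  # expected: ['args', 'epoch', 'model', ...]
print("saved at epoch:", ckpt.get("epoch"))
```
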
### Retrain model on MS-COCO 2017

We use training recipes similar to those in [PyTorch Vision's segmentation reference](https://github.com/pytorch/vision/blob/main/references/segmentation/README.md) to retrain FCN and DeepLabV3 with our SPG on the COCO dataset. The following commands can be used:
```bash
cd ./examples/semantic-segmentation
# FCN-ResNet50
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model fcn_resnet50 --aux-loss --output-dir fcn_resnet50 --weights FCN_ResNet50_Weights.COCO_WITH_VOC_LABELS_V1 \
    --epochs 5 --batch-size 16 --lr 0.0002 --print-freq 100 \
    --lr-warmup-method constant --lr-warmup-epochs 3 --lr-warmup-decay 0. \
    --apply-trp --trp-depths 1 --trp-p 0.1 --trp-lambdas 0.4 0.2 0.1
# FCN-ResNet101
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model fcn_resnet101 --aux-loss --output-dir fcn_resnet101 --weights FCN_ResNet101_Weights.COCO_WITH_VOC_LABELS_V1 \
    --epochs 5 --batch-size 12 --lr 0.0002 --print-freq 100 \
    --lr-warmup-method constant --lr-warmup-epochs 3 --lr-warmup-decay 0. \
    --apply-trp --trp-depths 1 --trp-p 0.1 --trp-lambdas 0.4 0.2 0.1
# DeepLabV3-ResNet50
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model deeplabv3_resnet50 --aux-loss --output-dir deeplabv3_resnet50 --weights DeepLabV3_ResNet50_Weights.COCO_WITH_VOC_LABELS_V1 \
    --epochs 5 --batch-size 16 --lr 0.0002 --print-freq 100 \
    --lr-warmup-method constant --lr-warmup-epochs 3 --lr-warmup-decay 0. \
    --apply-trp --trp-depths 1 --trp-p 0.1 --trp-lambdas 0.4 0.2 0.1
# DeepLabV3-ResNet101
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model deeplabv3_resnet101 --aux-loss --output-dir deeplabv3_resnet101 --weights DeepLabV3_ResNet101_Weights.COCO_WITH_VOC_LABELS_V1 \
    --epochs 5 --batch-size 12 --lr 0.0002 --print-freq 100 \
    --lr-warmup-method constant --lr-warmup-epochs 3 --lr-warmup-decay 0. \
    --apply-trp --trp-depths 1 --trp-p 0.1 --trp-lambdas 0.4 0.2 0.1
```
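
A retrained segmentation checkpoint can then be loaded into the matching torchvision architecture for inference. A minimal sketch, again assuming the reference-script checkpoint layout and the 21-class (VOC-style) head used throughout this README:

```python
import torch
import torchvision

ckpt = torch.load("fcn_resnet50/model_4.pth", map_location="cpu", weights_only=False)
model = torchvision.models.segmentation.fcn_resnet50(num_classes=21, aux_loss=True)
model.load_state_dict(ckpt["model"])  # assumed "model" key; see note above
model.eval()

x = torch.rand(1, 3, 520, 520)        # stand-in for a normalized input image
with torch.no_grad():
    logits = model(x)["out"]          # (1, 21, H, W) per-pixel class scores
print(logits.argmax(1).shape)
```
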
### Transfer learning on GLUE

We use recipes similar to those in [HuggingFace Transformers' Examples](https://github.com/huggingface/transformers/blob/main/examples/pytorch/README.md) to retrain BERT with our SPG on the GLUE benchmark. The following command can be used:
```bash
cd ./examples/text-classification && bash run.sh
```
### Transfer learning on SUPERB

We use recipes similar to those in [HuggingFace Transformers' Examples](https://github.com/huggingface/transformers/blob/main/examples/pytorch/README.md) to retrain Wav2Vec2 with our SPG on the SUPERB Keyword Spotting subset. The following command can be used:
```bash
cd ./examples/audio-classification
CUDA_VISIBLE_DEVICES=0 python run_audio_classification.py \
    --model_name_or_path facebook/wav2vec2-base \
    --dataset_name superb \
    --dataset_config_name ks \
    --trust_remote_code \
    --output_dir wav2vec2-base-ft-keyword-spotting \
    --overwrite_output_dir \
    --remove_unused_columns False \
    --do_train \
    --do_eval \
    --fp16 \
    --learning_rate 3e-5 \
    --max_length_seconds 1 \
    --attention_mask False \
    --warmup_ratio 0.1 \
    --num_train_epochs 8 \
    --per_device_train_batch_size 64 \
    --gradient_accumulation_steps 4 \
    --per_device_eval_batch_size 32 \
    --dataloader_num_workers 4 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_strategy epoch \
    --save_strategy epoch \
    --load_best_model_at_end True \
    --metric_for_best_model accuracy \
    --save_total_limit 3 \
    --seed 0 \
    --push_to_hub \
    --apply-trp --trp-depths 1 --trp-p 0.1 --trp-lambdas 0.4 0.2 0.1
```
### Transfer learning on SQuAD

We use recipes similar to those in [HuggingFace Transformers' Examples](https://github.com/huggingface/transformers/blob/main/examples/pytorch/README.md) to retrain BERT with our SPG on the SQuAD dataset. The following command can be used:
```bash
cd ./examples/question-answering
CUDA_VISIBLE_DEVICES=0 python run_qa.py \
    --model_name_or_path google-bert/bert-base-uncased \
    --dataset_name squad \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 12 \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./baseline \
    --overwrite_output_dir \
    --apply-trp --trp-depths 1 --trp-p 0.1 --trp-lambdas 0.4 0.2 0.1
```
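
For reference, the F1 / exact-match numbers that `run_qa.py` reports (and that Table 3 lists for Q/A) follow the standard SQuAD metric, computed via the `evaluate` library. A self-contained sketch with toy data:

```python
import evaluate

squad_metric = evaluate.load("squad")
predictions = [{"id": "0", "prediction_text": "Denver Broncos"}]
references = [
    {"id": "0", "answers": {"text": ["Denver Broncos"], "answer_start": [177]}}
]
print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```
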
### Neural Architecture Search for ResNet on ImageNet-1K

We conduct Neural Architecture Search (NAS) for the ResNet architecture on the ImageNet dataset. The following commands can be used:
```bash
cd ./examples/neural-architecture-search
# During NAS we explore ResNet-18, ResNet-27, ResNet-36, and ResNet-45. After retraining with the SPG algorithm, we retain only ResNet-18 and discard the others.
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model resnet18 --output-dir resnet18 --weights ResNet18_Weights.IMAGENET1K_V1 \
    --batch-size 128 --epochs 10 --lr 0.0004 --lr-step-size 2 --lr-gamma 0.5 \
    --lr-warmup-method constant --lr-warmup-epochs 1 --lr-warmup-decay 0. \
    --apply-trp --trp-depths 3 3 3 --trp-planes 256 --trp-lambdas 0.4 0.2 0.1 --print-freq 100
# During NAS we explore ResNet-34, ResNet-40, ResNet-46, and ResNet-52. After retraining with the SPG algorithm, we retain only ResNet-34 and discard the others.
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model resnet34 --output-dir resnet34 --weights ResNet34_Weights.IMAGENET1K_V1 \
    --batch-size 96 --epochs 10 --lr 0.0004 --lr-step-size 2 --lr-gamma 0.5 \
    --lr-warmup-method constant --lr-warmup-epochs 1 --lr-warmup-decay 0. \
    --apply-trp --trp-depths 2 2 2 --trp-planes 256 --trp-lambdas 0.4 0.2 0.1 --print-freq 100
# During NAS we explore ResNet-34, ResNet-50, ResNet-53, and ResNet-56. After retraining with the SPG algorithm, we retain only ResNet-50 and discard the others.
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model resnet50 --output-dir resnet50 --weights ResNet50_Weights.IMAGENET1K_V1 \
    --batch-size 64 --epochs 10 --lr 0.0004 --lr-step-size 2 --lr-gamma 0.5 \
    --lr-warmup-method constant --lr-warmup-epochs 1 --lr-warmup-decay 0. \
    --apply-trp --trp-depths 1 1 1 --trp-planes 1024 --trp-lambdas 0.4 0.2 0.1 --print-freq 100
```
## Evaluation

To evaluate our models on ImageNet, run:
```bash
cd ./examples/image-classification
# Required: download our MobileNet-V2 weights to ./examples/image-classification/mobilenet_v2
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model mobilenet_v2 --resume mobilenet_v2/model_32.pth --test-only
# Required: download our ResNet-50 weights to ./examples/image-classification/resnet50
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model resnet50 --resume resnet50/model_35.pth --test-only
# Required: download our EfficientNet-V2 M weights to ./examples/image-classification/efficientnet_v2_m
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model efficientnet_v2_m --resume efficientnet_v2_m/model_7.pth --test-only \
    --val-crop-size 480 --val-resize-size 480
# Required: download our ViT-B-16 weights to ./examples/image-classification/vit_b_16
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model vit_b_16 --resume vit_b_16/model_4.pth --test-only
```
To evaluate our models on COCO, run:
```bash
cd ./examples/semantic-segmentation
# eval baselines
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model fcn_resnet50 --aux-loss --weights FCN_ResNet50_Weights.COCO_WITH_VOC_LABELS_V1 \
    --test-only
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model fcn_resnet101 --aux-loss --weights FCN_ResNet101_Weights.COCO_WITH_VOC_LABELS_V1 \
    --test-only
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model deeplabv3_resnet50 --aux-loss --weights DeepLabV3_ResNet50_Weights.COCO_WITH_VOC_LABELS_V1 \
    --test-only
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model deeplabv3_resnet101 --aux-loss --weights DeepLabV3_ResNet101_Weights.COCO_WITH_VOC_LABELS_V1 \
    --test-only
# eval our models
# Required: download our FCN-ResNet50 weights to ./examples/semantic-segmentation/fcn_resnet50
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model fcn_resnet50 --aux-loss --resume fcn_resnet50/model_4.pth \
    --test-only
# Required: download our FCN-ResNet101 weights to ./examples/semantic-segmentation/fcn_resnet101
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model fcn_resnet101 --aux-loss --resume fcn_resnet101/model_4.pth \
    --test-only
# Required: download our DeepLabV3-ResNet50 weights to ./examples/semantic-segmentation/deeplabv3_resnet50
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model deeplabv3_resnet50 --aux-loss --resume deeplabv3_resnet50/model_4.pth \
    --test-only
# Required: download our DeepLabV3-ResNet101 weights to ./examples/semantic-segmentation/deeplabv3_resnet101
torchrun --nproc_per_node=4 train.py \
    --workers 4 --dataset coco --data-path /path/to/coco/ \
    --model deeplabv3_resnet101 --aux-loss --resume deeplabv3_resnet101/model_4.pth \
    --test-only
```
To evaluate our models on GLUE, SQuAD, and SUPERB, re-run the transfer-learning commands above; they perform evaluation as well as training.

For Neural Architecture Search, run the following commands to evaluate our SPG-trained ResNet models:
```bash
cd ./examples/neural-architecture-search
# Required: download our ResNet-18 weights to ./examples/neural-architecture-search/resnet18
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model resnet18 --resume resnet18/model_3.pth --test-only
# Required: download our ResNet-34 weights to ./examples/neural-architecture-search/resnet34
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model resnet34 --resume resnet34/model_8.pth --test-only
# Required: download our ResNet-50 weights to ./examples/neural-architecture-search/resnet50
torchrun --nproc_per_node=4 train.py \
    --data-path /path/to/imagenet/ \
    --model resnet50 --resume resnet50/model_9.pth --test-only
```
## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.