.. _config:

Training Models on Task Datasets (Commands and Configurations)
################################################################
LAVIS provides scripts to pre-train and finetune supported models on standard language-vision tasks, stored at ``lavis/run_scripts/``.
To replicate the experiments, simply run these bash scripts. For example, to train a BLIP model on the image-text retrieval task with the MSCOCO dataset, run

.. code-block:: bash

    bash run_scripts/blip/train/train_retrieval_coco.sh

Inside the script, we can see

.. code-block:: bash

    python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/blip/train/retrieval_coco_ft.yaml

which starts PyTorch distributed training on 8 GPUs (adjust ``--nproc_per_node`` to match your own hardware setup). The ``--cfg-path`` option specifies a `runtime configuration file` that defines the task, model, dataset and training recipe.
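A runtime configuration file is a YAML file organized into ``model``, ``datasets``, and ``run`` sections. The abridged sketch below shows only this overall shape; the names ``blip_retrieval``, ``coco_retrieval`` and ``retrieval`` are illustrative placeholders, and the exact keys and values live in ``lavis/projects/blip/train/retrieval_coco_ft.yaml``.

.. code-block:: yaml

    # Overall shape of a runtime configuration file (abridged sketch;
    # see the tables below for the options available in each section).
    model:
      arch: blip_retrieval     # illustrative model name
      # model configurations ...
    datasets:
      coco_retrieval:          # illustrative dataset name
        # dataset configurations ...
    run:
      task: retrieval          # illustrative task name
      # runtime configurations ...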
LAVIS executes training and evaluation based on the arguments specified in the configuration files. Default model and dataset configurations are defined in ``lavis/configs``; task-specific configurations are defined in ``lavis/projects`` and take priority over the defaults.

The following tables describe the available options and their functions.
.. list-table::
   :widths: 30 40
   :header-rows: 1

   * - Model Configurations
     - Functionalities
   * - arch
     - | name of the model from the model zoo
       | default: task-dependent
   * - model_type
     - | the type of the model (e.g., base)
       | default: task-dependent
   * - load_pretrained
     - | load pretrained weights
       | default: True (for finetuning tasks); False (for pretraining tasks)
   * - load_finetuned
     - | load task-specific finetuned weights
       | default: False (for finetuning tasks); True (for evaluation)
   * - pretrained
     - | URL or local path of the pretrained model weights, defined in the default model configuration file
       | default: task-dependent
   * - finetuned
     - | URL or local path of the finetuned model weights, defined in the default model configuration file
       | default: task-dependent
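As a concrete illustration, the ``model`` section of a finetuning config could look like the sketch below. The architecture name and the checkpoint URL are placeholders, not values copied from a shipped config.

.. code-block:: yaml

    # Illustrative model section for a finetuning run; the checkpoint URL
    # is a placeholder, not a real download location.
    model:
      arch: blip_retrieval
      model_type: base
      load_pretrained: True    # start from pretrained weights when finetuning
      load_finetuned: False    # set to True (for evaluation) to load the finetuned weights instead
      pretrained: "https://example.com/blip_pretrained.pth"   # placeholder URL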
.. list-table::
   :widths: 30 50
   :header-rows: 1

   * - Dataset Configurations
     - Functionalities
   * - vis_processor
     - | pre-processing of visual input
       | default: task-dependent
   * - text_processor
     - | pre-processing of text input
       | default: task-dependent
   * - build_info
     - | dataset information, including the storage location, defined in the default dataset configuration file
       | default: task-dependent
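A dataset entry typically configures the processors per split, as in the sketch below; the dataset and processor names are illustrative. ``build_info`` usually stays in the default dataset configuration under ``lavis/configs`` and only needs overriding to point at a custom storage location.

.. code-block:: yaml

    # Illustrative datasets section; dataset and processor names are placeholders.
    datasets:
      coco_retrieval:
        vis_processor:
          train:
            name: "blip_image_train"   # visual pre-processing for training
          eval:
            name: "blip_image_eval"    # visual pre-processing for evaluation
        text_processor:
          train:
            name: "blip_caption"       # text pre-processing for training
          eval:
            name: "blip_caption"       # text pre-processing for evaluation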
.. list-table::
   :widths: 30 50
   :header-rows: 1

   * - Runtime Configurations
     - Functionalities
   * - task
     - | name of the task
       | default: task-dependent
   * - lr_sched
     - | learning rate scheduler
       | default: linear_warmup_cosine_lr
   * - init_lr
     - | initial learning rate (after warmup)
       | default: task-dependent
   * - min_lr
     - | final learning rate after decay
       | default: task-dependent
   * - warmup_lr
     - | starting learning rate for warmup
       | default: init_lr (no warmup)
   * - lr_decay_rate
     - | per-epoch learning rate decay for the step learning rate schedule
       | default: 0.9
   * - warmup_steps
     - | number of steps for learning rate warmup
       | default: 0
   * - max_epoch
     - | total number of training epochs
       | default: task-dependent
   * - weight_decay
     - | weight decay coefficient for the optimizer
       | default: 0.05
   * - batch_size_train
     - | batch size during training
       | default: task-dependent
   * - batch_size_eval
     - | batch size during evaluation
       | default: task-dependent
   * - seed
     - | pseudo-random number generator seed
       | default: 42
   * - output_dir
     - | directory to store logs, results and checkpoints
       | default: task-dependent
   * - resume_ckpt_path
     - | path of the checkpoint to resume training from
       | default: None
   * - evaluate
     - | only perform evaluation, without training
       | default: False
   * - train_splits
     - | dataset splits used for training
       | default: ["train"]
   * - valid_splits
     - | dataset splits used for validation
       | default: ["val"]
   * - test_splits
     - | dataset splits used for testing
       | default: ["test"]
   * - device
     - | device to use: cpu or gpu (cuda)
       | default: cuda
   * - world_size
     - | number of processes participating in the job
       | default: 1
   * - dist_url
     - | URL specifying how to initialize the process group
       | default: "env://"
   * - distributed
     - | use distributed training
       | default: True
   * - amp
     - | use automatic mixed precision training
       | default: False
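Putting the main options together, a ``run`` section for a finetuning job could look like the following sketch; every value is an illustrative placeholder, and the actual settings should be taken from the corresponding project config.

.. code-block:: yaml

    # Illustrative run section; values are placeholders, not tuned settings.
    run:
      task: retrieval
      lr_sched: "linear_warmup_cosine_lr"
      init_lr: 1e-5
      min_lr: 0
      warmup_steps: 1000
      max_epoch: 6
      weight_decay: 0.05
      batch_size_train: 32
      batch_size_eval: 64
      seed: 42
      output_dir: "output/blip_retrieval_coco"   # placeholder path
      evaluate: False
      train_splits: ["train"]
      valid_splits: ["val"]
      test_splits: ["test"]
      device: "cuda"
      world_size: 1
      dist_url: "env://"
      distributed: True
      amp: False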
.. list-table::
   :widths: 40 50
   :header-rows: 1

   * - Text Generation Configurations
     - Functionalities
   * - max_len
     - | maximum number of text tokens to generate
       | default: 20 (for image captioning)
   * - min_len
     - | minimum number of text tokens to generate
       | default: 5 (for image captioning)
   * - num_beams
     - | number of beams for beam search
       | default: 3
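For generation tasks such as image captioning, these options sit in the ``run`` section alongside the runtime options above. The sketch below simply restates the table defaults; the ``captioning`` task name is assumed for illustration.

.. code-block:: yaml

    # Illustrative generation options inside the run section
    # (values restate the captioning defaults from the table above).
    run:
      task: captioning   # assumed task name, for illustration
      max_len: 20        # longest generated text, in tokens
      min_len: 5         # shortest generated text, in tokens
      num_beams: 3       # beam width for beam search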
.. list-table::
   :widths: 40 50
   :header-rows: 1

   * - Multimodal Retrieval Configurations
     - Functionalities
   * - negative_all_rank
     - | collect negatives from all processes for the image-text matching loss
       | default: True (for COCO)
   * - k_test
     - | number of retrieval candidates selected by contrastive similarity for re-ranking
       | default: 256 (for COCO)
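Similarly, the retrieval-specific options below restate the COCO defaults from the table. Where exactly they are placed in the YAML follows the shipped project config, so treat this as a sketch rather than a drop-in snippet.

.. code-block:: yaml

    # Illustrative retrieval options (COCO defaults from the table above);
    # their exact section placement follows the shipped project config.
    negative_all_rank: True   # gather negatives across all processes for the ITM loss
    k_test: 256               # candidates pre-ranked by contrastive similarity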