lingbot-vla / docs /config /config.md
bazaar-research's picture
Upload folder using huggingface_hub
fb11af9 verified

Config arguments Explanation

Model configuration arguments

Name Type Description Default Value
model.config_path str Path to the model huggingface configuration, like config.json model.model_path
model.model_path str Path to the model parameter file. If empty, random initialization will be performed None
model.tokenizer_path str Path to the tokenizer model.model_path
model.encoders dict Configuration file for multi-modal encoders {}
model.decoders dict Configuration file for multi-modal decoders {}
model.input_encoder str: {"encoder", "decoder"} Use the encoder of the encoder or decoder to encode the input image encoder
model.output_encoder str: {"encoder", "decoder"} Use the encoder of the encoder or decoder to encode the output image decoder
model.encode_target bool Used to encode the training data for the diffusion model False

Data configuration arguments

Name Type Description Default Value
data.train_path str Path of training dataset Required
data.train_size int Total number of tokens in the training set 10,000,000
data.data_type str: {"plaintext", "conversation"} Dataset type. conversation
data.dataloader_type str: {"native"} Use the pytorch dataloader or native
data.datasets_type str: {"mapping", "iterable"} Dataset type. IterativeDataset or MappingDataset, or your custom datsets mapping
data.text_keys str: {"content_split", "messages"} The key corresponding to the text samples in the data dictionary. Generally, it is "content_split" for pretraining and "messages" for SFT. content_split
data.image_keys str The key corresponding to the image samples in the data dictionary. Generally, it is "images". images
data.chat_template str Name of the chat template. default
data.max_seq_len int Maximum training length. 2048
data.num_workers int Number of multi-process loaders for the dataloader. 4
data.drop_last bool Whether to discard the remaining data at the end. True
data.pin_memory bool Whether to pin the data in the CPU memory. True
data.prefetch_factor int Number of samples preprocessed by the dataloader. 2

Training configuration arguments

Name Type Description Default Value
train.output_dir str Path to save the model. Required
train.lr float Maximum learning rate. 5e - 5
train.lr_min float Minimum learning rate. 1e - 7
train.weight_decay float Weight decay coefficient. 0
train.optimizer str: {"adamw", "anyprecision_adamw"} Name of the optimizer. adamw
train.max_grad_norm float Gradient clipping norm. 1.0
train.micro_batch_size int Number of samples processed simultaneously on each GPU. 1
train.global_batch_size int Global batch size, which must be a multiple of the number of GPUs. train.micro_batch_size * n_gpus
train.num_train_epochs int Number of training epochs. 1
train.rmpad bool Whether to use rmpad training based on cu_seqlens. False
train.rmpad_with_pos_ids bool Whether to use rmpad training based on position_ids. False
train.dyn_bsz_margin int Number of pad tokens in the dynamic batch. 0
train.dyn_bsz_runtime str: {"main", "worker"} Running process of the dynamic batch. worker
train.bsz_warmup_ratio float Proportion of batch size warmup in the total number of steps. 0
train.lr_warmup_ratio float Proportion of learning rate warmup in the total number of steps. 0
train.lr_decay_style str: {"constant", "linear", "cosine"} Name of the learning rate scheduler. cosine
train.lr_decay_ratio float Proportion of learning rate decay in the total number of steps 1.0
train.use_doptim bool Whether to use the distributed optimizer during Vescale training(no use for torch fsdp) False
train.enable_mixed_precision bool Whether to enable mixed precision training (higher memory usage but more stable) True
train.enable_gradient_checkpointing bool Whether to enable gradient checkpointing to reduce memory usage. True
train.enable_reentrant bool Whether to enable reentrant in gradient checkpointing. True
train.enable_full_shard bool Whether to use full sharding FSDP (equivalent to ZeRO3). True
train.enable_fsdp_offload bool Whether to enable FSDP CPU offloading (only supported for FSDP1). False
train.enable_activation_offload bool Whether to enable activation value CPU offloading. False
train.activation_gpu_limit float Size of the activation values retained on the GPU (in GB). 0.0
train.enable_manual_eager bool Whether to use manual eager during Vescale training. False
train.init_device: meta str "cpu", "cuda", "meta", init device for model initialization. use "meta" or cpu for large model(>30B) cuda
train.enable_full_determinism bool Whether to enable deterministic mode (for bitwise alignment). False
train.empty_cache_steps int Number of steps between two cache clearings. -1 means not enabled. 500
train.data_parallel_mode str: {"ddp", "fsdp1", "fsdp2"} Data parallel algorithm. ddp
train.tensor_parallel_size int Tensor parallel size (currently only supported for vescale training). 1
train.pipeline_parallel_size int Pipeline parallel size (currently not supported). 1
train.ulysses_parallel_size int Ulysses sequence parallel size (currently only supported for P6dense and Qwen2VL). 1
train.context_parallel_size int Ring sequence parallel size (currently not supported) 1
train.expert_parallel_size int Expert parallel size (currently only supported DeepseekMOE) 1
train.load_checkpoint_path str Path to the omnistore checkpoint for resuming training. None
train.save_steps int Number of steps between two checkpoint saves. 0 means invalid. 0
train.save_epochs int Number of epochs between two checkpoint saves. 0 means invalid. 1
train.save_hf_weights bool Whether to save the model weights in the huggingface format. It is recommended to set it to False for models > 30B to prevent NCCL timeout. You can convert it after training. True
train.seed int Random seed. 42
train.use_wandb bool Whether to enable byted wandb experiment logging. True
train.wandb_project str Name of the wandb experiment project. LingBotVLA
train.wandb_name str Name of the wandb experiment. None
train.enable_profiling bool Whether to use torch profiling. False
train.profile_start_step int Starting step of profiling. 1
train.profile_end_step int Ending step of profiling. 2
train.profile_trace_dir str Path to save the profiling results. ./trace
train.profile_record_shapes bool Whether to record the shapes of the input tensors. True
train.profile_profile_memory bool Whether to record the memory usage. True
train.profile_with_stack bool Whether to record the stack information. True
train.max_steps int Number of steps per training epoch (only used for debugging). None

Inference configuration arguments

Name Type Description Default Value
infer.model_path str Path to the model parameter file. Required
infer.tokenizer_path str Path to the tokenizer. model.model_path
infer.seed int Random seed. 42
infer.do_sample bool Whether to enable sampling. True
infer.temperature float Sampling temperature. 1.0
infer.top_p float Sampling Top P value. 1.0
infer.max_tokens int Maximum number of tokens generated each time. 1024