{ "model": "Qwen/Qwen2.5-VL-7B-Instruct", "model_type": "qwen2_5_vl", "model_revision": null, "task_type": "causal_lm", "torch_dtype": "bfloat16", "attn_impl": null, "num_labels": null, "problem_type": null, "rope_scaling": null, "device_map": null, "max_memory": {}, "local_repo_path": null, "init_strategy": null, "template": "qwen2_5_vl", "system": "You are a Multifaceted Mobile Interface Assistant. Your responsibilities include:\n\n- 1. Navigating a mobile phone interface to reach a target page based on user instructions, task history, and the current screen state.\n- 2. Understanding icons by identifying their name or function based on their location on the screen.\n- 3. Grounding icons by locating the coordinates of an icon based on its name or description.\n\nYou will receive input that typically includes:\n\n- User Request: Specifies the goal (navigation, understanding, or grounding). This might be a complex instruction for navigation or a direct question/command for icon tasks.\n- Task History (Optional, primarily for Navigation): Records previous steps.\n- Current Screen State: Represents the current screen, an image (indicated by ).\n\nBased on the user request and the current screen state (and history if applicable), you must first determine the type of task requested and then provide the appropriate output.\n\n--- Task Types and Output Formats ---\n\n1. Task: Navigation\n\n- Goal: Reach a target page step-by-step.\n- Typical Input: Multi-turn instruction, task history, and current screen state (screen description and screenshot).\n- Possible Actions:\n - click: Tap a specific element. Provide coordinates (x, y) in a coordinate system with (0,0) at the top-left and (1000,1000) at the bottom-right.\n - complete: Task finished, current screen is the target.\n- Output Format:\nExplain: [Your brief explanation, e.g., 'click xxx icon on yyy page.', 'this is the target page.']\tAction: [click(start_box=<|box_start|>(x,y)<|box_end|>) or complete]\n\n2. 
Task: Icon Grounding (Locating an Icon)\n\n- Goal: Identify the coordinates of a requested icon.\n- Typical Input: User request like \"Click on [icon name/description] in the image.\", screen image ().\n- Action: Implicitly click (meaning \"identify location\").\n- Output Format:\nAction: click(start_box=<|box_start|>(x,y)<|box_end|>)\n\n3. Task: Icon Understanding (Identifying an Icon)\n\n- Goal: Provide the name or function of an icon at given coordinates.\n- Typical Input: User request like \"What is the icon at point (x, y) in the image?\", screen image ().\n- Action: Provide textual information.\n- Output Format:\n[Icon Name or Description]\n\n--- General Instructions ---\n\n- Carefully analyze the user request to determine the task (Navigation, Grounding, Understanding).\n- Analyze the current screen state (description or image) thoroughly.\n- For actions involving coordinates (click), use the (0,0) to (1000,1000) system.\n- Strictly adhere to the specified output format for the determined task type. 
Use a tab character (\\t) as a separator where indicated.", "max_length": 2048, "truncation_strategy": "delete", "max_pixels": 200704, "agent_template": null, "norm_bbox": null, "use_chat_template": true, "padding_free": false, "padding_side": "right", "loss_scale": "default", "sequence_parallel_size": 1, "response_prefix": null, "template_backend": "swift", "dataset": [ "datas/sft_aligned.json" ], "val_dataset": [], "split_dataset_ratio": 0.01, "data_seed": 42, "dataset_num_proc": 4, "load_from_cache_file": true, "dataset_shuffle": true, "val_dataset_shuffle": false, "streaming": false, "interleave_prob": null, "stopping_strategy": "first_exhausted", "shuffle_buffer_size": 1000, "download_mode": "reuse_dataset_if_exists", "columns": {}, "strict": false, "remove_unused_columns": true, "model_name": [ null, null ], "model_author": [ null, null ], "custom_dataset_info": [], "quant_method": null, "quant_bits": null, "hqq_axis": null, "bnb_4bit_compute_dtype": "bfloat16", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "bnb_4bit_quant_storage": null, "max_new_tokens": 64, "temperature": 0.0, "top_k": null, "top_p": null, "repetition_penalty": null, "num_beams": 1, "stream": false, "stop_words": [], "logprobs": false, "top_logprobs": null, "ckpt_dir": null, "lora_modules": [], "tuner_backend": "peft", "train_type": "full", "adapters": [], "external_plugins": [], "seed": 42, "model_kwargs": {}, "load_args": false, "load_data_args": false, "use_hf": true, "hub_token": null, "custom_register_path": [], "ddp_timeout": 1800, "ddp_backend": null, "ignore_args_error": false, "use_swift_lora": false, "output_dir": "/ext_hdd2/nhkoh/gelab-env/checkpoint/gui_exp/sft_448/v0-20260221_074940", "overwrite_output_dir": false, "do_train": false, "do_eval": false, "do_predict": false, "eval_strategy": "steps", "prediction_loss_only": false, "per_device_train_batch_size": 2, "per_device_eval_batch_size": 2, "per_gpu_train_batch_size": null, "per_gpu_eval_batch_size": 
null, "gradient_accumulation_steps": 4, "eval_accumulation_steps": null, "eval_delay": 0, "torch_empty_cache_steps": null, "learning_rate": 1e-05, "weight_decay": 0.1, "adam_beta1": 0.9, "adam_beta2": 0.95, "adam_epsilon": 1e-08, "max_grad_norm": 1.0, "num_train_epochs": 1.0, "max_steps": -1, "lr_scheduler_type": "cosine", "lr_scheduler_kwargs": null, "warmup_ratio": 0.05, "warmup_steps": 0, "log_level": "passive", "log_level_replica": "warning", "log_on_each_node": true, "logging_dir": "/ext_hdd2/nhkoh/gelab-env/checkpoint/gui_exp/sft_448/v0-20260221_074940/runs", "logging_strategy": "steps", "logging_first_step": true, "logging_steps": 10, "logging_nan_inf_filter": true, "save_strategy": "steps", "save_steps": 500.0, "save_total_limit": 2, "save_safetensors": true, "save_on_each_node": false, "save_only_model": true, "restore_callback_states_from_checkpoint": false, "no_cuda": false, "use_cpu": false, "use_mps_device": false, "jit_mode_eval": false, "bf16": true, "fp16": false, "fp16_opt_level": "O1", "half_precision_backend": "auto", "bf16_full_eval": false, "fp16_full_eval": false, "tf32": null, "local_rank": 0, "tpu_num_cores": null, "tpu_metrics_debug": false, "debug": null, "dataloader_drop_last": false, "eval_steps": 500.0, "dataloader_num_workers": 4, "dataloader_prefetch_factor": null, "past_index": -1, "run_name": "/ext_hdd2/nhkoh/gelab-env/checkpoint/gui_exp/sft_448/v0-20260221_074940", "disable_tqdm": null, "label_names": null, "load_best_model_at_end": false, "metric_for_best_model": "loss", "greater_is_better": false, "ignore_data_skip": false, "fsdp": null, "fsdp_min_num_params": 0, "fsdp_config": null, "fsdp_transformer_layer_cls_to_wrap": null, "accelerator_config": { "dispatch_batches": false }, "parallelism_config": null, "deepspeed": { "fp16": { "enabled": "auto", "loss_scale": 0, "loss_scale_window": 1000, "initial_scale_power": 16, "hysteresis": 2, "min_loss_scale": 1 }, "bf16": { "enabled": "auto" }, "zero_optimization": { "stage": 2, 
"offload_optimizer": { "device": "none", "pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 200000000.0, "overlap_comm": false, "reduce_scatter": true, "reduce_bucket_size": 200000000.0, "contiguous_gradients": true }, "gradient_accumulation_steps": "auto", "gradient_clipping": "auto", "steps_per_print": 2000, "train_batch_size": "auto", "train_micro_batch_size_per_gpu": "auto", "wall_clock_breakdown": false }, "label_smoothing_factor": 0.0, "optim": "adamw_torch_fused", "optim_args": null, "adafactor": false, "group_by_length": false, "length_column_name": "length", "report_to": [ "wandb" ], "project": "huggingface", "trackio_space_id": "trackio", "ddp_find_unused_parameters": null, "ddp_bucket_cap_mb": null, "ddp_broadcast_buffers": null, "dataloader_pin_memory": true, "dataloader_persistent_workers": false, "skip_memory_metrics": true, "use_legacy_prediction_loop": false, "push_to_hub": false, "resume_from_checkpoint": null, "hub_model_id": null, "hub_strategy": "every_save", "hub_private_repo": null, "hub_always_push": false, "hub_revision": null, "gradient_checkpointing": false, "gradient_checkpointing_kwargs": null, "include_inputs_for_metrics": false, "include_for_metrics": [], "eval_do_concat_batches": true, "fp16_backend": "auto", "push_to_hub_model_id": null, "push_to_hub_organization": null, "push_to_hub_token": null, "mp_parameters": "", "auto_find_batch_size": false, "full_determinism": false, "torchdynamo": null, "ray_scope": "last", "torch_compile": false, "torch_compile_backend": null, "torch_compile_mode": null, "include_tokens_per_second": false, "include_num_input_tokens_seen": false, "neftune_noise_alpha": null, "optim_target_modules": null, "batch_eval_metrics": false, "eval_on_start": false, "use_liger_kernel": false, "liger_kernel_config": null, "eval_use_gather_object": false, "average_tokens_across_devices": true, "sortish_sampler": false, "predict_with_generate": false, "generation_max_length": null, 
"generation_num_beams": null, "generation_config": null, "check_model": true, "acc_strategy": "token", "train_dataloader_shuffle": true, "max_epochs": null, "aligner_lr": null, "vit_lr": null, "optimizer": null, "metric_warmup_step": 0, "fsdp_num": 1, "acc_steps": 1, "eval_use_evalscope": false, "eval_datasets": [], "eval_limit": null, "eval_datasets_args": null, "eval_generation_config": null, "freeze_parameters": [ "visual", "visual.merger" ], "freeze_parameters_regex": null, "freeze_parameters_ratio": 0.0, "trainable_parameters": [], "trainable_parameters_regex": null, "freeze_llm": false, "freeze_vit": true, "freeze_aligner": true, "target_modules": [ "all-linear" ], "target_regex": null, "modules_to_save": [], "lora_rank": 8, "lora_alpha": 32, "lora_dropout": 0.05, "lora_bias": "none", "lora_dtype": null, "lorap_lr_ratio": null, "use_rslora": false, "use_dora": false, "lora_ga_batch_size": 2, "lora_ga_iters": 2, "lora_ga_max_length": 1024, "lora_ga_direction": "ArB2r", "lora_ga_scale": "stable", "lora_ga_stable_gamma": 16, "init_weights": true, "fourier_n_frequency": 2000, "fourier_scaling": 300.0, "boft_block_size": 4, "boft_block_num": 0, "boft_n_butterfly_factor": 1, "boft_dropout": 0.0, "vera_rank": 256, "vera_projection_prng_key": 0, "vera_dropout": 0.0, "vera_d_initial": 0.1, "adapter_act": "gelu", "adapter_length": 128, "use_galore": false, "galore_target_modules": null, "galore_rank": 128, "galore_update_proj_gap": 50, "galore_scale": 1.0, "galore_proj_type": "std", "galore_optim_per_parameter": false, "galore_with_embedding": false, "galore_quantization": false, "galore_proj_quant": false, "galore_proj_bits": 4, "galore_proj_group_size": 256, "galore_cos_threshold": 0.4, "galore_gamma_proj": 2, "galore_queue_size": 5, "adalora_target_r": 8, "adalora_init_r": 12, "adalora_tinit": 0, "adalora_tfinal": 0, "adalora_deltaT": 1, "adalora_beta1": 0.85, "adalora_beta2": 0.85, "adalora_orth_reg_weight": 0.5, "llamapro_num_new_blocks": 4, "llamapro_num_groups": 
null, "lisa_activated_layers": 0, "lisa_step_interval": 20, "reft_layer_key": null, "reft_layers": null, "reft_rank": 4, "reft_intervention_type": "LoreftIntervention", "reft_args": null, "swanlab_token": null, "swanlab_project": null, "swanlab_workspace": null, "swanlab_exp_name": null, "swanlab_mode": "cloud", "add_version": false, "resume_only_model": false, "create_checkpoint_symlink": false, "packing": false, "lazy_tokenize": true, "loss_type": null, "metric": null, "zero_hpz_partition_size": null, "rank": 0, "global_world_size": 3, "local_world_size": 3, "model_suffix": "Qwen2.5-VL-7B-Instruct", "model_info": "ModelInfo(model_type='qwen2_5_vl', model_dir='/ext_hdd2/nhkoh/.cache/huggingface/hub/models--Qwen--Qwen2.5-VL-7B-Instruct/snapshots/cc594898137f460bfe9f0759e9844b3ce807cfb5', torch_dtype=torch.bfloat16, max_model_len=128000, quant_method=None, quant_bits=None, rope_scaling={'type': 'default', 'mrope_section': [16, 24, 24], 'rope_type': 'default'}, config=None, task_type='causal_lm', num_labels=None)", "model_meta": "ModelMeta(model_type='qwen2_5_vl', model_groups=[ModelGroup(models=[Model(ms_model_id='Qwen/Qwen2.5-VL-3B-Instruct', hf_model_id='Qwen/Qwen2.5-VL-3B-Instruct', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-7B-Instruct', hf_model_id='Qwen/Qwen2.5-VL-7B-Instruct', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-32B-Instruct', hf_model_id='Qwen/Qwen2.5-VL-32B-Instruct', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-72B-Instruct', hf_model_id='Qwen/Qwen2.5-VL-72B-Instruct', model_path=None, ms_revision=None, hf_revision=None)], ignore_patterns=None, requires=None, tags=[]), ModelGroup(models=[Model(ms_model_id='Qwen/Qwen2.5-VL-3B-Instruct-AWQ', hf_model_id='Qwen/Qwen2.5-VL-3B-Instruct-AWQ', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-7B-Instruct-AWQ', 
hf_model_id='Qwen/Qwen2.5-VL-7B-Instruct-AWQ', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-32B-Instruct-AWQ', hf_model_id='Qwen/Qwen2.5-VL-32B-Instruct-AWQ', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-72B-Instruct-AWQ', hf_model_id='Qwen/Qwen2.5-VL-72B-Instruct-AWQ', model_path=None, ms_revision=None, hf_revision=None)], ignore_patterns=None, requires=None, tags=[])], template='qwen2_5_vl', get_function=, model_arch='qwen2_vl', architectures=['Qwen2_5_VLForConditionalGeneration'], additional_saved_files=[], torch_dtype=None, is_multimodal=True, is_reward=False, task_type=None, ignore_patterns=None, requires=['transformers>=4.49', 'qwen_vl_utils>=0.0.6', 'decord'], tags=[])", "model_dir": "/ext_hdd2/nhkoh/.cache/huggingface/hub/models--Qwen--Qwen2.5-VL-7B-Instruct/snapshots/cc594898137f460bfe9f0759e9844b3ce807cfb5", "hub": "", "evaluation_strategy": "steps", "training_args": "Seq2SeqTrainingArguments(output_dir='/ext_hdd2/nhkoh/gelab-env/checkpoint/gui_exp/sft_448/v0-20260221_074940', overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, eval_strategy='steps', prediction_loss_only=False, per_device_train_batch_size=2, per_device_eval_batch_size=2, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=4, eval_accumulation_steps=None, eval_delay=0, torch_empty_cache_steps=None, learning_rate=1e-05, weight_decay=0.1, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, lr_scheduler_type='cosine', lr_scheduler_kwargs=None, warmup_ratio=0.05, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir='/ext_hdd2/nhkoh/gelab-env/checkpoint/gui_exp/sft_448/v0-20260221_074940/runs', logging_strategy='steps', logging_first_step=True, logging_steps=10, logging_nan_inf_filter=True, save_strategy='steps', save_steps=500, save_total_limit=2, save_safetensors=True, 
save_on_each_node=False, save_only_model=True, restore_callback_states_from_checkpoint=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=42, jit_mode_eval=False, bf16=True, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=0, ddp_backend=None, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=4, dataloader_prefetch_factor=10, past_index=-1, run_name='/ext_hdd2/nhkoh/gelab-env/checkpoint/gui_exp/sft_448/v0-20260221_074940', disable_tqdm=False, remove_unused_columns=False, label_names=None, load_best_model_at_end=False, metric_for_best_model='loss', greater_is_better=False, ignore_data_skip=False, fsdp=[], fsdp_min_num_params=0, fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=AcceleratorConfig(split_batches=False, dispatch_batches=False, even_batches=True, use_seedable_sampler=True, non_blocking=False, gradient_accumulation_kwargs=None, use_configured_state=False), parallelism_config=None, deepspeed={'fp16': {'enabled': 'auto', 'loss_scale': 0, 'loss_scale_window': 1000, 'initial_scale_power': 16, 'hysteresis': 2, 'min_loss_scale': 1}, 'bf16': {'enabled': 'auto'}, 'zero_optimization': {'stage': 2, 'offload_optimizer': {'device': 'none', 'pin_memory': True}, 'allgather_partitions': True, 'allgather_bucket_size': 200000000.0, 'overlap_comm': False, 'reduce_scatter': True, 'reduce_bucket_size': 200000000.0, 'contiguous_gradients': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': 2000, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False}, label_smoothing_factor=0.0, optim='adamw_torch_fused', optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=['wandb'], 
project='huggingface', trackio_space_id='trackio', ddp_find_unused_parameters=None, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=None, dataloader_pin_memory=True, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy='every_save', hub_token=None, hub_private_repo=None, hub_always_push=False, hub_revision=None, gradient_checkpointing=False, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, include_for_metrics=[], eval_do_concat_batches=True, fp16_backend='auto', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=1800, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, include_tokens_per_second=None, include_num_input_tokens_seen=None, neftune_noise_alpha=None, optim_target_modules=None, batch_eval_metrics=False, eval_on_start=False, use_liger_kernel=False, liger_kernel_config=None, eval_use_gather_object=False, average_tokens_across_devices=None, sortish_sampler=False, predict_with_generate=False, generation_max_length=None, generation_num_beams=None, generation_config=None, check_model=True, acc_strategy='token', train_dataloader_shuffle=True, max_epochs=None, aligner_lr=None, vit_lr=None, optimizer=None, metric_warmup_step=0, fsdp_num=1, acc_steps=1, eval_use_evalscope=False, eval_datasets=[], eval_limit=None, eval_datasets_args=None, eval_generation_config=None, train_type='full', local_repo_path=None, galore_config=None)" }