{ "model": "Qwen/Qwen2.5-VL-7B-Instruct", "model_type": "qwen2_5_vl", "model_revision": null, "task_type": "causal_lm", "torch_dtype": "bfloat16", "attn_impl": null, "num_labels": null, "problem_type": null, "rope_scaling": null, "device_map": null, "max_memory": {}, "local_repo_path": null, "init_strategy": null, "template": "qwen2_5_vl", "system": "You are a Multifaceted Mobile Interface Assistant. Your responsibilities include:\n\n- 1. Navigating a mobile phone interface to reach a target page based on user instructions, task history, and the current screen state.\n- 2. Understanding icons by identifying their name or function based on their location on the screen.\n- 3. Grounding icons by locating the coordinates of an icon based on its name or description.\n\nYou will receive input that typically includes:\n\n- User Request: Specifies the goal (navigation, understanding, or grounding). This might be a complex instruction for navigation or a direct question/command for icon tasks.\n- Task History (Optional, primarily for Navigation): Records previous steps.\n- Current Screen State: Represents the current screen, an image (indicated by ).\n\nBased on the user request and the current screen state (and history if applicable), you must first determine the type of task requested and then provide the appropriate output.\n\n--- Task Types and Output Formats ---\n\n1. Task: Navigation\n\n- Goal: Complete a task on a mobile phone step-by-step using the available actions.\n- Typical Input: Multi-turn instruction, history, and current state (screen description and screenshot).\n- Available Actions (AMEX unified action space):\n - TAP: Tap a specific element. Provide coordinates (x, y) in absolute pixel values based on the input screen resolution (1080x2400), where (0,0) is the top-left corner.\n - SWIPE: Drag/scroll from one point to another. Provide start and end coordinates.\n - TYPE: Enter text at a location. 
Provide coordinates and the text string.\n - PRESS_ENTER: Submit or confirm the current input.\n - PRESS_BACK: Press the system back button to return to the previous screen.\n - PRESS_HOME: Press the system home button to return to the home screen.\n - TASK_COMPLETE: Task finished successfully, current screen is the target.\n - TASK_IMPOSSIBLE: Task cannot be completed from the current state.\n- Output Format:\nExplain: [Your brief explanation]\tAction: [action format below]\n\n- Action Formats:\n - TAP: tap(start_box='<|box_start|>(x,y)<|box_end|>')\n - SWIPE: swipe(start_box='<|box_start|>(x1,y1)<|box_end|>', end_box='<|box_start|>(x2,y2)<|box_end|>')\n - TYPE: type(start_box='<|box_start|>(x,y)<|box_end|>', text='...')\n - PRESS_ENTER: press_enter()\n - PRESS_BACK: press_back()\n - PRESS_HOME: press_home()\n - TASK_COMPLETE: complete\n - TASK_IMPOSSIBLE: impossible\n\n2. Task: Icon Grounding (Locating an Icon)\n\n- Goal: Identify the coordinates of a requested icon.\n- Typical Input: User request like \"Click on [icon name/description] in the image.\", screen image ().\n- Output Format:\nAction: tap(start_box='<|box_start|>(x,y)<|box_end|>')\n\n3. Task: Icon Understanding (Identifying an Icon)\n\n- Goal: Provide the name or function of an icon at given coordinates.\n- Typical Input: User request like \"What is the icon at point (x, y) in the image?\", screen image ().\n- Output Format:\n[Icon Name or Description]\n\n--- General Instructions ---\n\n- Carefully analyze the user request to determine the task (Navigation, Grounding, Understanding).\n- Analyze the current screen state (description or image) thoroughly.\n- For actions involving coordinates (TAP, SWIPE, TYPE), use absolute pixel coordinates based on the input screen resolution (1080x2400), where (0,0) is the top-left corner.\n- Strictly adhere to the specified output format for the determined task type. 
Use a tab character (\\t) as a separator where indicated.", "max_length": 8192, "truncation_strategy": "delete", "max_pixels": 2629536, "agent_template": null, "norm_bbox": null, "use_chat_template": true, "padding_free": false, "padding_side": "right", "loss_scale": "default", "sequence_parallel_size": 1, "response_prefix": null, "template_backend": "swift", "dataset": [ "/data/datas/sft_amex.json" ], "val_dataset": [], "split_dataset_ratio": 0.01, "data_seed": 42, "dataset_num_proc": 4, "load_from_cache_file": true, "dataset_shuffle": true, "val_dataset_shuffle": false, "streaming": false, "interleave_prob": null, "stopping_strategy": "first_exhausted", "shuffle_buffer_size": 1000, "download_mode": "reuse_dataset_if_exists", "columns": {}, "strict": false, "remove_unused_columns": true, "model_name": [ null, null ], "model_author": [ null, null ], "custom_dataset_info": [], "quant_method": null, "quant_bits": null, "hqq_axis": null, "bnb_4bit_compute_dtype": "bfloat16", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "bnb_4bit_quant_storage": null, "max_new_tokens": 64, "temperature": 0.0, "top_k": null, "top_p": null, "repetition_penalty": null, "num_beams": 1, "stream": false, "stop_words": [], "logprobs": false, "top_logprobs": null, "ckpt_dir": null, "lora_modules": [], "tuner_backend": "peft", "train_type": "full", "adapters": [], "external_plugins": [], "seed": 42, "model_kwargs": {}, "load_args": false, "load_data_args": false, "use_hf": true, "hub_token": null, "custom_register_path": [], "ddp_timeout": 1800, "ddp_backend": null, "ignore_args_error": false, "use_swift_lora": false, "output_dir": "/workspace/checkpoint/gui_exp/sft_amex/v0-20260413_084132", "overwrite_output_dir": false, "do_train": false, "do_eval": false, "do_predict": false, "eval_strategy": "steps", "prediction_loss_only": false, "per_device_train_batch_size": 4, "per_device_eval_batch_size": 4, "per_gpu_train_batch_size": null, "per_gpu_eval_batch_size": null, 
"gradient_accumulation_steps": 4, "eval_accumulation_steps": null, "eval_delay": 0, "torch_empty_cache_steps": null, "learning_rate": 1e-05, "weight_decay": 0.1, "adam_beta1": 0.9, "adam_beta2": 0.95, "adam_epsilon": 1e-08, "max_grad_norm": 1.0, "num_train_epochs": 1.0, "max_steps": -1, "lr_scheduler_type": "cosine", "lr_scheduler_kwargs": null, "warmup_ratio": 0.05, "warmup_steps": 0, "log_level": "passive", "log_level_replica": "warning", "log_on_each_node": true, "logging_dir": "/workspace/checkpoint/gui_exp/sft_amex/v0-20260413_084132/runs", "logging_strategy": "steps", "logging_first_step": true, "logging_steps": 1, "logging_nan_inf_filter": true, "save_strategy": "steps", "save_steps": 618.0, "save_total_limit": 2, "save_safetensors": true, "save_on_each_node": false, "save_only_model": true, "restore_callback_states_from_checkpoint": false, "no_cuda": false, "use_cpu": false, "use_mps_device": false, "jit_mode_eval": false, "use_ipex": false, "bf16": true, "fp16": false, "fp16_opt_level": "O1", "half_precision_backend": "auto", "bf16_full_eval": false, "fp16_full_eval": false, "tf32": null, "local_rank": 0, "tpu_num_cores": null, "tpu_metrics_debug": false, "debug": null, "dataloader_drop_last": false, "eval_steps": 500.0, "dataloader_num_workers": 4, "dataloader_prefetch_factor": null, "past_index": -1, "run_name": "/workspace/checkpoint/gui_exp/sft_amex/v0-20260413_084132", "disable_tqdm": null, "label_names": null, "load_best_model_at_end": false, "metric_for_best_model": "loss", "greater_is_better": false, "ignore_data_skip": false, "fsdp": "", "fsdp_min_num_params": 0, "fsdp_config": null, "fsdp_transformer_layer_cls_to_wrap": null, "accelerator_config": { "dispatch_batches": false }, "deepspeed": { "fp16": { "enabled": "auto", "loss_scale": 0, "loss_scale_window": 1000, "initial_scale_power": 16, "hysteresis": 2, "min_loss_scale": 1 }, "bf16": { "enabled": "auto" }, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "none", 
"pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 200000000.0, "overlap_comm": false, "reduce_scatter": true, "reduce_bucket_size": 200000000.0, "contiguous_gradients": true }, "gradient_accumulation_steps": "auto", "gradient_clipping": "auto", "steps_per_print": 2000, "train_batch_size": "auto", "train_micro_batch_size_per_gpu": "auto", "wall_clock_breakdown": false }, "label_smoothing_factor": 0.0, "optim": "adamw_torch", "optim_args": null, "adafactor": false, "group_by_length": false, "length_column_name": "length", "report_to": [ "wandb" ], "ddp_find_unused_parameters": null, "ddp_bucket_cap_mb": null, "ddp_broadcast_buffers": null, "dataloader_pin_memory": true, "dataloader_persistent_workers": false, "skip_memory_metrics": true, "use_legacy_prediction_loop": false, "push_to_hub": false, "resume_from_checkpoint": null, "hub_model_id": null, "hub_strategy": "every_save", "hub_private_repo": null, "hub_always_push": false, "gradient_checkpointing": true, "gradient_checkpointing_kwargs": null, "include_inputs_for_metrics": false, "include_for_metrics": [], "eval_do_concat_batches": true, "fp16_backend": "auto", "push_to_hub_model_id": null, "push_to_hub_organization": null, "push_to_hub_token": null, "mp_parameters": "", "auto_find_batch_size": false, "full_determinism": false, "torchdynamo": null, "ray_scope": "last", "torch_compile": false, "torch_compile_backend": null, "torch_compile_mode": null, "include_tokens_per_second": false, "include_num_input_tokens_seen": false, "neftune_noise_alpha": null, "optim_target_modules": null, "batch_eval_metrics": false, "eval_on_start": false, "use_liger_kernel": false, "eval_use_gather_object": false, "average_tokens_across_devices": false, "sortish_sampler": false, "predict_with_generate": false, "generation_max_length": null, "generation_num_beams": null, "generation_config": null, "check_model": true, "acc_strategy": "token", "train_dataloader_shuffle": true, "max_epochs": null, 
"aligner_lr": null, "vit_lr": null, "optimizer": null, "metric_warmup_step": 0, "fsdp_num": 1, "acc_steps": 1, "eval_use_evalscope": false, "eval_datasets": [], "eval_limit": null, "eval_datasets_args": null, "eval_generation_config": null, "freeze_parameters": [ "visual", "visual.merger" ], "freeze_parameters_regex": "^(model\\.)?visual\\.", "freeze_parameters_ratio": 0.0, "trainable_parameters": [], "trainable_parameters_regex": null, "freeze_llm": false, "freeze_vit": true, "freeze_aligner": true, "target_modules": [ "all-linear" ], "target_regex": null, "modules_to_save": [], "lora_rank": 8, "lora_alpha": 32, "lora_dropout": 0.05, "lora_bias": "none", "lora_dtype": null, "lorap_lr_ratio": null, "use_rslora": false, "use_dora": false, "lora_ga_batch_size": 2, "lora_ga_iters": 2, "lora_ga_max_length": 1024, "lora_ga_direction": "ArB2r", "lora_ga_scale": "stable", "lora_ga_stable_gamma": 16, "init_weights": true, "fourier_n_frequency": 2000, "fourier_scaling": 300.0, "boft_block_size": 4, "boft_block_num": 0, "boft_n_butterfly_factor": 1, "boft_dropout": 0.0, "vera_rank": 256, "vera_projection_prng_key": 0, "vera_dropout": 0.0, "vera_d_initial": 0.1, "adapter_act": "gelu", "adapter_length": 128, "use_galore": false, "galore_target_modules": null, "galore_rank": 128, "galore_update_proj_gap": 50, "galore_scale": 1.0, "galore_proj_type": "std", "galore_optim_per_parameter": false, "galore_with_embedding": false, "galore_quantization": false, "galore_proj_quant": false, "galore_proj_bits": 4, "galore_proj_group_size": 256, "galore_cos_threshold": 0.4, "galore_gamma_proj": 2, "galore_queue_size": 5, "adalora_target_r": 8, "adalora_init_r": 12, "adalora_tinit": 0, "adalora_tfinal": 0, "adalora_deltaT": 1, "adalora_beta1": 0.85, "adalora_beta2": 0.85, "adalora_orth_reg_weight": 0.5, "llamapro_num_new_blocks": 4, "llamapro_num_groups": null, "lisa_activated_layers": 0, "lisa_step_interval": 20, "reft_layer_key": null, "reft_layers": null, "reft_rank": 4, 
"reft_intervention_type": "LoreftIntervention", "reft_args": null, "swanlab_token": null, "swanlab_project": null, "swanlab_workspace": null, "swanlab_exp_name": null, "swanlab_mode": "cloud", "add_version": false, "resume_only_model": false, "create_checkpoint_symlink": false, "packing": false, "lazy_tokenize": true, "loss_type": null, "metric": null, "zero_hpz_partition_size": null, "rank": 0, "global_world_size": 8, "local_world_size": 8, "model_suffix": "Qwen2.5-VL-7B-Instruct", "model_info": "ModelInfo(model_type='qwen2_5_vl', model_dir='/data/.cache/huggingface/hub/models--Qwen--Qwen2.5-VL-7B-Instruct/snapshots/cc594898137f460bfe9f0759e9844b3ce807cfb5', torch_dtype=torch.bfloat16, max_model_len=128000, quant_method=None, quant_bits=None, rope_scaling={'type': 'default', 'mrope_section': [16, 24, 24], 'rope_type': 'default'}, config=None, task_type='causal_lm', num_labels=None)", "model_meta": "ModelMeta(model_type='qwen2_5_vl', model_groups=[ModelGroup(models=[Model(ms_model_id='Qwen/Qwen2.5-VL-3B-Instruct', hf_model_id='Qwen/Qwen2.5-VL-3B-Instruct', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-7B-Instruct', hf_model_id='Qwen/Qwen2.5-VL-7B-Instruct', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-32B-Instruct', hf_model_id='Qwen/Qwen2.5-VL-32B-Instruct', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-72B-Instruct', hf_model_id='Qwen/Qwen2.5-VL-72B-Instruct', model_path=None, ms_revision=None, hf_revision=None)], ignore_patterns=None, requires=None, tags=[]), ModelGroup(models=[Model(ms_model_id='Qwen/Qwen2.5-VL-3B-Instruct-AWQ', hf_model_id='Qwen/Qwen2.5-VL-3B-Instruct-AWQ', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-7B-Instruct-AWQ', hf_model_id='Qwen/Qwen2.5-VL-7B-Instruct-AWQ', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-32B-Instruct-AWQ', 
hf_model_id='Qwen/Qwen2.5-VL-32B-Instruct-AWQ', model_path=None, ms_revision=None, hf_revision=None), Model(ms_model_id='Qwen/Qwen2.5-VL-72B-Instruct-AWQ', hf_model_id='Qwen/Qwen2.5-VL-72B-Instruct-AWQ', model_path=None, ms_revision=None, hf_revision=None)], ignore_patterns=None, requires=None, tags=[])], template='qwen2_5_vl', get_function=, model_arch='qwen2_vl', architectures=['Qwen2_5_VLForConditionalGeneration'], additional_saved_files=[], torch_dtype=None, is_multimodal=True, is_reward=False, task_type=None, ignore_patterns=None, requires=['transformers>=4.49', 'qwen_vl_utils>=0.0.6', 'decord'], tags=[])", "model_dir": "/data/.cache/huggingface/hub/models--Qwen--Qwen2.5-VL-7B-Instruct/snapshots/cc594898137f460bfe9f0759e9844b3ce807cfb5", "hub": "", "evaluation_strategy": "steps", "training_args": "Seq2SeqTrainingArguments(output_dir='/workspace/checkpoint/gui_exp/sft_amex/v0-20260413_084132', overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, eval_strategy=<IntervalStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=4, per_device_eval_batch_size=4, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=4, eval_accumulation_steps=None, eval_delay=0, torch_empty_cache_steps=None, learning_rate=1e-05, weight_decay=0.1, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, lr_scheduler_type=<SchedulerType.COSINE: 'cosine'>, lr_scheduler_kwargs=None, warmup_ratio=0.05, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir='/workspace/checkpoint/gui_exp/sft_amex/v0-20260413_084132/runs', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=True, logging_steps=1, logging_nan_inf_filter=True, save_strategy=<SaveStrategy.STEPS: 'steps'>, save_steps=618, save_total_limit=2, save_safetensors=True, save_on_each_node=False, save_only_model=True, restore_callback_states_from_checkpoint=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=42, jit_mode_eval=False, 
use_ipex=False, bf16=True, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=0, ddp_backend=None, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=4, dataloader_prefetch_factor=10, past_index=-1, run_name='/workspace/checkpoint/gui_exp/sft_amex/v0-20260413_084132', disable_tqdm=False, remove_unused_columns=False, label_names=None, load_best_model_at_end=False, metric_for_best_model='loss', greater_is_better=False, ignore_data_skip=False, fsdp=[], fsdp_min_num_params=0, fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=AcceleratorConfig(split_batches=False, dispatch_batches=False, even_batches=True, use_seedable_sampler=True, non_blocking=False, gradient_accumulation_kwargs=None, use_configured_state=False), deepspeed={'fp16': {'enabled': 'auto', 'loss_scale': 0, 'loss_scale_window': 1000, 'initial_scale_power': 16, 'hysteresis': 2, 'min_loss_scale': 1}, 'bf16': {'enabled': 'auto'}, 'zero_optimization': {'stage': 2, 'offload_optimizer': {'device': 'none', 'pin_memory': True}, 'allgather_partitions': True, 'allgather_bucket_size': 200000000.0, 'overlap_comm': False, 'reduce_scatter': True, 'reduce_bucket_size': 200000000.0, 'contiguous_gradients': True}, 'gradient_accumulation_steps': 'auto', 'gradient_clipping': 'auto', 'steps_per_print': 2000, 'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'wall_clock_breakdown': False}, label_smoothing_factor=0.0, optim=<OptimizerNames.ADAMW_TORCH: 'adamw_torch'>, optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=['wandb'], ddp_find_unused_parameters=None, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=None, dataloader_pin_memory=True, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, 
push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy=<HubStrategy.EVERY_SAVE: 'every_save'>, hub_token=None, hub_private_repo=None, hub_always_push=False, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, include_for_metrics=[], eval_do_concat_batches=True, fp16_backend='auto', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=1800, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, include_tokens_per_second=None, include_num_input_tokens_seen=None, neftune_noise_alpha=None, optim_target_modules=None, batch_eval_metrics=False, eval_on_start=False, use_liger_kernel=False, eval_use_gather_object=False, average_tokens_across_devices=None, sortish_sampler=False, predict_with_generate=False, generation_max_length=None, generation_num_beams=None, generation_config=None, check_model=True, acc_strategy='token', train_dataloader_shuffle=True, max_epochs=None, aligner_lr=None, vit_lr=None, optimizer=None, metric_warmup_step=0, fsdp_num=1, acc_steps=1, eval_use_evalscope=False, eval_datasets=[], eval_limit=None, eval_datasets_args=None, eval_generation_config=None, train_type='full', local_repo_path=None, galore_config=None)" }