---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-VL-32B-Instruct
---
# TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents

This model was trained on the [GUI-Net dataset](https://huggingface.co/datasets/Bofeee5675/GUI-Net-1M).

See our [project page](https://github.com/TongUI-agent/TongUI-agent) for details.

## Model Details

The base model is `Qwen/Qwen2.5-VL-32B-Instruct`. We fine-tuned it with LoRA.

**Note:** Because the 32B model is large, we release only the LoRA weights. To merge them into the base model, use the following script:

```python
import torch
from peft.peft_model import PeftModel
from transformers import AutoConfig, AutoProcessor, Qwen2_5_VLForConditionalGeneration


def load_model_and_processor(model_path, precision="bf16", lora_path=None, merge_lora=True):
    """
    Load the Qwen2.5-VL model and processor with optional LoRA weights.

    Args:
        model_path: Path to the base model
        precision: Model precision ("fp16", "bf16", or "fp32")
        lora_path: Path to LoRA weights (optional)
        merge_lora: Boolean indicating whether to merge LoRA weights

    Returns:
        tuple: (processor, model) - The initialized processor and model
    """
    # Initialize processor
    try:
        processor = AutoProcessor.from_pretrained(model_path)
    except Exception as e:
        # Print the model config to help diagnose the failure, then re-raise
        print(f"Error loading processor: {e}")
        config = AutoConfig.from_pretrained(model_path)
        print(config)
        raise

    # Map the precision string to a torch dtype (anything else falls back to fp32)
    torch_dtype = {"fp16": torch.float16, "bf16": torch.bfloat16}.get(precision, torch.float32)

    # Initialize base model (flash_attention_2 requires the flash-attn package)
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_path,
        device_map="auto",
        torch_dtype=torch_dtype,
        attn_implementation="flash_attention_2",
    )

    # Load LoRA weights if path is provided
    if lora_path is not None and len(lora_path) > 0:
        print(f"Loading LoRA weights from {lora_path}")
        model = PeftModel.from_pretrained(model, lora_path)

        # Merging only applies once the LoRA adapter has been loaded
        if merge_lora:
            print("Merging LoRA weights into base model")
            model = model.merge_and_unload()

    model.eval()

    return processor, model
```

`model_path` is the base model (`Qwen/Qwen2.5-VL-32B-Instruct`), and `lora_path` is the local directory where you downloaded this repository.
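
To illustrate what "merging" the LoRA weights means, here is a minimal sketch with toy matrices in plain Python. It is not the PEFT implementation: the helper names (`matmul`, `merge_lora_weight`), the matrix values, and the `alpha`/`rank` settings are all made up for illustration, and it assumes the standard LoRA scaling of `alpha / rank`. The point is that folding the low-rank update `B @ A` into the base weight once gives the same output as running the base layer and the adapter side by side.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora_weight(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) as a new matrix (hypothetical helper)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy sizes: a 2x3 base weight with a rank-1 adapter (values are illustrative only).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # base weight, shape (2, 3)
A = [[1.0, 2.0, 3.0]]                     # LoRA down-projection, shape (1, 3)
B = [[0.5], [-1.0]]                       # LoRA up-projection, shape (2, 1)
alpha, rank = 2, 1
x = [[1.0], [1.0], [1.0]]                 # input column vector, shape (3, 1)

# Unmerged: base output plus the scaled adapter output.
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = [[b[0] + (alpha / rank) * a[0]] for b, a in zip(base_out, adapter_out)]

# Merged: fold the adapter into the weight once, then do a single matmul.
W_merged = merge_lora_weight(W, A, B, alpha, rank)
merged = matmul(W_merged, x)

print(unmerged)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, so the merged model behaves like an ordinary checkpoint of the base architecture.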