---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-VL-32B-Instruct
---
# TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents

Model trained on the [GUI-Net Dataset](https://huggingface.co/datasets/Bofeee5675/GUI-Net-1M).

See details on our [Project Page](https://github.com/TongUI-agent/TongUI-agent).


## Model Details

The base model is `Qwen/Qwen2.5-VL-32B-Instruct`. We fine-tuned the base model with LoRA.

**Note:** Due to the large size of the 32B model, we only release the LoRA adapter for this model. To merge the adapter into the base weights, use the following script:

```python
from transformers import AutoProcessor, AutoConfig, Qwen2_5_VLForConditionalGeneration
import torch
from peft import PeftModel

def load_model_and_processor(model_path, precision="bf16", lora_path=None, merge_lora=True):
    """
    Load the Qwen2.5-VL model and processor with optional LoRA weights.
    
    Args:
        model_path: Path to the base model
        precision: Model precision ("fp16", "bf16", or "fp32")
        lora_path: Path to LoRA weights (optional)
        merge_lora: Whether to merge the LoRA weights into the base model
            
    Returns:
        tuple: (processor, model) - The initialized processor and model
    """
    # Initialize processor
    try:
        processor = AutoProcessor.from_pretrained(
            model_path
        )
    except Exception as e:
        print(f"Error loading processor: {e}")
        processor = None
        config = AutoConfig.from_pretrained(model_path)
        print(config)
        raise e
    # Initialize base model
    model_cls = Qwen2_5_VLForConditionalGeneration
    model = model_cls.from_pretrained(
        model_path,
        device_map="auto",
        torch_dtype=torch.float16 if precision == "fp16" else torch.bfloat16 if precision == "bf16" else torch.float32,
        attn_implementation="flash_attention_2",
    )
    
    # Load LoRA weights if path is provided
    if lora_path is not None and len(lora_path) > 0:
        print(f"Loading LoRA weights from {lora_path}")
        model = PeftModel.from_pretrained(model, lora_path)
        
        if merge_lora:
            print("Merging LoRA weights into base model")
            model = model.merge_and_unload()
    
    model.eval()
    
    return processor, model
```

`model_path` is the base model (`Qwen/Qwen2.5-VL-32B-Instruct`), and `lora_path` is the local directory where you downloaded this repository.
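
For reference, here is a minimal usage sketch of the function above. The LoRA path is a placeholder for wherever you downloaded this repo, and `tongui-32b-merged` is just an example output directory:

```python
# Minimal usage sketch: adjust the paths to your local setup.
processor, model = load_model_and_processor(
    model_path="Qwen/Qwen2.5-VL-32B-Instruct",  # base model (Hub ID or local path)
    precision="bf16",
    lora_path="path/to/this/repo",              # placeholder: local download of this repo
    merge_lora=True,
)

# Optionally save the merged model so later loads can skip the LoRA merge step.
model.save_pretrained("tongui-32b-merged")      # example output directory
processor.save_pretrained("tongui-32b-merged")
```

Saving the merged weights trades disk space for faster startup: subsequent runs can load the merged directory directly with `from_pretrained`, without going through PEFT again.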