qwen3_vl_8b_grpo_agent

Model Description

Qwen3-VL-8B fine-tuned with GRPO for grid-based component localization using agent tools

Base Model: unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit
Training Method: GRPO (Group Relative Policy Optimization)
Task: Grid-based component localization with tool use

Training Details

Training Framework

Method: GRPO with Unsloth
LoRA Configuration:
- Rank (r): 16
- Alpha: 16
- Dropout: 0
- Target Modules: Attention and MLP layers
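
To make the LoRA settings above concrete, the sketch below records them in a small dataclass and derives two quantities that follow from them: the scaling factor alpha / r and the number of trainable parameters added per adapted weight matrix. The class name `LoraSettings`, the projection names in `target_modules`, and the 4096 x 4096 layer shape are illustrative assumptions; the card only states that attention and MLP layers are targeted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    r: int = 16            # rank, as stated in the card
    alpha: int = 16        # alpha, as stated in the card
    dropout: float = 0.0   # dropout, as stated in the card
    # Assumed module names; the card only says "Attention and MLP layers".
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def scaling(self) -> float:
        # LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.alpha / self.r

    def added_params(self, d_in: int, d_out: int) -> int:
        # A has shape (r, d_in) and B has shape (d_out, r).
        return self.r * (d_in + d_out)


cfg = LoraSettings()
print(cfg.scaling)                    # 1.0
print(cfg.added_params(4096, 4096))   # 131072 (hypothetical layer shape)
```

With rank and alpha both at 16 the scaling factor is 1.0, so the low-rank update is added to the frozen weights without extra rescaling.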

Training Data

Dataset: Single-line diagram (SLD) component detection dataset
Format: Component bounding boxes with metadata
Components: Electrical panels (TSS, PSU-P, PP-series) with voltage and ampere ratings
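
As a rough illustration of what "bounding boxes with metadata" could look like, the sketch below defines one annotation record and maps its box center onto a grid cell. The field names, the example values, the pixel coordinate convention, and the 4 x 4 grid are all hypothetical; the card does not document the dataset schema or the grid size.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ComponentAnnotation:
    label: str                        # e.g. "TSS", "PSU-P", or a PP-series name
    bbox: Tuple[int, int, int, int]   # assumed (x1, y1, x2, y2) in pixels
    voltage_v: float                  # assumed unit: volts
    current_a: float                  # assumed unit: amperes


def grid_cell(bbox, image_size, rows, cols):
    """Return the (row, col) of the grid cell containing the box center."""
    x1, y1, x2, y2 = bbox
    width, height = image_size
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Clamp so that a center on the right or bottom edge stays in range.
    col = min(cols - 1, int(cx * cols / width))
    row = min(rows - 1, int(cy * rows / height))
    return row, col


# Made-up record for illustration only, not taken from the dataset.
example = ComponentAnnotation("TSS", (600, 500, 700, 600), 480.0, 400.0)
print(grid_cell(example.bbox, image_size=(1000, 800), rows=4, cols=4))  # (2, 2)
```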

Training Parameters

Group Size: 8 trajectories per example
Batch Size: 2
Learning Rate: 5e-6
Temperature: 0.7
Reward Functions:
- IoU with ground truth
- Efficiency (fewer steps)
- Centering (component in crop center)
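
The three reward terms listed above can be sketched as plain functions. This is a minimal illustration under stated assumptions, not the training code: boxes are assumed to be (x1, y1, x2, y2) tuples, and the step budget `max_steps`, the linear decay, the centering formula, and the combination weights are all hypothetical choices that the card does not specify.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def efficiency(steps, max_steps=10):
    """1.0 for a single step, decaying linearly to 0.0 at max_steps (assumed)."""
    if max_steps <= 1:
        return 1.0
    return max(0.0, 1.0 - (steps - 1) / (max_steps - 1))


def centering(crop, target):
    """1.0 if the target center sits at the crop center, 0.0 at or past the edge."""
    half_w = (crop[2] - crop[0]) / 2
    half_h = (crop[3] - crop[1]) / 2
    if half_w <= 0 or half_h <= 0:
        return 0.0
    dx = abs((target[0] + target[2]) / 2 - (crop[0] + crop[2]) / 2) / half_w
    dy = abs((target[1] + target[3]) / 2 - (crop[1] + crop[3]) / 2) / half_h
    return max(0.0, 1.0 - max(dx, dy))


def total_reward(pred, truth, crop, steps, weights=(1.0, 0.2, 0.2)):
    """Weighted sum of the three terms; the weights are placeholders."""
    w_iou, w_eff, w_ctr = weights
    return (w_iou * iou(pred, truth)
            + w_eff * efficiency(steps)
            + w_ctr * centering(crop, truth))


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```

A trajectory that predicts the ground-truth box exactly, in one step, with the component centered in the final crop, scores the maximum under these placeholder weights.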

Usage

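from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor
import torch

# Hub repo ID as listed on the model page; a local checkpoint path also works
model_id = "pavan01729/qwen3_vl_8b_grpo_agent"

# Load model (Qwen3-VL checkpoints are image-text-to-text models, so the
# vision-language auto class is used here instead of AutoModelForCausalLM)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Prepare inputs
image = Image.open("path/to/sld_diagram.png").convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/sld_diagram.png"},
            {"type": "text", "text": "Locate the component TSS in this diagram"}
        ]
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = processor.decode(new_tokens, skip_special_tokens=True)
print(result)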

Model Performance

This model was trained using GRPO to optimize for:

- Accurate bounding box prediction (IoU score)
- Efficient component search (minimal steps)
- Centered component detection (component in crop center)
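
GRPO does not use these scores in absolute terms. It samples a group of trajectories for the same example and scores each one relative to the rest of its group. The sketch below shows that group-relative normalization for one group of 8 trajectories, matching the group size in the training parameters. The reward values are made up, and the use of the population standard deviation and the `eps` constant are assumptions; implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumed choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical total rewards for the 8 trajectories of one example.
rewards = [0.9, 0.4, 0.4, 0.1, 0.7, 0.4, 0.2, 0.9]
advantages = group_relative_advantages(rewards)

for reward, adv in zip(rewards, advantages):
    print(f"reward={reward:.1f}  advantage={adv:+.2f}")
```

Trajectories that beat the group mean receive positive advantages and are reinforced; those below it are discouraged. If every trajectory in a group earns the same reward, all advantages are zero and that example contributes no learning signal.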

Limitations

- Trained specifically on electrical single-line diagrams (SLDs)
- Best performance on components similar to the training data
- Requires high-resolution input images for accurate detection

Citation

@misc{qwen3_vl_grpo_sld,
  title = {qwen3_vl_8b_grpo_agent},
  author = {SLD Training Team},
  year = {2024},
  note = {GRPO-trained Qwen3-VL-8B for SLD component detection}
}

License

Apache 2.0
