qwen3_vl_8b_grpo_agent

Model Description

Qwen3-VL-8B fine-tuned with GRPO for grid-based component localization using agent tools.

  • Base Model: unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Task: Grid-based component localization with tool use
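The card does not define the grid. Purely as an illustration, assume a hypothetical uniform rows x cols grid (4 x 4 by default) laid over the diagram. Under that assumption, mapping a grid cell to the pixel crop an agent tool might zoom into could look like this. `Box` and `cell_to_box` are illustrative names, not part of this model's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel box: (x0, y0) is the top-left corner, (x1, y1) the bottom-right (exclusive)."""
    x0: int
    y0: int
    x1: int
    y1: int


def cell_to_box(row: int, col: int, width: int, height: int,
                rows: int = 4, cols: int = 4) -> Box:
    """Return the pixel box of cell (row, col) on a hypothetical uniform rows x cols grid."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError(f"cell ({row}, {col}) is outside a {rows}x{cols} grid")
    # Integer arithmetic spreads leftover pixels, so the last cell ends exactly at the image edge.
    x0 = col * width // cols
    x1 = (col + 1) * width // cols
    y0 = row * height // rows
    y1 = (row + 1) * height // rows
    return Box(x0, y0, x1, y1)


# Cell (1, 2) of a 4x4 grid on a 2000 x 1000 pixel diagram.
print(cell_to_box(1, 2, width=2000, height=1000))  # Box(x0=1000, y0=250, x1=1500, y1=500)
```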

Training Details

Training Framework

  • Method: GRPO with Unsloth
  • LoRA Configuration:
    • Rank (r): 16
    • Alpha: 16
    • Dropout: 0
    • Target Modules: Attention and MLP layers
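Unsloth applies these settings through its own API. Independent of that API, the arithmetic of one LoRA-adapted linear layer with r = 16, alpha = 16 and dropout 0 can be shown as a minimal NumPy sketch. The layer size and initialization scale are illustrative assumptions, not values taken from this model.

```python
import numpy as np

R = 16       # LoRA rank (r) from the card
ALPHA = 16   # LoRA alpha from the card
SCALE = ALPHA / R  # update scale alpha / r, which is 1.0 for this configuration

rng = np.random.default_rng(0)
d_in, d_out = 512, 256  # illustrative layer size, not the real Qwen3-VL dimensions

W0 = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
A = rng.standard_normal((R, d_in)) * 0.01  # trainable down-projection (small random init)
B = np.zeros((d_out, R))                   # trainable up-projection (zero init)


def lora_forward(x: np.ndarray) -> np.ndarray:
    """h = W0 x + (alpha / r) * B A x. Dropout is 0, so x enters the LoRA branch unchanged."""
    return W0 @ x + SCALE * (B @ (A @ x))


x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer starts out identical to the frozen one.
assert np.allclose(lora_forward(x), W0 @ x)

# Trainable parameters for this one matrix: LoRA vs. full fine-tuning.
print(R * (d_in + d_out), d_in * d_out)  # 12288 131072
```

Because alpha equals r, the low-rank update is added at unit scale.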

Training Data

  • Dataset: SLD (single-line diagram) component detection dataset
  • Format: Component bounding boxes with metadata
  • Components: Electrical panels (TSS, PSU-P, PP-series) with voltage and ampere ratings
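The card does not publish the annotation schema. As a hypothetical illustration of "component bounding boxes with metadata", one record might look like the following. Every field name here is an assumption, and the example values are placeholders, not dataset entries.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ComponentAnnotation:
    """Hypothetical record for one labeled component; field names are illustrative only."""
    label: str                          # component name, e.g. "TSS", "PSU-P", or a PP-series panel
    bbox: tuple[int, int, int, int]     # (x0, y0, x1, y1) in pixels
    voltage: Optional[str] = None       # voltage rating as printed on the diagram, if present
    ampere: Optional[str] = None        # ampere rating as printed on the diagram, if present

    def __post_init__(self) -> None:
        # Reject empty or inverted boxes early.
        x0, y0, x1, y1 = self.bbox
        if x1 <= x0 or y1 <= y0:
            raise ValueError(f"degenerate box for {self.label}: {self.bbox}")


# Placeholder record (coordinates are made up).
ann = ComponentAnnotation(label="TSS", bbox=(120, 40, 360, 200))
```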

Training Parameters

  • Group Size: 8 trajectories per example
  • Batch Size: 2
  • Learning Rate: 5e-6
  • Temperature: 0.7
  • Reward Functions:
    • IoU with ground truth
    • Efficiency (fewer steps)
    • Centering (component in crop center)
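With a group size of 8, GRPO samples 8 trajectories per example and scores each one with the reward functions. It then uses each reward's deviation from the group mean, divided by the group standard deviation, as that trajectory's advantage, so no learned value model is needed. The following minimal sketch shows that normalization step only, not this project's training code. The reward values are made up. It uses the population standard deviation, though some implementations use the sample standard deviation.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each trajectory's reward against its own group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps keeps the division finite when every trajectory gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up rewards for the 8 trajectories sampled for one training example.
rewards = [0.9, 0.4, 0.4, 0.1, 0.7, 0.2, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Trajectories that beat their own group's average get positive advantages and are reinforced. A group whose rewards are all equal yields zero advantage and contributes no learning signal.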

Usage

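from PIL import Image
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

# Hugging Face Hub repository ID of this model
model_id = "pavan01729/qwen3_vl_8b_grpo_agent"

# Load model. Qwen3-VL is a vision-language model, so use the image-text-to-text
# auto class (requires a transformers release with Qwen3-VL support).
# If the checkpoint stores bitsandbytes 4-bit weights (the U8 tensor type suggests
# it may), the bitsandbytes package must also be installed.
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Prepare inputs: load the diagram and build a single-turn chat message
image_path = "path/to/sld_diagram.png"
image = Image.open(image_path).convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "Locate the component TSS in this diagram"},
        ],
    }
]

# Render the chat template to text, then tokenize the text and image together
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Generate, then decode only the newly generated tokens (drop the echoed prompt)
outputs = model.generate(**inputs, max_new_tokens=512)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = processor.decode(new_tokens, skip_special_tokens=True)
print(result)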

Model Performance

This model was trained using GRPO to optimize for:

  1. Accurate bounding box prediction (IoU score)
  2. Efficient component search (minimal steps)
  3. Centered component detection (component in crop center)
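The card does not give the exact reward formulas or how they are weighted. The sketch below shows one plausible way to score a trajectory on the three objectives above. IoU is the standard definition. The efficiency and centering formulas, the step cap `max_steps`, and the weights are all hypothetical.

```python
BBox = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def efficiency_reward(steps: int, max_steps: int = 10) -> float:
    """Hypothetical: 1.0 for a one-step answer, falling linearly to 0.0 at max_steps."""
    steps = min(max(steps, 1), max_steps)
    return (max_steps - steps) / (max_steps - 1)


def centering_reward(crop: BBox, target: BBox) -> float:
    """Hypothetical: 1.0 when the target center is at the crop center, 0.0 at a crop corner."""
    cx, cy = (crop[0] + crop[2]) / 2, (crop[1] + crop[3]) / 2
    tx, ty = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Offsets in units of half the crop size; a corner is at distance sqrt(2).
    dx = (tx - cx) / ((crop[2] - crop[0]) / 2)
    dy = (ty - cy) / ((crop[3] - crop[1]) / 2)
    return max(0.0, 1.0 - (dx * dx + dy * dy) ** 0.5 / 2 ** 0.5)


def total_reward(pred: BBox, gt: BBox, crop: BBox, steps: int,
                 weights: tuple[float, float, float] = (1.0, 0.2, 0.2)) -> float:
    """Weighted sum of the three terms; the weights are illustrative, not the trained values."""
    w_iou, w_eff, w_ctr = weights
    return (w_iou * iou(pred, gt)
            + w_eff * efficiency_reward(steps)
            + w_ctr * centering_reward(crop, gt))
```

Weighting IoU most heavily in this sketch keeps localization accuracy the primary objective. The step and centering terms then act as tie-breakers between equally accurate trajectories.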

Limitations

  • Trained only on electrical single-line diagrams (SLDs)
  • Performs best on components similar to those in the training data
  • Requires high-resolution input images for accurate detection

Citation

@misc{qwen3_vl_grpo_sld,
  title = {qwen3_vl_8b_grpo_agent},
  author = {SLD Training Team},
  year = {2024},
  note = {GRPO-trained Qwen3-VL-8B for SLD component detection}
}

License

Apache 2.0

Checkpoint: Safetensors, 9B params, tensor types F32, F16, U8

Hugging Face repository: pavan01729/qwen3_vl_8b_grpo_agent