qwen3_vl_8b_grpo_agent
Model Description
Qwen3-VL-8B fine-tuned with GRPO for grid-based component localization using agent tools
Base Model: unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit
Training Method: GRPO (Group Relative Policy Optimization)
Task: Grid-based component localization with tool use
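The "group relative" part of GRPO can be sketched as follows: several trajectories are sampled for the same example, and each trajectory's reward is normalized against the mean and standard deviation of its own group. This is a minimal illustration only. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the sample rewards are assumptions, not details taken from this model's training run.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the statistics of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    # eps avoids division by zero when every trajectory scores the same
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one group of 8 trajectories (illustrative values)
group_rewards = [0.2, 0.9, 0.4, 0.4, 0.7, 0.1, 0.6, 0.5]
advantages = group_relative_advantages(group_rewards)

for reward, adv in zip(group_rewards, advantages):
    print(f"reward={reward:.2f}  advantage={adv:+.2f}")
```

Trajectories that score above their group's mean get a positive advantage and are reinforced; those below the mean are discouraged. No separate value network is needed for this baseline.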
Training Details
Training Framework
- Method: GRPO with Unsloth
- LoRA Configuration:
  - Rank (r): 16
  - Alpha: 16
  - Dropout: 0
  - Target Modules: Attention and MLP layers
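To make the LoRA numbers above concrete, here is a small arithmetic sketch. It assumes standard LoRA, where the low-rank update is scaled by alpha / r and each adapted linear layer gains r * (d_in + d_out) trainable parameters. The 4096 x 4096 layer shape is a hypothetical example, not a dimension stated in this card.

```python
def lora_scaling(alpha, r):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_extra_params(d_in, d_out, r):
    """Trainable parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


R, ALPHA = 16, 16  # rank and alpha from the configuration above

scale = lora_scaling(ALPHA, R)

# Hypothetical square projection layer, for illustration only
D_IN, D_OUT = 4096, 4096
extra = lora_extra_params(D_IN, D_OUT, R)
full = D_IN * D_OUT

print(f"scaling factor: {scale}")
print(f"adapter params: {extra} ({extra / full:.2%} of the frozen layer)")
```

With rank equal to alpha the scaling factor is 1, so the adapter output is added to the frozen layer's output unscaled.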
Training Data
- Dataset: SLD (single-line diagram) component detection dataset
- Format: Component bounding boxes with metadata
- Components: Electrical panels (TSS, PSU-P, PP-series) with voltage and ampere ratings
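The exact schema of the dataset is not given here. As a rough sketch of what "component bounding boxes with metadata" could look like, the record below pairs a label with a pixel-space box and rating fields. The class name `ComponentAnnotation`, every field name, the (x_min, y_min, x_max, y_max) box convention, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ComponentAnnotation:
    """Hypothetical annotation record; not the dataset's actual schema."""
    label: str                           # component name, e.g. "TSS"
    bbox: Tuple[int, int, int, int]      # assumed (x_min, y_min, x_max, y_max) in pixels
    voltage: str                         # voltage rating as text
    ampere: str                          # ampere rating as text

    def is_valid(self) -> bool:
        """A box is usable only if it has positive width and height."""
        x_min, y_min, x_max, y_max = self.bbox
        return x_max > x_min and y_max > y_min

    def area(self) -> int:
        x_min, y_min, x_max, y_max = self.bbox
        return (x_max - x_min) * (y_max - y_min)


# Illustrative values only
example = ComponentAnnotation(label="TSS", bbox=(120, 340, 260, 420),
                              voltage="480V", ampere="400A")
print(example.label, example.is_valid(), example.area())
```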
Training Parameters
- Group Size: 8 trajectories per example
- Batch Size: 2
- Learning Rate: 5e-6
- Temperature: 0.7
- Reward Functions:
  - IoU with ground truth
  - Efficiency (fewer steps)
  - Centering (component in crop center)
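The three reward terms listed above might be computed along these lines. Only the IoU formula is standard. The efficiency and centering formulas, the `max_steps` budget, the weights, and all function names are assumptions made for illustration, since this card does not specify how the rewards were defined or combined.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def efficiency_reward(steps, max_steps):
    """Assumed form: linearly lower reward as more tool steps are used."""
    return max(0.0, 1.0 - steps / max_steps)


def centering_reward(component_box, crop_box):
    """Assumed form: 1 when the component center sits at the crop center, 0 at the crop edge."""
    cx = (component_box[0] + component_box[2]) / 2
    cy = (component_box[1] + component_box[3]) / 2
    crop_cx = (crop_box[0] + crop_box[2]) / 2
    crop_cy = (crop_box[1] + crop_box[3]) / 2
    half_w = (crop_box[2] - crop_box[0]) / 2
    half_h = (crop_box[3] - crop_box[1]) / 2
    if half_w <= 0 or half_h <= 0:
        return 0.0
    offset = max(abs(cx - crop_cx) / half_w, abs(cy - crop_cy) / half_h)
    return max(0.0, 1.0 - offset)


def total_reward(pred, gt, crop, steps, max_steps=10, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of the three terms; the weights are hypothetical."""
    w_iou, w_eff, w_ctr = weights
    return (w_iou * iou(pred, gt)
            + w_eff * efficiency_reward(steps, max_steps)
            + w_ctr * centering_reward(gt, crop))


# Illustrative boxes: predicted box, ground-truth box, and final crop window
pred, gt, crop = (10, 10, 50, 50), (20, 20, 60, 60), (0, 0, 80, 80)
print(f"IoU: {iou(pred, gt):.3f}")
print(f"total reward: {total_reward(pred, gt, crop, steps=3):.3f}")
```

A weighted sum is one simple way to fold several objectives into the single scalar that GRPO compares within each group of trajectories.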
Usage
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
import torch

# Load model and processor. Qwen3-VL is an image-text-to-text model,
# so it is loaded with AutoModelForImageTextToText rather than AutoModelForCausalLM.
model = AutoModelForImageTextToText.from_pretrained(
    "pavan01729/qwen3_vl_8b_grpo_agent",
    device_map="auto",
    torch_dtype=torch.float16,
)
processor = AutoProcessor.from_pretrained("pavan01729/qwen3_vl_8b_grpo_agent")

# Load the diagram; the processor needs the image object itself
image = Image.open("path/to/sld_diagram.png").convert("RGB")

# Prepare inputs
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/sld_diagram.png"},
            {"type": "text", "text": "Locate the component TSS in this diagram"}
        ]
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = processor.decode(new_tokens, skip_special_tokens=True)
print(result)
Model Performance
This model was trained using GRPO to optimize for:
- Accurate bounding box prediction (IoU score)
- Efficient component search (minimal steps)
- Centered component detection (component in crop center)
Limitations
- Trained specifically on electrical single-line diagrams (SLDs)
- Performs best on components similar to the training data (TSS, PSU-P, and PP-series panels)
- Requires high-resolution input images for accurate detection
Citation
@misc{qwen3_vl_grpo_sld,
  title = {qwen3_vl_8b_grpo_agent},
  author = {SLD Training Team},
  year = {2024},
  note = {GRPO-trained Qwen3-VL-8B for SLD component detection}
}
License
Apache 2.0
Model tree for pavan01729/qwen3_vl_8b_grpo_agent: base model Qwen/Qwen3-VL-8B-Instruct