# dh_ministal_gpro

LoRA adapter for `mistralai/Ministral-3-8B-Instruct-2512-BF16`, fine-tuned with GRPO to play Duck Hunt.
## Training
- Method: Group Relative Policy Optimization (GRPO)
- LoRA rank: 16, alpha: 32
- Target modules: q_proj, k_proj, v_proj, o_proj
- Learning rate: 5e-06
- Generations per prompt: 4
- Best eval hit rate: -100.0%
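GRPO scores each of a prompt's generations relative to the rest of its group instead of using a learned value baseline. A minimal sketch of that group-relative advantage computation, assuming the 4 generations per prompt listed above (the function name and example rewards are illustrative, not taken from this training run):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize per-generation rewards within one prompt's group.

    GRPO replaces a learned value baseline with the group statistics:
    advantage_i = (r_i - mean(group)) / (std(group) + eps).
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 generations for one prompt; rewards could be hits scored in Duck Hunt.
adv = group_relative_advantages([1.0, 0.0, 2.0, 1.0])
```

Advantages within a group sum to (approximately) zero, so the policy update pushes probability toward the above-average generations and away from the below-average ones.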
## Usage
The adapter produces Mistral native tool calls ([TOOL_CALLS]), compatible with
OpenAI SDK via vLLM or TGI.
```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoProcessor

# Loads the base model and applies the LoRA adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained("dmayboroda/dh_ministal_gpro")
processor = AutoProcessor.from_pretrained("dmayboroda/dh_ministal_gpro")
```
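When serving through vLLM or TGI, the server parses the native tool calls for you and returns them in the OpenAI `tool_calls` field. If you decode raw generations yourself, a minimal parsing sketch, assuming the payload after the `[TOOL_CALLS]` marker is a plain JSON list (the `shoot` tool and its arguments are hypothetical, not part of the adapter; prefer the tokenizer's or server's own parser in practice):

```python
import json

TOOL_CALLS_MARKER = "[TOOL_CALLS]"

def parse_tool_calls(text):
    """Extract the JSON tool-call list following the [TOOL_CALLS] marker.

    Returns an empty list when the generation contains no tool call.
    """
    idx = text.find(TOOL_CALLS_MARKER)
    if idx == -1:
        return []
    payload = text[idx + len(TOOL_CALLS_MARKER):].strip()
    return json.loads(payload)

# Illustrative raw generation; "shoot" is a hypothetical Duck Hunt tool.
raw = '[TOOL_CALLS][{"name": "shoot", "arguments": {"x": 120, "y": 340}}]'
calls = parse_tool_calls(raw)
```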
## Model tree

Base model: `mistralai/Ministral-3-8B-Base-2512`