Social Agent Negotiation โ GRPO LoRA Adapter
Fine-tuned Llama-3.2-1B-Instruct via GRPO on the
social-agent-negotiation-v1
OpenEnv environment.
Training: 3 epochs ร 8 episodes ร 3 tasks (single-round-consensus, adversarial-information, opioid-overdose)
Method: Group Relative Policy Optimization (GRPO) via HuggingFace TRL
Base model: unsloth/Llama-3.2-1B-Instruct (4-bit LoRA, r=16, alpha=32)
See the training notebook: training/grpo_training.ipynb in the
GitHub repo.
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐ Ask for provider support
Model tree for Bharath-1608/negotiation-agent-grpo
Base model
meta-llama/Llama-3.2-1B-Instruct Finetuned
unsloth/Llama-3.2-1B-Instruct