Social Agent Negotiation: GRPO LoRA Adapter

LoRA adapter for Llama-3.2-1B-Instruct, fine-tuned with GRPO on the social-agent-negotiation-v1 OpenEnv environment.
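GRPO replaces a learned value function with a group baseline. For each prompt, several completions are sampled, and each completion's reward is normalized against the rest of its group. The following is a minimal sketch of only that advantage step, following the GRPO formulation. It is not TRL's implementation, and it omits the clipped policy-ratio objective and the KL penalty. The reward values, the epsilon, and the use of the population standard deviation are illustrative assumptions. The environment's actual reward function is not described here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against the other completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    # eps (an illustrative value) avoids division by zero when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for a group of 4 negotiation transcripts sampled from one prompt
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

A completion that beats its group's average gets a positive advantage and is reinforced. If every completion in a group earns the same reward, all advantages are zero, so that group contributes no advantage-weighted learning signal.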

Training: 3 epochs × 8 episodes × 3 tasks (single-round-consensus, adversarial-information, opioid-overdose)
Method: Group Relative Policy Optimization (GRPO) via Hugging Face TRL
Base model: unsloth/Llama-3.2-1B-Instruct (4-bit LoRA, r=16, alpha=32)
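With r=16 and alpha=32, the low-rank update is multiplied by 2.0. This assumes the default LoRA scaling of alpha / r rather than rank-stabilized scaling. The following pure-Python sketch illustrates the LoRA forward pass, y = Wx + (alpha / r)·B(Ax), and counts the trainable parameters LoRA adds to a single weight matrix. The helper names and the 2048 × 2048 matrix shape are hypothetical, and this card does not state which modules the adapter targets.

```python
def lora_scaling(r: int, alpha: int) -> float:
    # Assumed default LoRA scaling: the update B @ A is multiplied by alpha / r
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # A is r x d_in and B is d_out x r; the frozen base weight W is not counted
    return r * d_in + d_out * r


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    # Plain matrix-vector product on nested lists
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, r: int, alpha: int) -> list[float]:
    # Frozen path W x plus the scaled low-rank path B (A x)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    s = lora_scaling(r, alpha)
    return [b + s * u for b, u in zip(base, update)]


print(lora_scaling(16, 32))
# Hypothetical 2048 x 2048 projection matrix with r=16
print(lora_param_count(2048, 2048, 16))
```

Only A and B receive gradients during training, while the 4-bit quantized base weight W stays frozen. This is why the adapter is small compared with the base model.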

See the training notebook: training/grpo_training.ipynb in the GitHub repo.

Repository: Bharath-1608/negotiation-agent-grpo (used by one Hugging Face Space)