Social Agent Negotiation — GRPO LoRA Adapter

A LoRA adapter for Llama-3.2-1B-Instruct, fine-tuned with GRPO on the social-agent-negotiation-v1 OpenEnv environment.

Training: 3 epochs × 8 episodes × 3 tasks (single-round-consensus, adversarial-information, opioid-overdose)
Method: Group Relative Policy Optimization (GRPO) via HuggingFace TRL
Base model: unsloth/Llama-3.2-1B-Instruct (4-bit LoRA, r=16, alpha=32)
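
As a rough illustration of the "group relative" part of GRPO: several completions are sampled for the same prompt, each is scored by the environment, and each completion's advantage is its reward normalized against the rest of its group. The sketch below is a minimal pure-Python version of that normalization, not the TRL implementation. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: `eps` and the population standard deviation are
    illustrative choices, not values taken from the training notebook.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    # Completions that beat the group mean get a positive advantage,
    # the others a negative one; eps avoids division by zero when
    # every completion in the group received the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled negotiation episodes for one task, with made-up rewards.
    print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason the reward design of each task matters.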

See the training notebook: training/grpo_training.ipynb in the GitHub repo.
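
A possible way to load the adapter for inference is sketched below. It assumes the repository Bharath-1608/negotiation-agent-grpo contains standard PEFT adapter files and that `transformers` and `peft` are installed; the helper name `load_negotiation_agent` is hypothetical, and the prompt format expected by the environment is not covered here.

```python
BASE_ID = "unsloth/Llama-3.2-1B-Instruct"
ADAPTER_ID = "Bharath-1608/negotiation-agent-grpo"


def load_negotiation_agent(base_id=BASE_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the LoRA adapter (hypothetical helper).

    Assumes the adapter repo holds standard PEFT files. Imports are kept
    local so the module can be imported without the heavy dependencies.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the LoRA weights from the adapter repo.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


if __name__ == "__main__":
    # Downloads both repositories from the Hugging Face Hub on first use.
    tokenizer, model = load_negotiation_agent()
```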


Model tree for Bharath-1608/negotiation-agent-grpo: meta-llama/Llama-3.2-1B-Instruct (base model) → unsloth/Llama-3.2-1B-Instruct (finetuned) → this model (finetuned)
