NegotiateAI Procurement Agent

Fine-tuned Llama 3.2 3B using GRPO (Group Relative Policy Optimisation) on the NegotiateAI adversarial procurement environment.

Model Description

This model was trained to act as a procurement manager inside the NegotiateAI OpenEnv environment. It negotiates contracts with supplier AIs, manages budgets and deadlines, and makes strategic decisions across a 12-week fiscal cycle.

  • Base model: unsloth/Llama-3.2-3B-Instruct
  • Training method: GRPO via HuggingFace TRL
  • Hardware: NVIDIA A100 80GB
  • Fine-tuning type: LoRA adapters
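
As background on the training method, GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates only that group-relative normalisation step. The function name, the use of population standard deviation, and the eps value are illustrative assumptions, not the TRL implementation or this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalise the rewards of one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an assumed small constant that avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled negotiation responses to one prompt.
example_rewards = [0.0, 0.5, 1.0, 0.5]
example_advantages = group_relative_advantages(example_rewards)
```

Completions that beat their group's average get a positive advantage and are reinforced; those below it get a negative one. A group with identical rewards yields zero advantage for every member and so contributes no learning signal.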

Training Environment

Metric                      Value
Training episodes           200
Training samples            1,333
Tier advancements           Novice → Apprentice → Practitioner → Expert
Expert tier episodes        43%
First 20 steps avg reward   0.0101

Links

Usage

This model is designed to work with the NegotiateAI OpenEnv environment. Load via the training notebook linked above.
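
For readers who want to inspect the weights outside the notebook, the following is a minimal sketch of loading them with the transformers and peft libraries. It assumes the repository prasanthdj8/negotiateai-procurement-agent hosts PEFT-format LoRA adapter weights that apply on top of the base model listed above; this is not confirmed by the card. The helper name load_agent is hypothetical, and connecting the model to the NegotiateAI environment is not shown.

```python
BASE_MODEL_ID = "unsloth/Llama-3.2-3B-Instruct"
# Assumption: this repository contains LoRA adapter weights in PEFT format.
ADAPTER_ID = "prasanthdj8/negotiateai-procurement-agent"


def load_agent(adapter_id=ADAPTER_ID, base_id=BASE_MODEL_ID):
    """Load the base model and attach the LoRA adapters (hypothetical helper)."""
    # Imports are deferred so this module can be imported without the
    # heavy dependencies installed; both packages are required to call it.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling load_agent() downloads both the base model and the adapters, so it needs network access and enough memory for a 3B-parameter model. If the repository turns out to hold merged weights rather than adapters, loading it directly with AutoModelForCausalLM would be the appropriate alternative.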
