NegotiateAI Procurement Agent

Fine-tuned Llama 3.2 3B using GRPO (Group Relative Policy Optimisation) on the NegotiateAI adversarial procurement environment.

Model Description

This model was trained to act as a procurement manager inside the NegotiateAI OpenEnv environment. It negotiates contracts with supplier AIs, manages budgets and deadlines, and makes strategic decisions across a 12-week fiscal cycle.

Base model: unsloth/Llama-3.2-3B-Instruct
Training method: GRPO via HuggingFace TRL
Hardware: NVIDIA A100 80GB
Fine-tuning type: LoRA adapters

Training Environment

Metric	Value
Training episodes	200
Training samples	1,333
Tier advancements	Novice → Apprentice → Practitioner → Expert
Expert tier episodes	43%
First 20 steps avg reward	0.0101

Links

Resource	URL
🤗 HuggingFace Space	https://huggingface.co/spaces/prasanthdj8/negotiateai-openenv
📓 Training Notebook	https://huggingface.co/spaces/prasanthdj8/negotiateai-openenv/blob/main/NegotiateAI_Training.ipynb
📝 Blog	https://huggingface.co/spaces/prasanthdj8/negotiateai-openenv/blob/main/Blog.md
💻 GitHub	https://github.com/Prasanthdj8/negotiateai-openenv

Usage

This model is designed to work with the NegotiateAI OpenEnv environment. Load via the training notebook linked above.

Downloads last month: -; Downloads are not tracked for this model. How to track

Video Preview

Reinforcement Learning

Model tree for prasanthdj8/negotiateai-procurement-agent

Base model

meta-llama/Llama-3.2-3B-Instruct

Finetuned

unsloth/Llama-3.2-3B-Instruct

Finetuned

(606)

this model