NegotiateAI Procurement Agent
Fine-tuned Llama 3.2 3B using GRPO (Group Relative Policy Optimisation) on the NegotiateAI adversarial procurement environment.
Model Description
This model was trained to act as a procurement manager inside the NegotiateAI OpenEnv environment. It negotiates contracts with supplier AIs, manages budgets and deadlines, and makes strategic decisions across a 12-week fiscal cycle.
- Base model: unsloth/Llama-3.2-3B-Instruct
- Training method: GRPO via HuggingFace TRL
- Hardware: NVIDIA A100 80GB
- Fine-tuning type: LoRA adapters
Training Environment
| Metric | Value |
|---|---|
| Training episodes | 200 |
| Training samples | 1,333 |
| Tier advancements | Novice β Apprentice β Practitioner β Expert |
| Expert tier episodes | 43% |
| First 20 steps avg reward | 0.0101 |
Links
| Resource | URL |
|---|---|
| π€ HuggingFace Space | https://huggingface.co/spaces/prasanthdj8/negotiateai-openenv |
| π Training Notebook | https://huggingface.co/spaces/prasanthdj8/negotiateai-openenv/blob/main/NegotiateAI_Training.ipynb |
| π Blog | https://huggingface.co/spaces/prasanthdj8/negotiateai-openenv/blob/main/Blog.md |
| π» GitHub | https://github.com/Prasanthdj8/negotiateai-openenv |
Usage
This model is designed to work with the NegotiateAI OpenEnv environment. Load via the training notebook linked above.
Model tree for prasanthdj8/negotiateai-procurement-agent
Base model
meta-llama/Llama-3.2-3B-Instruct Finetuned
unsloth/Llama-3.2-3B-Instruct