ARIA: DevOps Incident Response Agent
Llama-3.1-8B-Instruct fine-tuned with GRPO
Trained with Group Relative Policy Optimization (GRPO) on the live ARIA DevOps Incident Response RL environment.
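For readers unfamiliar with GRPO, its core idea is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The snippet below is a minimal sketch of that group-relative normalization only. The function name, the `eps` constant, and the example rewards are hypothetical and are not taken from the ARIA training code; it uses the population standard deviation, which may differ from what TRL computes internally.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    Illustrative only: `eps` guards against division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled for one incident prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Completions that beat their group's average get a positive advantage and are reinforced; those below it are pushed down. A group with identical rewards contributes no learning signal.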
Training Results
| Task | Baseline | Fine-tuned | Improvement |
|---|---|---|---|
| easy | 0.320 | 0.685 | +0.365 |
| medium | 0.050 | 0.378 | +0.328 |
| hard | 0.190 | 0.869 | +0.679 |
| bonus | 0.152 | 0.682 | +0.530 |
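The Improvement column is the fine-tuned score minus the baseline score. The following sketch recomputes it from the table and also derives an unweighted mean across the four tasks. That mean is not reported in this card and assumes each task counts equally; the variable names are illustrative.

```python
# Scores copied from the results table: task -> (baseline, fine-tuned).
scores = {
    "easy": (0.320, 0.685),
    "medium": (0.050, 0.378),
    "hard": (0.190, 0.869),
    "bonus": (0.152, 0.682),
}

# Per-task improvement, rounded to the table's three decimal places.
improvements = {
    task: round(finetuned - baseline, 3)
    for task, (baseline, finetuned) in scores.items()
}

# Unweighted means across tasks (an assumption: every task counts equally).
mean_baseline = sum(b for b, _ in scores.values()) / len(scores)
mean_finetuned = sum(f for _, f in scores.values()) / len(scores)

for task, delta in improvements.items():
    print(f"{task:<6} {delta:+.3f}")
print(f"mean   {mean_finetuned - mean_baseline:+.4f}")
```

Under that equal-weight assumption, the mean score rises from 0.178 to roughly 0.65, with the largest gain on the hard task.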
Setup
- Algorithm: GRPO
- Base: Llama-3.1-8B-Instruct
- LoRA rank: 32, alpha: 64
- Episodes: 160 (40 per task)
- GPU: NVIDIA L4 (162 minutes of training)
- Framework: Unsloth + HuggingFace TRL
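A few quantities follow directly from the settings above. The sketch below collects them in a small dataclass. `TrainingSetup` and its property names are hypothetical, not part of Unsloth or TRL, and the scaling factor assumes the standard LoRA convention of alpha divided by rank rather than a rank-stabilized variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters copied from the Setup list (class name is illustrative)."""

    lora_rank: int = 32
    lora_alpha: int = 64
    episodes: int = 160
    episodes_per_task: int = 40
    wall_clock_minutes: int = 162

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    @property
    def num_tasks(self) -> int:
        # 160 episodes at 40 per task implies four task tiers.
        return self.episodes // self.episodes_per_task

    @property
    def minutes_per_episode(self) -> float:
        # Average wall-clock cost per episode on the L4.
        return self.wall_clock_minutes / self.episodes


setup = TrainingSetup()
print(f"LoRA scaling: {setup.lora_scaling}")
print(f"Tasks: {setup.num_tasks}")
print(f"Minutes per episode: {setup.minutes_per_episode:.2f}")
```

With these values the LoRA update is scaled by 2.0, the 160 episodes split evenly over the four tasks in the results table, and each episode averages about one minute of wall-clock time.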
Model tree for Arijit-07/aria-devops-llama8b
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct
- Finetuned: unsloth/Meta-Llama-3.1-8B-Instruct