---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- grpo
- reinforcement-learning
- devops
- incident-response
- openenv
- unsloth
---

# ARIA — DevOps Incident Response Agent
### Llama-3.1-8B fine-tuned with GRPO

Llama-3.1-8B-Instruct fine-tuned with GRPO (Group Relative Policy Optimization) on the live
[ARIA DevOps Incident Response](https://huggingface.co/spaces/Arijit-07/devops-incident-response) RL environment.

## Training Results

| Task | Baseline | Fine-tuned | Improvement |
|---|---|---|---|
| easy | 0.320 | 0.685 | **+0.365** |
| medium | 0.050 | 0.378 | **+0.328** |
| hard | 0.190 | 0.869 | **+0.679** |
| bonus | 0.152 | 0.682 | **+0.530** |

![Training Curve](training_curve_8b.png)
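
The Improvement column is the fine-tuned score minus the baseline score for each task. The sketch below recomputes it from the table so the figures can be checked; the names `RESULTS` and `improvements` are illustrative and are not part of the training code.

```python
# (task, baseline, fine_tuned) copied from the Training Results table.
RESULTS = [
    ("easy", 0.320, 0.685),
    ("medium", 0.050, 0.378),
    ("hard", 0.190, 0.869),
    ("bonus", 0.152, 0.682),
]


def improvements(rows):
    """Return {task: fine_tuned - baseline}, rounded to 3 decimals like the table."""
    return {task: round(fine_tuned - baseline, 3) for task, baseline, fine_tuned in rows}


deltas = improvements(RESULTS)
best_task = max(deltas, key=deltas.get)  # task with the largest absolute gain

for task, delta in deltas.items():
    print(f"{task:<7} {delta:+.3f}")
print(f"largest gain: {best_task}")
```

The largest absolute gain is on the hard task, and the medium task remains the lowest-scoring one after fine-tuning.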

## Setup
- Algorithm: GRPO
- Base: Llama-3.1-8B-Instruct (`unsloth/Meta-Llama-3.1-8B-Instruct`)
- LoRA rank: 32, alpha: 64
- Episodes: 160 (40 per task)
- GPU: NVIDIA L4 (162 minutes of training)
- Framework: Unsloth + HuggingFace TRL
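
To make the LoRA and episode numbers above concrete, here is a minimal sketch in plain Python. It assumes standard LoRA, where the low-rank update is scaled by alpha / rank; in PEFT these two values correspond to the `r` and `lora_alpha` arguments of `LoraConfig`. The 4096 x 4096 matrix is an illustrative assumption based on the hidden size of Llama-3.1-8B, since this card does not say which modules were adapted. `lora_param_count` is a hypothetical helper, not part of any library.

```python
# Hyperparameters stated in the Setup list.
LORA_RANK = 32
LORA_ALPHA = 64
EPISODES_PER_TASK = 40
TASKS = ["easy", "medium", "hard", "bonus"]

# Standard LoRA scales the low-rank update B @ A by alpha / rank.
scaling = LORA_ALPHA / LORA_RANK


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix."""
    # A has shape (rank x d_in) and B has shape (d_out x rank).
    return rank * (d_in + d_out)


# Assumption for illustration only: one 4096 x 4096 projection matrix.
example_params = lora_param_count(4096, 4096, LORA_RANK)
total_episodes = EPISODES_PER_TASK * len(TASKS)

print(f"LoRA scaling (alpha / rank): {scaling}")
print(f"adapter params for one 4096x4096 matrix: {example_params:,}")
print(f"total episodes: {total_episodes}")
```

With rank 32 and alpha 64 the scaling factor is 2.0, and 40 episodes on each of the four tasks gives the 160 episodes listed above.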

## Links
- Environment: https://huggingface.co/spaces/Arijit-07/devops-incident-response
- GitHub: https://github.com/Twilight-13/devops-incident-response