aria-devops-llama8b / README.md
Arijit-07's picture
Update README.md
e774b09 verified
---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- grpo
- reinforcement-learning
- devops
- incident-response
- openenv
- unsloth
---
# ARIA — DevOps Incident Response Agent
### Llama-3.1-8B fine-tuned with GRPO
Trained on the [ARIA DevOps Incident Response](https://huggingface.co/spaces/Arijit-07/devops-incident-response)
live RL environment using GRPO.
## Training Results
| Task | Baseline | Fine-tuned | Improvement |
|---|---|---|---|
| easy | 0.320 | 0.685 | **+0.365** |
| medium | 0.050 | 0.378 | **+0.328** |
| hard | 0.190 | 0.869 | **+0.679** |
| bonus | 0.152 | 0.682 | **+0.530** |
![Training Curve](training_curve_8b.png)
## Setup
- Algorithm: GRPO
- Base: Llama-3.1-8B-Instruct
- LoRA rank: 32, alpha: 64
- Episodes: 160 (40 per task)
- GPU: NVIDIA L4, 162 minutes
- Framework: Unsloth + HuggingFace TRL
## Links
- Environment: https://huggingface.co/spaces/Arijit-07/devops-incident-response
- GitHub: https://github.com/Twilight-13/devops-incident-response