---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- grpo
- reinforcement-learning
- devops
- incident-response
- openenv
- unsloth
---

# ARIA: DevOps Incident Response Agent

### Llama-3.1-8B fine-tuned with GRPO

Trained on the [ARIA DevOps Incident Response](https://huggingface.co/spaces/Arijit-07/devops-incident-response) live RL environment using GRPO.

## Training Results

| Task | Baseline | Fine-tuned | Improvement |
|---|---|---|---|
| easy | 0.320 | 0.685 | **+0.365** |
| medium | 0.050 | 0.378 | **+0.328** |
| hard | 0.190 | 0.869 | **+0.679** |
| bonus | 0.152 | 0.682 | **+0.530** |

![Training Curve](training_curve_8b.png)

## Setup

- Algorithm: GRPO
- Base: Llama-3.1-8B-Instruct
- LoRA rank: 32, alpha: 64
- Episodes: 160 (40 per task)
- GPU: NVIDIA L4, 162 minutes
- Framework: Unsloth + HuggingFace TRL

## Links

- Environment: https://huggingface.co/spaces/Arijit-07/devops-incident-response
- GitHub: https://github.com/Twilight-13/devops-incident-response
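
The Improvement column in the Training Results table appears to be the fine-tuned score minus the baseline score for each task. The sketch below recomputes it from the reported numbers as a sanity check. The names `SCORES`, `improvements`, and `mean_improvement` are illustrative and not part of the ARIA codebase, and the unweighted mean across tasks is an added convenience rather than a metric this card reports.

```python
# Scores copied from the Training Results table: task -> (baseline, fine_tuned).
SCORES = {
    "easy": (0.320, 0.685),
    "medium": (0.050, 0.378),
    "hard": (0.190, 0.869),
    "bonus": (0.152, 0.682),
}


def improvements(scores):
    """Return task -> (fine-tuned minus baseline), rounded to 3 decimals."""
    return {task: round(tuned - base, 3) for task, (base, tuned) in scores.items()}


def mean_improvement(scores):
    """Unweighted mean of the per-task improvements (assumption: equal task weight)."""
    deltas = improvements(scores)
    return round(sum(deltas.values()) / len(deltas), 4)


if __name__ == "__main__":
    for task, delta in improvements(SCORES).items():
        print(f"{task:<6} {delta:+.3f}")
    print(f"mean   {mean_improvement(SCORES):+.4f}")
```

Equal weighting matches the setup listed above, where each task receives the same number of episodes (40), but whether evaluation used the same split is not stated here.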