DevOps Incident Responder

A fine-tuned Mistral-NeMo-Minitron-8B-Instruct model for DevOps incident diagnosis and resolution.

What It Does

Analyzes error logs, stack traces, and incident descriptions to provide:

  • Root-cause analysis
  • Severity assessment (Low / Medium / High / Critical)
  • Step-by-step fixes with exact commands
  • Prevention guidance

Tech Coverage

Kubernetes, Docker, Terraform, Azure, GCP, Node.js, Redis, MongoDB, Nginx, PostgreSQL, InfluxDB

Training Details

  • Base Model: nvidia/Mistral-NeMo-Minitron-8B-Instruct
  • Method: QLoRA (4-bit quantization + LoRA adapters)
  • Dataset: 4,755 examples (scraped + synthetic)
  • Eval Set: 376 examples
  • Epochs: 2
  • LoRA Rank: 32
  • LoRA Alpha: 64
  • Learning Rate: 2e-4
  • Effective Batch Size: 16
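
The hyperparameters above can be written out as a fine-tuning configuration sketch, assuming the Hugging Face `peft` and `bitsandbytes` APIs were used (the training script is not included in this card; the `target_modules` choice and the per-device/gradient-accumulation split of the effective batch size of 16 are assumptions, not documented values):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for QLoRA, matching the loading code in Usage below
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings from the table above;
# target_modules is an assumption (attention projections are a common choice)
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Training knobs from the table; the 4 x 4 split of the
# effective batch size of 16 is an assumption
learning_rate = 2e-4
num_train_epochs = 2
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
```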

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "irfanalee/incident-responder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert DevOps engineer and SRE. Analyze error logs, diagnose incidents, and suggest fixes."},
    {"role": "user", "content": "Analyze this kubernetes incident:\n\n```\nkubectl describe pod api-server\nState: Terminated\nReason: OOMKilled\nExit Code: 137\nRestart Count: 5\n```"}
]

# Build the prompt manually with the NeMo chat template the base model expects
prompt = "<extra_id_0>System\n" + messages[0]["content"] + "\n"
prompt += "<extra_id_1>User\n" + messages[1]["content"] + "\n<extra_id_1>Assistant\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # respect device_map placement
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.4,
    repetition_penalty=1.3,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
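
For the OOMKilled incident in the example (exit code 137 means the kernel killed the container for exceeding its memory limit), the kind of fix the model is trained to suggest looks like the sketch below. This is an illustration only, not captured model output; the deployment name and memory values are hypothetical:

```shell
# Confirm the OOM kill and inspect the current resource settings
kubectl describe pod api-server | grep -A3 "Last State"
kubectl get deployment api-server \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'

# Raise the memory limit/request (hypothetical values), then watch the rollout
kubectl set resources deployment/api-server \
  --limits=memory=1Gi --requests=memory=512Mi
kubectl rollout status deployment/api-server
```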