# nervousystem-sre-agent-lora

LoRA adapter for `unsloth/Qwen2.5-7B-Instruct-bnb-4bit`, fine-tuned to act as a Site Reliability Engineer agent for distributed GPU training fleets inside the NervousSystem-Env OpenEnv environment.
## TL;DR

- Base model: `unsloth/Qwen2.5-7B-Instruct-bnb-4bit`
- Adapter: LoRA via PEFT (`r=16`, `alpha=16`, targets `q/k/v/o/gate/up/down_proj`)
- Training method: supervised fine-tuning (SFT) using Hugging Face TRL
- Training data: 800 multi-step SRE rollout trajectories generated by the NervousSystem-Env OpenEnv environment under deterministic seeds
- Logged training steps: 40 (per `trainer.state.log_history`)
- Hardware: NVIDIA A10G on Hugging Face Jobs
- Loss: 2.53 → ~0.10 (real per-step values published at `results/sft_warmup_metrics.json` in the env repo)
- Final evaluation: 0.915 mean score, 100% pass rate over 12 phase-aware constrained episodes (`easy`, `medium`, `hard`, `cascade` × 3 seeds each)
This adapter is the SFT warmup policy described in the NervousSystem-Env submission. The same training repository also includes an optional GRPO continuation pipeline that loops back to environment rewards, but the published adapter weights here are the SFT result.
## Intended Use

This adapter is built to take partial cluster telemetry from the NervousSystem-Env OpenEnv environment and emit a single valid JSON SRE remediation action per step, for example:

```json
{"action_type": "inspect_flight_recorder", "parameters": {"rank_id": 3}}
{"action_type": "topo_reorder", "parameters": {"affinity": "rack"}}
{"action_type": "patch_divergent_code", "parameters": {"file": "model/transformer.py", "fix_type": "synchronize_conditional"}}
```
It is intended for research/educational use inside the linked environment. It is not a general-purpose chat assistant and should not be used as one.
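Before executing a reply inside the environment, it helps to check that it actually parses into this shape. Below is a minimal validation sketch; the action names come from the examples above, and the schema check itself is an assumption rather than the environment's official validator:

```python
import json

# Action types shown in the examples above; the full set lives in the
# NervousSystem-Env repo (this list is illustrative, not exhaustive).
KNOWN_ACTIONS = {"inspect_flight_recorder", "topo_reorder", "patch_divergent_code"}

def parse_action(reply: str) -> dict:
    """Parse a model reply into an action dict, raising on malformed output."""
    action = json.loads(reply)
    if action.get("action_type") not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action_type: {action.get('action_type')!r}")
    if not isinstance(action.get("parameters"), dict):
        raise ValueError("parameters must be a JSON object")
    return action

action = parse_action('{"action_type": "topo_reorder", "parameters": {"affinity": "rack"}}')
```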
## Out-of-Scope Use

- Production cluster operations or any safety-critical environment.
- General conversation, code generation, or unrelated tool use.
- Any setting where wrong actions could destabilize real hardware. The training distribution comes entirely from a simulator.
## Training Data

The training data consists of 800 oracle-style multi-step trajectories generated by NervousSystem-Env across the `easy`, `medium`, `hard`, and `cascade` tasks under deterministic seeds. Each trajectory includes the partial cluster observation, the chosen JSON action, and the resulting environment transitions. No external or private data is used.
## Training Procedure

- Framework: Hugging Face TRL (`0.18.2`) + PEFT (`0.18.0`) + bitsandbytes
- Optimizer: AdamW
- Precision: 4-bit base + LoRA adapter
- Sequence length: 2048
- Logged steps: 40 (from `trainer.state.log_history`, published as `results/sft_warmup_metrics.json`)
- Hardware: NVIDIA A10G (Hugging Face Jobs)
The full training script is published in the environment repository under `training/grpo_train.py`. The exact Hugging Face Jobs invocation used to produce this adapter is documented in the NervousSystem-Env README.
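The hyperparameters above map onto a PEFT configuration roughly like the following. This is a sketch reconstructed from the TL;DR values, not the exact contents of `training/grpo_train.py`:

```python
from peft import LoraConfig

# r, alpha, and target modules are taken from the TL;DR above;
# remaining arguments are left at PEFT defaults as an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```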
## Evaluation

Evaluation uses the phase-aware constrained action scoring evaluator in `scripts/evaluate_model.py` from the environment repo. For each step, the model ranks the valid next-step JSON actions for the current task phase, and the environment executes the highest-likelihood action.
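The constrained-scoring step can be sketched as a simple argmax over candidate actions, with `score_fn` standing in for the model's log-likelihood of a candidate given the observation. The function name, the stub scorer, and the candidate list are all illustrative; the real evaluator is `scripts/evaluate_model.py`:

```python
import json

def pick_action(score_fn, candidates):
    """Return the candidate action the model assigns the highest score."""
    return max(candidates, key=lambda a: score_fn(json.dumps(a)))

# Illustrative phase-restricted candidates; in the real evaluator the score
# is a summed token log-probability under the fine-tuned model.
candidates = [
    {"action_type": "inspect_flight_recorder", "parameters": {"rank_id": 3}},
    {"action_type": "topo_reorder", "parameters": {"affinity": "rack"}},
]
score_fn = lambda text: 0.0 if "inspect" in text else -5.0  # stub scorer
best = pick_action(score_fn, candidates)
```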
| Metric | Value |
|---|---|
| Mean score | 0.9147 |
| Pass rate | 100% |
| Episodes | 12 (easy, medium, hard, cascade × 3 seeds) |
| Raw base model under same constrained scoring | 0.239 / 0% pass |
Per-task scores:
| Task | Scores by seed |
|---|---|
| easy | 0.99 / 0.99 / 0.99 |
| medium | 0.99 / 0.99 / 0.99 |
| hard | 0.85 / 0.85 / 0.99 |
| cascade | 0.782 / 0.782 / 0.782 |
Full eval JSON: `results/final_phaseaware_model_eval.json` in the environment repo.
## How to Use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"
adapter_id = "v4xsh/nervousystem-sre-agent-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```
Then send NervousSystem-Env observations as a chat-formatted prompt and parse the JSON action from the model's reply. The full eval script is `scripts/evaluate_model.py` in the environment repo.
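Model replies sometimes wrap the action in surrounding text, so it can be safer to extract the first JSON object rather than call `json.loads` on the whole reply. A small helper for that (an assumption of ours, not part of the environment's tooling):

```python
import json

def extract_action(reply: str) -> dict:
    """Return the first top-level JSON object found in a model reply."""
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object in reply")
    decoder = json.JSONDecoder()
    # raw_decode parses the first valid JSON value and ignores trailing text.
    action, _ = decoder.raw_decode(reply[start:])
    return action

# Hypothetical noisy reply with text around the action.
reply = 'Action: {"action_type": "topo_reorder", "parameters": {"affinity": "rack"}} done.'
action = extract_action(reply)
```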
## Limitations and Honest Disclosure

- This adapter is an SFT warmup policy, not a fully optimized online RL policy. The GRPO continuation loop in `training/grpo_train.py` works end-to-end on environment reward but is not the source of the published weights here.
- The reported 0.915 score uses phase-aware constrained action scoring, which restricts candidate actions to the current task phase. Free-form generation numbers would be lower; this is documented in the environment repo's README and `Blog.md`.
- The simulator is deterministic under seed and models production-inspired failure signatures, not a real GPU cluster.
## Links
- Environment Space: https://huggingface.co/spaces/v4xsh/nervousystem-env
- Final HF Jobs training log: https://huggingface.co/jobs/v4xsh/69eda4d1d2c8bd8662bcf435
- Reproduction Colab: https://colab.research.google.com/drive/1twXiRHoAxchy9UUgn7S15a2ac3GrtAlW?usp=sharing
- Blog post: https://huggingface.co/spaces/v4xsh/nervousystem-env/blob/main/Blog.md
## License

Apache 2.0 for the adapter weights. The base model `unsloth/Qwen2.5-7B-Instruct-bnb-4bit` is governed by its own license; please consult the upstream model card before redistribution.