---
base_model: unsloth/llama-3-8b-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/llama-3-8b-Instruct
- grpo
- lora
- transformers
- trl
- unsloth
---

# CLI Agent — Llama 3 8B GRPO Fine-tune (GPU 1 / lr=5e-6)

A LoRA adapter fine-tuned from Meta-Llama-3-8B-Instruct using GRPO (Group Relative Policy Optimization) to generate correct Linux shell commands from natural language task descriptions.

This is the GPU 1 run, trained at lr=5e-6. See also [jalva182/cli-agent-model](https://huggingface.co/jalva182/cli-agent-model) for the GPU 0 run at lr=3e-6.

## Model Details

### Model Description

- **Developed by:** Jose Alvarez, Carson Chiem, Prisha Bhattacharyya, Vishal Tyagi
- **Model type:** Causal Language Model (LoRA adapter)
- **Language(s) (NLP):** English
- **License:** Meta Llama 3 Community License
- **Finetuned from model:** unsloth/llama-3-8b-Instruct

### Model Sources

- **Repository:** https://github.com/Alvarez-Jose/unsloth-grpo-project

## Uses

### Direct Use

Given a natural language description of a CLI task, the model outputs the correct shell command with no explanation, no markdown, and no backticks.
Example:

- Input: "Count the number of lines in /tmp/data/log.txt"
- Output: `wc -l /tmp/data/log.txt`

### Out-of-Scope Use

- Not intended for general conversation
- Not suitable for tasks outside Linux CLI command generation
- Should not be used to generate destructive or malicious shell commands

## Bias, Risks, and Limitations

- The model may generate incorrect or harmful shell commands; always review a command before executing it
- Trained on a limited set of ~60 task types, so it may not generalize to all CLI scenarios
- Performance degrades on complex multi-step tasks

## How to Get Started with the Model

```python
from unsloth import FastLanguageModel

# Load the base model with the LoRA adapter in 4-bit.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="jalva182/cli-agent-model-gpu1",
    max_seq_length=512,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {"role": "system", "content": "You are a CLI expert. Given a task, output exactly the shell commands required. No explanation, no markdown, no backticks."},
    {"role": "user", "content": "Count the number of lines in /tmp/data/log.txt"},
]
# add_generation_prompt=True appends the assistant header so the model
# starts generating the answer instead of continuing the user turn.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

60 validated CLI tasks covering file operations, text processing (grep, awk, sed), sorting, archives, system info, permissions, and environment variables. Each task includes setup commands, an expected output, and a reward function for GRPO training.
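The tasks' reward function is not reproduced in this card; the following is a minimal sketch of one consistent with the scoring described under Evaluation (+5 exact output match, +3 success with partial match, -2 failure). The function name and structure are assumptions for illustration, not the project's actual code:

```python
# Hypothetical per-task reward in the style described under Evaluation.
# Not the project's actual implementation.
import subprocess

def score_command(command: str, expected_output: str, timeout: int = 10) -> float:
    """Run a candidate shell command and score it:
    +5 for an exact output match, +3 for success with a partial match,
    -2 for failure or wrong output (maximum combined reward: 8)."""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return -2.0
    if result.returncode != 0:
        return -2.0
    actual = result.stdout.strip()
    expected = expected_output.strip()
    reward = 0.0
    if actual == expected:
        reward += 5.0
    if expected in actual or actual in expected:
        reward += 3.0
    return reward if reward > 0 else -2.0
```

During GRPO training, a reward in this shape is evaluated per generated completion; the trainer then normalizes rewards within each group of generations to compute relative advantages.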
### Training Hyperparameters

- **Training regime:** bf16 mixed precision
- **Method:** GRPO (Group Relative Policy Optimization)
- **Learning rate:** 5e-6 with linear scheduler
- **Warmup ratio:** 0.1
- **Batch size:** 2 (per device)
- **Gradient accumulation steps:** 2
- **Total steps:** 10000
- **LoRA rank:** 32, alpha: 64
- **KL coefficient:** 0.05
- **Number of generations:** 4
- **Max sequence length:** 512

### Speeds, Sizes, Times

- **Training time:** ~4h 7min
- **Checkpoint size:** ~524MB (LoRA adapter only)
- **Final train loss:** 0.0188
- **Final reward:** 8.0/8.0 on the final steps

## Evaluation

### Metrics

Reward function scoring, up to 8 points per task:

- +5 for an exact output match
- +3 for command success with a partial match
- -2 for command failure or wrong output

### Results

- **Best reward:** 8.0
- **Average reward (final steps):** ~6.0
- **Train loss:** 0.0188

## Comparison with GPU 0 Run

| | GPU 0 (cli-agent-model) | GPU 1 (cli-agent-model-gpu1) |
|---|---|---|
| Learning rate | 3e-6 | 5e-6 |
| Train loss | 0.0141 | 0.0188 |
| Final reward | 8.0 | 8.0 |
| Runtime | 3h 13min | 4h 7min |
| Recommendation | ✅ Primary | Secondary |

GPU 0 achieved a lower train loss and is recommended as the primary model.

## Environmental Impact

- **Hardware Type:** H100 SXM 80GB
- **Hours used:** ~4.5 hours
- **Cloud Provider:** Vast.ai

## Technical Specifications

### Model Architecture

- Base: Meta-Llama-3-8B-Instruct
- Adapter: LoRA (rank=32, alpha=64, dropout=0.05)
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

### Software

- unsloth 2026.3.3
- trl 0.24.0
- transformers 4.56.1
- torch 2.6.0+cu124
- PEFT 0.18.1

## Model Card Authors

Jose Alvarez

## Model Card Contact

https://github.com/Alvarez-Jose/unsloth-grpo-project

### Framework versions

- PEFT 0.18.1
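Since the model is prompted to emit bare commands but may occasionally wrap them in markdown anyway, a light post-processing step can help before a human reviews the command. A hypothetical helper, not part of this repository:

```python
# Hypothetical cleanup helper (assumption, not part of this repo): strip a
# surrounding markdown fence or inline backticks from a generated command.
import re

def clean_command(text: str) -> str:
    text = text.strip()
    # Remove a surrounding ```...``` fence, with or without a language tag.
    match = re.match(r"^```[a-zA-Z]*\n(.*?)\n?```$", text, re.DOTALL)
    if match:
        text = match.group(1)
    # Remove stray inline backticks.
    return text.strip().strip("`").strip()
```

Even after cleanup, generated commands should be reviewed before execution, per the limitations above.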