# 🚀 CommitGuard: Comprehensive GCP Deployment & Training Guide (NVIDIA L4)

This document is a deep-dive, step-by-step manual for deploying the CommitGuard environment and training pipeline to a Google Cloud Platform (GCP) instance. We target an **NVIDIA L4 GPU** (GCP's G2 machine series; the closest GCP equivalent to AWS's A10G) to run **GRPO (Group Relative Policy Optimization)** on the Llama-3.2-3B model.

---

## 📋 1. Prerequisites: Setting Up Your Toolbox

Before you touch the cloud, make sure your local environment and external accounts are configured. These are the building blocks of the entire run.

### A. GCP Account & Project Setup

* **Active Project:** You must have a GCP project created. Note your `PROJECT_ID`.
* **GPU Quota:** By default, GCP projects have a quota of 0 GPUs. Navigate to `IAM & Admin > Quotas` and request a limit increase for NVIDIA L4 GPUs in your desired region (e.g., `us-central1`). **Do this at least 24 hours in advance**, since quota reviews are not instant.

### B. Weights & Biases (WandB) for Visualization

* **Why?** RL training can be unstable. WandB lets you monitor the reward and KL-divergence curves in real time from your browser.
* **Action:** Create a free account at [wandb.ai](https://wandb.ai), open your settings, and copy your **API Key**.

### C. Hugging Face Account & Llama Access

* **Model Gating:** Llama-3.2-3B is a gated model. Visit the [model page](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and apply for access. Approval usually takes 30-60 minutes.
* **Access Token:** Generate a "Write" token in your Hugging Face settings so the VM can download the model and upload your finished adapters.

### D. Local gcloud CLI Initialization

* **Installation:** Install the Google Cloud SDK on your laptop.
* **Authentication:** Run `gcloud auth login` and `gcloud config set project [YOUR_PROJECT_ID]`. This lets your local terminal talk to GCP.

---

## 🛠️ Step 1: Provisioning the High-Performance VM

We are using the **g2-standard-4** machine type.
It is purpose-built for AI workloads.

### Detailed Breakdown of the Creation Command

* **`--machine-type=g2-standard-4`:** Provides 4 vCPUs and 16 GB of system RAM, ensuring the CPU doesn't bottleneck the GPU.
* **`--accelerator=type=nvidia-l4,count=1`:** Attaches the L4 GPU. Its 24 GB of VRAM is the "Goldilocks" zone for 3B-parameter models: enough to hold the model plus the multiple generations the GRPO algorithm requires.
* **`--image-family=common-cu121`:** Uses a specialized Google image that ships with **CUDA 12.1 and NVIDIA drivers pre-installed**, saving you roughly 30 minutes of manual driver installation.
* **`--provisioning-model=SPOT`:** **CRITICAL FOR BUDGET.** Spot instances use excess capacity and are roughly 60-70% cheaper than on-demand instances. If Google reclaims the instance, your 50-step checkpoints ensure you don't lose much progress.

```bash
gcloud compute instances create commitguard-trainer \
  --project=[PROJECT_ID] \
  --zone=us-central1-a \
  --machine-type=g2-standard-4 \
  --accelerator=type=nvidia-l4,count=1 \
  --image-project=ml-images \
  --image-family=common-cu121 \
  --boot-disk-size=100GB \
  --boot-disk-type=pd-balanced \
  --maintenance-policy=TERMINATE \
  --provisioning-model=SPOT
```

---

## 🏗️ Step 2: Environment Preparation

Once the VM shows "Running," we turn it into a specialized CommitGuard lab.

### A. Secure Connection (SSH)

Connect to the machine's terminal:

```bash
gcloud compute ssh commitguard-trainer --zone=us-central1-a
```

### B. Repository & Virtual Environment

We isolate our dependencies to prevent conflicts with system-level Python packages.

```bash
# Clone the project
git clone https://github.com/[YOUR_USER]/commitguard.git
cd commitguard

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Authenticate with Hugging Face (required for gated Llama models)
huggingface-cli login
```

### C. Installing the "Train" Stack

`pip install -e ".[train]"` installs the `commitguard` package in "editable" mode along with the optional training libraries such as `torch`, `peft`, and `trl`.

```bash
pip install -U pip
pip install -e ".[train]"

# Flash Attention 2: a specialized attention kernel that makes Llama training
# significantly faster and more memory-efficient on this hardware.
pip install flash-attn --no-build-isolation
```

---

## 📡 Step 3: Launching the Verifiable Reward Server

CommitGuard uses **RLVR (Reinforcement Learning with Verifiable Rewards)**. The model doesn't just "guess" whether it's right; it submits an action to a server that calculates a reward based on hard evidence.

### Running in the Background

Since training takes hours, we run the server in the background using the `&` operator.

```bash
# Start the server
python -m commitguard_env.server &

# Verify health: this ensures the database and API are ready.
# If this fails, the trainer will hang indefinitely.
curl http://localhost:8000/health
# You should see: {"status":"healthy"}
```

---

## 🧠 Step 4: Executing the GRPO Training Run

GRPO is a reinforcement learning algorithm. It asks the model to generate 4 different answers for the same code diff, compares them against each other, and rewards the ones that follow the XML format and correctly identify the vulnerability.

### Hyperparameter Explanation

* **`--steps 500`:** The model produces roughly 2,000 completions (4 generations x 500 steps).
* **4-bit quantization:** Handled automatically by the script. It compresses the model weights so they fit into GPU memory with minimal accuracy loss.
* **LoRA `r=8`:** Low-Rank Adaptation. Instead of training all 3 billion parameters, we train only about 5 million adapter weights, which keeps training stable and fast.
* **`--live`:** Tells the script to fetch rewards from the server we started in Step 3.
```bash
# Log in to WandB so your graphs show up online
export WANDB_API_KEY=[YOUR_WANDB_KEY]

python scripts/train_grpo.py \
  --model_name "meta-llama/Llama-3.2-3B-Instruct" \
  --output_dir "./outputs/commitguard-final" \
  --steps 500 \
  --live \
  --wandb "commitguard-rlvr"
```

---

## 💾 Step 5: Post-Run Weight Management & Cleanup

Once the 500 steps are complete, the "brain" of your agent exists as a LoRA adapter in the `./outputs` folder.

### A. Permanent Storage (Hugging Face)

The VM's disk is temporary. Move your weights to Hugging Face immediately.

```bash
huggingface-cli login --token [YOUR_HF_TOKEN]
huggingface-cli upload [HF_USERNAME]/commitguard-llama3b-adapter ./outputs/commitguard-final
```

### B. Cost Control: Deleting the VM

**DO NOT FORGET THIS STEP.** An idle GPU instance costs money every hour.

```bash
# Exit the VM
exit

# Delete it from your local terminal
gcloud compute instances delete commitguard-trainer --zone=us-central1-a
```

---

## 🆘 Critical Troubleshooting

### "CUDA Out of Memory"

* **Symptom:** Training crashes with a long traceback ending in `OutOfMemoryError`.
* **Fix:** The "group" in GRPO is currently set to 4 generations. Open `scripts/train_grpo.py` and change `num_generations=4` to `num_generations=2`. This roughly halves the memory needed for generation.

### "Connection Refused"

* **Symptom:** The reward function returns -1.0 for everything or throws errors.
* **Fix:** The environment server crashed or was never started. Run `ps aux | grep server` to check whether it is still running, and restart it if not (see Step 3).

### The "Midnight Fallback"

If the 3B model is too slow for the submission deadline:

* Switch to the **Qwen2.5-1.5B** model. It uses the same XML format but is roughly 2x faster.
* Command: `python scripts/train_grpo.py --model_name "Qwen/Qwen2.5-1.5B-Instruct" ...`

---

## ✅ Final Success Checklist

1. [ ] **Health check:** `curl` returns healthy.
2. [ ] **WandB tracking:** You can see the `reward` curve moving on the website.
3. [ ] **Checkpoints:** You see folders like `checkpoint-50` and `checkpoint-100` in the output directory.
4. [ ] **Clean exit:** The VM is deleted after the adapter is uploaded to Hugging Face.